Atomic vectors
Jesus Christ, here we go.
First, a note about notation. When you see a reference to a vector, the writers are probably referring to atomic vectors. There is another important data type called a list or generic vector, with (naturally) different semantics. Lists are also vectors, but lists are not atomic vectors.
Anyway: The atomic vector is the simplest R data type. Atomic vectors are linear vectors of a single primitive type, like an STL Vector in C++. There is no useful literal syntax for this fundamental data type. To create one of these stupid beasts, assign like:
a <- c(1,2,3,4)
Haha, what is c()? It is a function, “c” means “concatenate,” and it assembles the vectors you pass into it end-to-end. “But I passed in numerical primitives,” you might think. Wrong! All naked numbers are double-width floating-point atomic vectors of length one. You’re welcome. Consequences of this include:
- a, above, is a double-typed atomic vector.
- is.integer(2) yields FALSE, because 2 is interpreted as a floating-point value. This has implications for testing equality! You can type an integer literal by suffixing L, as in 2400L.
- is.integer(as.integer(c(1,2))) yields TRUE, because you gave it an atomic vector of integer type.
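A quick sketch of those consequences, for the skeptical. The all.equal() call at the end is not from the text above; it is one common hedge against floating-point comparison trouble, offered here as an assumption about what you probably want.

```r
a <- c(1, 2, 3, 4)
typeof(a)                 # "double"
is.integer(2)             # FALSE: a bare 2 is a double
is.integer(2L)            # TRUE: the L suffix makes an integer literal
identical(2, 2L)          # FALSE: same value, different types
2 == 2L                   # TRUE: == coerces before comparing

# The classic floating-point trap, and one way around it
0.1 + 0.2 == 0.3                    # FALSE
isTRUE(all.equal(0.1 + 0.2, 0.3))   # TRUE: compares within a tolerance
```

Note that identical() is strict about type while == is not, so which one bites you depends on which one you reach for.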
Note that as.integer(), not integer(), is used to cast vectors to an integer type. Similar functions include as.character() and as.numeric(); you can do things like as.character(1.23) or as.numeric(c("1", "2")).
Index vectors like a[1] … a[4]. All indexing in R is base-one. Note that no error is thrown if you try to access a[0]; it always returns an atomic vector of the same type but of length zero, written like numeric(0). Unaccountably, nobody’s in jail for that decision! Indexing past the end of the vector yields the special value NA, which is used to represent missingness in R. Assigning past the end of the vector (i.e. a[10] <- 5) works and extends the vector, filling with NA. To get a zero-filled vector of a particular length and type to start with, write something like a <- integer(42).
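The indexing rules above, as a short session you can paste in. The variable z is just a name picked for illustration.

```r
a <- c(1, 2, 3, 4)
a[1]            # 1: the first element, not the second
a[0]            # numeric(0): no error, just an empty double vector
length(a[0])    # 0
a[5]            # NA: reading past the end
a[10] <- 5      # writing past the end silently grows the vector
length(a)       # 10
a               # 1 2 3 4 NA NA NA NA NA 5

z <- integer(42)   # 42 zeros, integer type
typeof(z)          # "integer"
```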
Zero-length vectors like numeric(0) have undefined truthiness, and testing the truth value raises an error:
> if(numeric(0)) { print("Truth!"); } else {print("Folly.");}
Error in if (numeric(0)) { : argument is of length zero
Kinds of atomic vectors
- logical (may contain TRUE, FALSE, NA)
- integer
- double (real is a deprecated alias)
- complex (as in complex numbers; write as 0+0i)
- character (pronounced “string”; see next section)
- raw (for bitstreams; printed in hex by default. Logical operators magically operate bitwise on these; they operate elementwise on all other vector types.)
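One literal of each kind, plus the raw-vector magic trick. The names r and l are arbitrary.

```r
typeof(TRUE)          # "logical"
typeof(1L)            # "integer"
typeof(1)             # "double"
typeof(0+0i)          # "complex"
typeof("a")           # "character"
typeof(as.raw(255))   # "raw"

# & is bitwise on raw vectors: 0x0c & 0x0a is 0x08
r <- as.raw(12) & as.raw(10)
r                     # prints as 08

# On logical vectors, & works element by element
l <- c(TRUE, TRUE, FALSE) & c(TRUE, FALSE, FALSE)
l                     # TRUE FALSE FALSE
```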
Integer and double atomic vectors are both numeric atomic vectors, i.e. is.numeric(x) is TRUE. Complex atomic vectors, duh???, are not numeric.
If you ask for a numeric vector using numeric(42) or as.numeric(x), you will get a double vector. A perfect R-ism is that if you ask for a single vector, you’ll still get a double-precision float vector, though it will have a flag set so that it will be passed into C APIs as single-width floats instead of doubles. There is no single-precision storage type in R.
Check the type of your vector with typeof(x), which returns a string.
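To see the numeric/double/single business for yourself, something like the following should do. The "flag" mentioned above is, as far as I know, an attribute named Csingle; treat that name as a detail to verify against ?as.single rather than gospel.

```r
is.numeric(1L)       # TRUE
is.numeric(1.5)      # TRUE
is.numeric(1+0i)     # FALSE

typeof(numeric(42))        # "double"
typeof(as.numeric("3"))    # "double"

s <- as.single(1.5)
typeof(s)                  # "double": still a double underneath
attr(s, "Csingle")         # TRUE: the flag consulted when calling into C
```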
A potential source of mischief is that if you try to place a value of a particular type into an atomic vector of a different type, R will—silently, natch—recast either the value you are trying to add or the entire vector (!) to the more permissive type. Witness:
> a <- c(1L, 2L, 3L); typeof(a)
[1] "integer"
> a[1] = 2; typeof(a)
[1] "double"
> a[1] = '2'; typeof(a)
[1] "character"
Logic values
TRUE, FALSE, and NA are special logic values. NULL is also defined and is a special object of length zero. Do not use T and F for TRUE and FALSE. You will see people doing it but they’re not your friend; T and F are just variables with default values. Set T <- F and source their code and laugh as it burns.
This also means that you shouldn’t ever assign useful quantities to variables named T and F. Sorry. Other variable names that you should avoid, because they are already the names of built-in functions, are c, q, t (!), C, D, and I. :(
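A small demonstration of why T is not your friend. The function careless() is a made-up stand-in for somebody else's code, and before and after are just names for its two results.

```r
# Hypothetical helper written by someone who trusts T
careless <- function() {
  if (T) "truth" else "folly"
}

before <- careless()   # "truth", for now

T <- FALSE             # perfectly legal: T is an ordinary variable
after <- careless()    # "folly"

# TRUE, by contrast, is a reserved word; assigning to it is an error:
# TRUE <- FALSE

rm(T)                  # drop our T so the built-in one is visible again
```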
NA means “not available” and is a filler quantity for missing values. The result of all comparisons with NA is NA. Use is.na(x) to test whether a value is NA, not x == NA. NA has undefined truth value, and testing it raises an error:
> if(NA) print ("Hello");
Error in if (NA) print("Hello") : missing value where TRUE/FALSE needed
NULL, by the way, also has undefined truth value, raising an error if you test it:
> if(NULL) print("Nope");
Error in if (NULL) print("Nope") : argument is of length zero
If you need to test the truth value of some x that may sometimes be NA or have zero length, you can test the charming and ever-so-concise expression identical(TRUE, as.logical(x)), which will always evaluate to true or false.
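Here is the NA behavior and the safe truth test in one place. The wrapper name truthy() is invented for this sketch; the expression inside it is the one from the text.

```r
NA == NA          # NA, not TRUE
NA > 1            # NA
x <- c(1, NA, 3)
x == NA           # NA NA NA: useless
is.na(x)          # FALSE TRUE FALSE

# Hypothetical wrapper around the expression from the text
truthy <- function(x) identical(TRUE, as.logical(x))

truthy(TRUE)            # TRUE
truthy(1)               # TRUE: as.logical(1) is TRUE
truthy(NA)              # FALSE
truthy(NULL)            # FALSE
truthy(numeric(0))      # FALSE
truthy(c(TRUE, TRUE))   # FALSE: a length-two vector is not a single TRUE
```

The last line is worth noticing: anything longer than one element comes back FALSE, which may or may not be what you meant.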
Dealing with strings
When you see “character atomic vector” you should think “string atomic vector.” length('foo bar') yields 1… because you have created a character atomic vector of length one, containing the character value ‘foo bar’. (Yes. I know.) length(c('to be', 'or not', 'to be')) is 3.
String primitives, which is to say the elements of a character atomic vector, are immutable.
Some other things that are true:
- length('foo') is 1 (see above).
- nchar('foo') is 3.
- Strings are indexed with substr(x, start, stop). Base one, remember: substr('foo', 1, 1) is ‘f’. substr('foo', 2, 3) is ‘oo’.
- You can wrap strings in either single or double quotes. Escape with backslashes as in C, e.g. 'Tim\'s bad attitude.'
- paste() is useful for a variety of string-concatenation operations. There is also a sprintf() function.
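The string facts above, run end to end. The particular paste() arguments (sep, collapse) and the sprintf() format string are examples I picked, not anything the list prescribes.

```r
s <- "foo"
length(s)            # 1: one string in the vector
nchar(s)             # 3: three characters in that string
substr(s, 1, 1)      # "f"
substr(s, 2, 3)      # "oo"

quoted <- 'Tim\'s bad attitude.'
identical(quoted, "Tim's bad attitude.")   # TRUE: quote style does not matter

joined  <- paste("to be", "or not", "to be")         # "to be or not to be"
glued   <- paste("a", "b", sep = "")                 # "ab"
squashed <- paste(c("x", "y", "z"), collapse = "+")  # "x+y+z"
line    <- sprintf("%d items at %.2f each", 3L, 1.5) # "3 items at 1.50 each"

nchar(c("to be", "or not", "to be"))   # 5 6 5: nchar is vectorized
```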
The stringr or Biostrings packages may ease the pain of string handling in R. In particular, stringr has a very pleasant interface for matching regexps.
Vector operations
You can do vector math in R, which always operates elementwise, like the dot operators in MATLAB. R will not do linear algebra unless you explicitly ask it to (with the infix operator %*%; see ?"%*%"). Vector math is fast and dangerous. Almost nothing you can do with vector math will raise an error. If your operands are different sizes, R will silently recycle your short vector until it’s long enough to perform the operation.^{1} R will, at least, raise a warning if your short vector does not fit into your long vector an integer number of times; fear it.
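Recycling is easier to fear once you have watched it happen. In this sketch the tryCatch() wrapper exists only to capture the warning text as a string; normally you would just see the warning scroll past along with the result 11 22 33 14.

```r
x <- c(1, 2, 3, 4)
x * 2                  # 2 4 6 8
x * x                  # 1 4 9 16: elementwise, not a dot product
x %*% x                # a 1x1 matrix holding 30: the inner product

recycled <- x + c(10, 20)   # 11 22 13 24: short vector reused twice, silently

# Length 3 does not divide length 4: R still computes, but warns
msg <- tryCatch(x + c(10, 20, 30), warning = function(w) conditionMessage(w))
msg   # "longer object length is not a multiple of shorter object length"
```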
Arrays
Atomic vectors are extended to multiple dimensions as arrays. A matrix is a two-dimensional array. One-dimensional arrays are possible; the primary difference between a one-dimensional array and a vector is that dim(some.array) will have length 1 and dim(some.vector) will be NULL. N-dimensional arrays are indexed like my.array[dim1, dim2, dim3]. Use an empty value to represent “all values”; i.e., to select row 3 of a matrix, use my.matrix[3,].
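A few arrays to poke at. The names m, v, one.d, and cube are arbitrary, and the matrix here only has two rows, so it selects row 2 rather than row 3.

```r
m <- matrix(1:6, nrow = 2)   # filled column by column
dim(m)                       # 2 3
m[2, ]                       # 2 4 6: all of row 2
m[, 3]                       # 5 6: all of column 3
m[2, 3]                      # 6

v <- 1:3
one.d <- array(1:3)          # a one-dimensional array
dim(v)                       # NULL
dim(one.d)                   # 3: a dim vector of length 1

cube <- array(1:24, dim = c(2, 3, 4))
cube[1, 2, 3]                # 15
dim(cube[, , 1])             # 2 3: a slice along the third dimension is a matrix
```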

^{1} which is fucking ghastly