Jesus Christ, here we go.

First, a note about notation. When you see a reference to a vector, the writers are probably referring to atomic vectors. There is another important data type called a list or generic vector, with (naturally) different semantics. Lists are also vectors, but lists are not atomic vectors.

Anyway: The atomic vector is the simplest R data type. Atomic vectors are linear vectors of a single primitive type, like an STL Vector in C++. There is no useful literal syntax for this fundamental data type. To create one of these stupid beasts, assign like:

``````a <- c(1,2,3,4)
``````

Haha, what is `c()`? It is a function, “c” means “concatenate,” and it assembles the vectors you pass into it end-to-end. “But I passed in numerical primitives,” you might think. Wrong! All naked numbers are double-width floating-point atomic vectors of length one. You’re welcome. Consequences of this include:

• `a`, above, is a double-typed atomic vector.
• `is.integer(2)` yields FALSE, because `2` is interpreted as a floating-point value. This has implications for testing equality! You can type an integer literal by suffixing `L`, as in `2400L`.
• `is.integer(as.integer(c(1,2)))` yields TRUE, because you gave it an atomic vector of integer type.

Note that `as.integer()`—not `integer()`—is used to cast vectors to an integer type. Similar functions include `as.character()` and `as.numeric()`; you can do things like `as.character(1.23)` or `as.numeric(c("1", "2"))`.

Index vectors like a[1] … a[4]. All indexing in R is base-one. Note that no error is thrown if you try to access a[0]; it always returns an atomic vector of the same type but of length zero, written like `numeric(0)`. Unaccountably, nobody’s in jail for that decision! Indexing past the end of the vector yields the special value `NA`, which is used to represent missingness in R. Assigning past the end of the vector (i.e. `a[10] <- 5`) works and extends the vector, filling with NA. To get an zero-filled vector of a particular length and type to start with, write something like `a <- integer(42)`.

Zero-length vectors like `numeric(0)` have undefined truthiness, and testing the truth value raises an error:

``````> if(numeric(0)) { print("Truth!"); } else {print("Folly.");}
Error in if (numeric(0)) { : argument is of length zero
``````

Kinds of atomic vectors

• logical (may contain TRUE, FALSE, NA)
• integer
• double (`real` is a deprecated alias)
• complex (as in complex numbers; write as `0+0i`)
• character (pronounced “string”—see next section)
• raw (for bitstreams; printed in hex by default. Logical operators magically operate bitwise on these; they operate elementwise on all other vector types.)

Integer and double atomic vectors are both numeric atomic vectors, i.e. `is.numeric(x)` is `TRUE`. Complex atomic vectors, duh???, are not numeric.

If you ask for a `numeric` vector using `numeric(42)` or `as.numeric(x)`, you will get a `double` vector. A perfect R-ism is that if you ask for a `single` vector, you’ll still get a double-precision float vector, though it will have a flag set so that it will be passed into C APIs as single-width `float`s instead of `double`s. There is no single-precision storage type in R.

Check the type of your vector with `typeof(x)`, which returns a string.

A potential source of mischief is that if you try to place a value of a particular type into an atomic vector of a different type, R will—silently, natch—recast either the value you are trying to add or the entire vector (!) to the more permissive type. Witness:

``````> a <- c(1L, 2L, 3L); typeof(a)
[1] "integer"
> a[1] = 2; typeof(a)
[1] "double"
> a[1] = '2'; typeof(a)
[1] "character"
``````

Logic values

`TRUE`, `FALSE`, and `NA` are special logic values. `NULL` is also defined and is a special vector of length zero. Do not use T and F for `TRUE` and `FALSE`. You will see people doing it but they’re not your friend; T and F are just variables with default values. Set `T <- F` and source their code and laugh as it burns.

This also means that you shouldn’t ever assign useful quantities to variables named T and F. Sorry. Other variable names that you cannot use are c, q, t (!), C, D, and I. :(

NA means “not available” and is a filler quantity for missing values. The result of all comparisons with `NA` is `NA`. Use `is.na(x)` to test whether a value is NA, not `x == NA`. `NA` has undefined truth value, and testing it raises an error:

``````> if(NA) print ("Hello");
Error in if (NA) print("Hello") : missing value where TRUE/FALSE needed
``````

NULL, by the way, also has undefined truth value, raising an error if you test it:

``````> if(NULL) print("Nope");
Error in if (NULL) print("Nope") : argument is of length zero
``````

If you need to test the truth value of some `x` that may sometimes be `NA` or have zero length, you can test the charming and ever-so-concise expression `identical(TRUE, as.logical(x))`, which will always evaluate to true or false.

Dealing with strings

When you see “character atomic vector” you should think “string atomic vector.” `length('foo bar')` yields 1… because you have created a character atomic vector of length one, containing the character value ‘foo bar’. (Yes. I know.) `length(c('to be', 'or not', 'to be'))` is 3.

String primitives, which is to say the elements of a character atomic vector, are immutable.

Some other things that are true:

• `length('foo')` is 1 (see above).
• `nchar('foo')` is 3.
• Strings are indexed with `substr(x, start, stop)`. Base one, remember: `substr('foo', 1, 1)` is ‘f’. `substr('foo', 2, 3)` is ‘oo’.
• You can wrap strings in either single or double quotes. Escape with backslashes as in C, e.g. `'Tim\'s bad attitude.'`
• `paste()` is useful for a variety of string-concatenation operations. There is also a `sprintf()` function.

The stringr or Biostrings packages may ease the pain of string handling in R. In particular, `stringr` has a very pleasant interface for matching regexps.

Vector operations

You can do vector math in R, which always operates elementwise, like the dot operators in MATLAB. R will not do linear algebra unless you explicitly ask it to (with the infix operator `%*%`; see `?"%*%"`). Vector math is fast and dangerous. Almost nothing you can do with vector math will raise an error. If your operands are different sizes, R will silently recycle your short vector until it’s long enough to perform the operation.1 R will, at least, raise a warning if your short vector does not fit into your long vector an integer number of times; fear it.

Arrays

Atomic vectors are extended to multiple dimensions as arrays. A matrix is a two-dimensional array. One-dimensional arrays are possible; the primary difference between a one-dimensional array and a vector is that `dim(some.array)` will have length 1 and `dim(some.vector)` will be `NULL`. N-dimensional arrays are indexed like `my.array[dim1, dim2, dim3]`. Use an empty value to represent “all values”—i.e., to select row 3 of a matrix, use `my.matrix[3,]`.

1. which is fucking ghastly