Basic syntax and gotchas
Here, in no particular order, is a list of things that will help you get a sense of the shape of the language if you’re already familiar with other curly-brace or interpreted languages.
- `;` or a newline separates commands.
- Use hash comments (`#` to end of line).
- Variable typing is weak and dynamic; variables are not declared before use. Like PHP and JavaScript, variables have function (not block) scope.
- Whitespace is meaningless, unless it isn’t. Some parsing ambiguities are resolved by considering whitespace around operators. See and despair: `x<-y` (assignment) is parsed differently than `x < -y` (comparison)!
- Speaking of which, assignment looks stupid. I shit you not; these all have the equivalent effect of storing the value of `b` in `a`: `a <- b;` (the most common form in the wild), `b -> a;`, `assign("a", b);`, and `a = b;`. There are subtle differences and some authorities prefer `<-` for assignment but I’m increasingly convinced that they are wrong; `=` is always safer. R doesn’t allow expressions of the form `if (a = testSomething())` but does allow `if (a <- testSomething())`. To assign to globals¹, use `<<-`.
- Dots in identifier names are just part of the identifier. They are not scope operators. They are not operators at all. They are just a legal character to use in the names of things. They are often used where a normal human being would use underscores, since underscores were assignment operators in S, which I promise you don’t even want to think about.
- If you squint, `$` acts kind of like the `.` scope operator in C-like languages, at least for data frames and lists. If you’d write `struct.instance_variable` in C, you’d maybe write `frame$column.variable` in R.
- Sequence indexing is base-one. Accessing the zeroth element does not give an error but is never useful. More on this in the “Atomic vectors” section.
- Be careful with `for` loops. The syntax is vaguely Pythonic: `for(i in <sequence>) { do something; }`. You may be tempted to use the sequence operator, `:`, which is akin to `range()` in Python, to generate a list of integers to iterate over. Two cautions here. First, this is rarely the R idiom to use; as in MATLAB, vector operations are usually faster and harder to screw up than iteration. See the third chapter of the R inferno for advice on vectorizing. Second, if you do something like `i in 1:foo`, the wrong thing will happen if `foo` ever holds the value 0. `1:0` is the sequence (1, 0), since the `:` operator can and will count backwards. Always check whether `foo` is zero before you run your loop if you use `:`. If you’re iterating over the indices of a sequence, always use `seq_along(x)` in preference to `1:length(x)`.
- Execute external code in the current workspace using `source('filename.R')`. If you have a unit of code that you want to spread over multiple files, the most proper way to do it is to build an R package, which is not as simple as you would like.
- Pull in a library with `library(foo)` or `library('foo')` or `require(foo)` or `require('foo')`. `library()` is actually more stringent, and dies on an error if the library can’t be found. `require()` returns `TRUE` if the library loaded successfully; if it didn’t, it returns `FALSE` with a warning, which isn’t fatal.
- If something works on your machine but not your collaborator’s, ask them if they have any `options()` calls in their `.Rprofile` file. This is a way to change the default behavior of some functions and it’s an awful idea, because it will change the behavior of anybody else’s code (including libraries) that depends on the default settings, in a way that’s really hard to debug when you forget about it.
- Otherwise, fundamentals are just C-like enough to lull you into a false sense of security.
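A few of the gotchas above, as a minimal sketch you can paste into a fresh R session. The variable names and the package name `notARealPackage` are made up for illustration (the sketch assumes no package by that name is installed), and `seq_len()` is a base R function the list doesn’t mention; it does for a count what `seq_along()` does for a vector.

```r
# Four spellings of assignment; each leaves the value of b in a.
b <- 42
a <- b
b -> a
assign("a", b)
a = b

# Whitespace changes the parse.
x <- 5
y <- 3
x<-y                  # assignment: x now holds 3
x_after_assign <- x
x <- 5
is_less <- x < -y     # comparison: is 5 less than -3?

# Indexing is one-based; the zeroth element is an empty vector, not an error.
v <- c(10, 20, 30)
first <- v[1]
zeroth <- v[0]

# 1:foo counts backwards when foo is 0, so the loop body runs twice.
foo <- 0
visited_colon <- integer(0)
for (i in 1:foo) visited_colon <- c(visited_colon, i)

# seq_len(foo) is empty when foo is 0, so the body never runs.
visited_seq_len <- integer(0)
for (i in seq_len(foo)) visited_seq_len <- c(visited_seq_len, i)

# Same idea for the indices of a possibly empty vector.
empty <- character(0)
visited_along <- integer(0)
for (i in seq_along(empty)) visited_along <- c(visited_along, i)

# require() warns and returns FALSE for a missing package; library() stops.
# "notARealPackage" is a hypothetical name, assumed not to be installed.
loaded <- suppressWarnings(require("notARealPackage"))
library_result <- tryCatch(
  library("notARealPackage"),
  error = function(e) "library() raised an error"
)
```

Inspect `visited_colon`, `visited_seq_len`, and `zeroth` afterwards to see the difference; none of these lines raises an error on its own, which is rather the point.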
Helping yourself
The R interpreter’s built-in help feature is the only place I can consistently find documentation on anything. Try: `?function_name` or `??search_term`.
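For instance, here is roughly what that looks like at the prompt. The topics (`mean`, `regression`) are arbitrary picks for illustration; `help()`, `help.search()`, and `example()` are the base R function forms, which can be handier inside scripts.

```r
?mean                       # help page for a function
?"<<-"                      # operators and other odd names need quotes
??regression                # full-text search across installed help pages

help("mean")                # function form of ?
help.search("regression")   # function form of ??
example(mean)               # run the examples from a help page
```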
Because even R’s name is stupid, it’s really hard to google R things in a useful way. Sorry. Welcome to R!
The Hyperpolyglot page comparing MATLAB, R, and NumPy syntax is helpful. John D. Cook’s R programming for those coming from other languages page is brief but useful. The R inferno offers complementary advice but blames the victim.
R messiah Hadley Wickham has a very useful wiki-book on advanced R development. The vocabulary appendix is an excellent list of things to learn.
Stack Overflow, #R on Freenode, #rstats on Twitter, and the R-Help mailing list are good places to ask for help. There is offputtingly dense advice on asking good questions.
1. Technically, this finds the closest parent scope that contains a variable of that name, falling back to assignment in the global scope if the name isn’t found in any of the parent environments. Consult `?"<<-"` for more. It turns out name resolution in R is complicated.
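A small sketch of that lookup, assuming nothing beyond base R. The names `counter`, `bump_global`, `make_counter`, and `next_id` are invented for the example.

```r
# Case 1: no enclosing function defines `counter`, so <<- reaches the
# top-level variable.
counter <- 0
bump_global <- function() {
  counter <<- counter + 1
}
bump_global()

# Case 2: the closest parent scope that has `count` is make_counter()'s
# own environment, so <<- updates that one and leaves the global scope alone.
make_counter <- function() {
  count <- 0
  function() {
    count <<- count + 1
    count
  }
}
next_id <- make_counter()
first_id <- next_id()
second_id <- next_id()
```

After running this, `counter` has been bumped once, while `next_id()` keeps its own private `count` that survives between calls.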