R is a Language: Treat it Like One

I'm helping teach an introductory R class this week, so it seemed like a good time to share my thoughts on programming in R and on how a newcomer should feel about learning the language.

Those Who Teach R Should Use R

Many students are overwhelmed at the beginning of the class. They see a bunch of unfamiliar symbols and syntax, so being overwhelmed is understandable. Moreover, those teaching you R can seem lightning fast when running code, moving around screens, or figuring out problems, and they usually are. Do you know why?

Those who are teaching you R use R. They use R a lot. They usually use it daily, for hours a day. Don't try to be them immediately; get the basics down first.

I hear students saying “Oh, you're fast” when power users help them, usually lined with an undertone of low self-esteem. That's like me going to a basketball camp run by Michael Jordan and saying “Oh man Michael, you're really good on the court”. Be like Mike: work hard to learn the basics like the back of your hand. You'll be dunking in no time.

And, of course, those who teach you are fast. Why would you take a class from someone who is slow or unsure at the very thing they're teaching you? Yes, your teachers may be fast, but that's the point. Learn from those who do.

I feel as though I'm relatively fast on my machine now, but I felt the same way starting out. I didn't even know what tab completion was in my first class. I thought my professor could actually type that fast when writing variables or directory paths. I thought it was magic. If I hadn't stuck it out, I wouldn't have figured out how to make that magic myself.

R is a language

I took Spanish in high school for 4 years. I remember some vocabulary words and some conjugation rules, but am far from conversational. I've never taken French – I don't know French. Now, if I took an introduction to French class for 4 hours, do you think I could speak (or write) French fluently? No, of course not. Yet students think they can with R. My Spanish is like a background in Stata: some words/phrases/commands are similar, others are misleading and can be confusing.

R is a programming language. Just like a foreign language, R has syntax and grammar. You must learn simple punctuation: where commas go, how to assign with “=” or “<-”, where to close parentheses, and how to spot an unmatched quote. My overall message is:

R is a language, treat it like one.
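
To make the punctuation rules above concrete, here is a minimal sketch in R. The variable names and the is_valid() helper are made up for illustration; the helper simply asks R's own parser whether a snippet is grammatically complete, without running it.

```r
# Assignment: both forms bind a value to a name at the top level.
x <- c(1, 2, 3)   # commas separate the arguments to c()
y = c(4, 5, 6)    # "=" also assigns here; "<-" is the more common convention

# Every opening parenthesis needs a matching close.
total <- sum(x, y)

# Every opening quote needs a matching close.
greeting <- "hello"

# An unmatched quote or parenthesis is a syntax error. parse() checks a
# snippet's grammar without evaluating it (is_valid is a hypothetical helper).
is_valid <- function(code) {
  !inherits(try(parse(text = code), silent = TRUE), "try-error")
}

is_valid("mean(c(1, 2, 3))")   # complete expression
is_valid("mean(c(1, 2, 3)")    # missing closing parenthesis
is_valid("z <- 'hello")        # unmatched quote
```

At the console, an unfinished line like the last two usually shows up as a "+" continuation prompt rather than an error, which is R waiting for you to close what you opened.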

Remember to tell students to hold themselves to the same level of comprehension as they would for a spoken (or signed) or written language. Hopefully, that will put learning R in perspective, even if it does not make it any less overwhelming. I wonder if Rosetta Stone will make an R module one of these days.

4 thoughts on “R is a Language: Treat it Like One”

  1. — R is a programming language.

    Well, no it’s not. Not for 99.44% of its users. For them it’s a stat pack command set with a needlessly complex syntax, unlike SAS or SPSS which treat the two uses separately. By insisting that R is sorta kinda like java or FORTRAN or C first/foremost/principally, one both intimidates and turns off users. Yes, those such as Hadley use R to write packages which are useful to others. But even that is changing. More often these days, the real code is in C/C++ and wrapped in as minimal an R function as possible. Even with multi-processor/core/SSD/10s of gigabyte machines, R is blamed for being too slow, so more code is being written closer to the metal. And that is as it should be. R syntax for the commands, C/C++ for the real code.

    The vast majority simply want to inhale some data, run an existing procedure on said data, and generate some pretty pictures with ggplot2/ggvis. Empowering them to do that with as little fuss as possible serves them far better than making them pretend to be a C coder.

    Teaching R, to newbies, as a command syntax and de-emphasizing (or, even, ignoring) the coding semantics has been more successful. Muenchen’s book would be a good basis for such a class.

    The S folk, followed by the R folk, decided that one syntax for both commands and procedures/functions was the way to go. I’m increasingly sure that such was not a good decision. The rise of Rcpp confirms that others have, tacitly, reached the same conclusion.

    • You seem to be making a case that R is not a programming language because it is slow. That is ridiculous. Increased use of other (faster) languages such as python and the Cs is a direct result of larger, more complex data which R was not designed for. R is a statistical language used to make statistical software. That’s it. Of course it is not a general purpose language for creating software, but nonetheless, it is no less a language than C because you don’t like the syntax or it’s too slow.

      • No. I’m stating the fact that the vast majority of R users need not, and should not be forced to, treat the use of R as a programming task. For these folks, not just newbies by the way, the use of R should first be approached as a better version of SPSS or SAS. They need only to execute stat commands (which, between CRAN and Bioconductor, number about 6,000 so far), just as they do in other packages. Thus my reference to Muenchen’s book. (By coincidence, there’s a new posting today on R-bloggers for a package called translateR, which purports to transliterate from SPSS to R. Might even work.) Some, a very small fraction, will go further into writing stat software in the R ecosystem. That’s a different task and requires a different use of a different subset of the syntax. Or, if the only tool you have is a hammer, everything tends to look like a nail. The “R is a programming language and all uses are programming tasks” folks take that view. My own is that R is the analog to the full boat Swiss Army Knife. Use the parts you need, ignore the rest.

        That different task is implemented in most other packages with some separate syntax (and semantics, of course). The originators (S) felt that having one syntax for both tasks was, for them, better. They still called Fortran for most of the heavy lifting, even in the beginning. They built S with S, and for themselves. At the time, 4GLs were au courant, and one of their hallmarks was to be able to “bootstrap” the application in its own language. S took the same path.

        These days those that build serious packages are dropping to low-level languages with more frequency. Whether the baby will be cleaved cleanly in half (commands run in R, but written in C/C++/etc.), Solomon style, is unknown. But the recognition that vastly different tasks are better served with separate languages is dawning on the R community. Again, Rcpp came to be for a reason.

        Cute handle, by the way. Mine’s real.

  2. I agree with what you’re saying for the most part. We are trying to show people how to read in data, do regressions and statistical tests, and get the tools to make a report. I think there are things that are programmatic, such as what a list is or what different data types are, that are necessary and not really dealt with in the other software packages (SAS/Stata/SPSS). What I’m trying to say is that people using any of these 3 languages sometimes get frustrated from the beginning that they don’t “get it”. I’ve seen the same apply to Stata, but there they have dropdown menus to help them out. Now Rcmdr and similar packages have options to help with this, and are great, but are not necessarily at the same level as those for Stata. For these tasks, I think you are right.

    I’m saying if they want to do data manipulation in R, it’s a bit more “programming”. Let’s say I want the first record per person with repeated measures?
    ddply(dataset, .(id), function(x) x[1, ])
    is not really intuitive vs. Stata:
    by id: keep if _n == 1
    (which isn’t that intuitive until you learn what _n is). My concern is that students see this code, say “I don’t get it” day 1 and feel discouraged. I want students to know that it is a LANGUAGE in some capacity.
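
For comparison, here is a minimal base R sketch of the same "first record per person" task. The data frame and its column names are made up for illustration, and it assumes the rows are already ordered so that the first row for each id is the one you want.

```r
# Hypothetical repeated-measures data: several rows per person.
dataset <- data.frame(
  id    = c(1, 1, 2, 2, 2, 3),
  visit = c(1, 2, 1, 2, 3, 1),
  value = c(5.1, 5.4, 6.0, 6.2, 6.1, 4.8)
)

# duplicated() flags every repeat of an id after its first appearance,
# so negating it keeps the first row for each id in the current row order.
first_rows <- dataset[!duplicated(dataset$id), ]
first_rows
```

Neither version is obvious until you know what duplicated() or _n means, which is the point: the vocabulary has to be learned before the sentence reads naturally.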

    With your analogy it’s like getting the terms you need to get by in a foreign city “Where’s the bathroom”, “What time is it”, “How do you get here?” etc. That’s all fine and good but if you try to do things the fluent speakers do, it takes more time. That’s the message I want to convey.
    Students need to understand that syntax and grammar are rarely intuitive without a frame of reference. They take time to learn, and students should not be discouraged by that. It took a lot of time for me.
