I love R, and I think the insight people often overlook about R's success is pretty simple: the easy things are easy. Doing hard things in R can be very hard, but the easy things are easy. E.g. loading a CSV full of data and running a regression on it are two lines of code that are pretty easy to explain to people:<p><pre><code> $ R
data <- read.csv(file='some_file', header=TRUE, sep=',')  # read the CSV into a dataframe
model <- lm(Y ~ COL1 + COL2 + COL3, data=data)            # fit a linear model
</code></pre>
and if you want to use glm -- logistic regression, etc -- it's a trivial change:<p><pre><code> model <- glm(Y ~ COL1 + COL2 + COL3, family=binomial, data=data)
</code></pre>
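Once you've got the fitted object, inspecting it and predicting from it is just as terse -- a quick sketch using the glm above (the new_data frame here is made up):<p><pre><code> summary(model)                                      # coefficients, standard errors, p-values
new_data <- data.frame(COL1=1, COL2=2, COL3=3)      # hypothetical new observation
predict(model, newdata=new_data, type='response')   # predicted probability from the logistic model
</code></pre>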
It really allows people to do quite powerful statistical analyses very simply. Note that they built a DSL for specifying regression equations -- and you don't have to bother with bullshit like quoting column names; quoting requirements are often hard to explain to new computer users.<p>R's other key feature is that it includes a full SQL-like data manipulation language for tabular data; it's so good that every other language that does stats copied it. If df is a dataframe, I can issue predicates on the rows before the comma and columns after the comma, e.g.<p><pre><code> df[ df$col1 < 3 & df$col2 > 7, 'col4']
</code></pre>
that takes my dataframe and subsets it: row predicates before the comma -- col1 is less than 3 and col2 is greater than 7 -- and column selection after the comma -- returning just the col4 values for the matching rows (add drop=FALSE if you want a one-column dataframe back instead of a vector). It's incredibly powerful and fast.
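A couple of variations on the same idea, as a sketch against the same hypothetical df:<p><pre><code> df[df$col1 < 3 & df$col2 > 7, c('col2', 'col4')]   # several columns back, still a dataframe
df[df$col1 < 3, 'col4', drop=FALSE]                 # force a one-column dataframe instead of a vector
df$col5 <- df$col1 + df$col2                        # vectorised new column, no loop needed
</code></pre>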
There are people who think that there's elegance in R's design. Remember that its lineage (via S) is nearly as old as C's, but it still feels like a modern language (warts and all). You don't compare C to, say, Clojure for language features; together they show the advances in language design over the years.<p>R says: 'everything is a vector, and vectors can have missing values'. This is profound. It was only recently that other matrix-oriented language extensions (say pandas) got missing values, even though they are meat-and-potatoes for data analysis.
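A tiny example of what that buys you, in plain base R:<p><pre><code> x <- c(1, 2, NA, 4)
mean(x)               # NA -- missing values propagate by default
mean(x, na.rm=TRUE)   # 2.333333 -- explicitly drop the missing value
is.na(x)              # FALSE FALSE TRUE FALSE
</code></pre>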
My startup [1] does flavor profiling and statistical quality control for beer and bourbon producers - it's a fun job!<p>Our entire back-end is built in R, mostly within the Hadleyverse, and we use Shiny [2] as our web framework.<p>Our team works a bit differently than most, I suspect; our data scientists build features and analysis directly in R, and then add the functionality to our Shiny platform. Our "real devs" are all server + DB, or Android guys. This has created a great development system where all of the "cool findings" and "awesome visualizations" are immediately implemented in our system and made available for our clients!<p>[1] www.Gastrograph.com
[2] <a href="http://shiny.rstudio.com/" rel="nofollow">http://shiny.rstudio.com/</a><p>---- EDIT ---<p>Edited to add: R is a great language and is 100% suitable for production systems. Its lineage (via S) is older than Python's(!), and, with some experience, it can be made into high-performance code.
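For anyone who hasn't seen Shiny, a whole app is just R -- here's a minimal sketch (not our actual code, just the built-in mtcars data):<p><pre><code> library(shiny)
ui <- fluidPage(
  selectInput('col', 'Column', choices=names(mtcars)),
  plotOutput('hist')
)
server <- function(input, output) {
  output$hist <- renderPlot(hist(mtcars[[input$col]], main=input$col))
}
shinyApp(ui, server)
</code></pre>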
While I'm not as fond of ggplot2 as the author is, and actually prefer base graphics when making things for publication, I think he hits on a lot of strong points.<p>I'm rather fond of R as a language, and hop between it and Python as my tools of choice for a given task. I think the package ecosystem is its biggest plus - for statistical work, Python <i>might</i> have a package to do something; R almost certainly will.
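For what it's worth, the same scatterplot in both, as a rough sketch using the built-in mtcars data:<p><pre><code> # base graphics: terse, fine-grained control for publication tweaks
plot(mtcars$wt, mtcars$mpg, xlab='Weight', ylab='MPG')

# ggplot2: declarative, and shines once you start grouping and faceting
library(ggplot2)
ggplot(mtcars, aes(wt, mpg)) + geom_point() + labs(x='Weight', y='MPG')
</code></pre>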
Great intro, just a minor nitpick to not spread confusion: Julia does actually have named arguments:<p><a href="http://docs.julialang.org/en/latest/manual/functions/#keyword-arguments" rel="nofollow">http://docs.julialang.org/en/latest/manual/functions/#keywor...</a>
I hadn't used data.table or plyr because the native R functions were giving me good performance even at tens of thousands of rows.<p>But now that I'm doing analysis on hundreds of thousands of rows, aggregation takes a while. This article convinced me to give those packages a try. If data.table and plyr aggregate functions are indeed parallelizable, that's a big deal, especially when implementing bootstrap resampling.
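For what it's worth, here's roughly the kind of thing I'm hoping for -- a sketch with made-up data, using data.table for the grouped aggregation and parallel::mclapply for the bootstrap (mclapply forks, so it's Unix-only):<p><pre><code> library(data.table)
library(parallel)
dt <- data.table(grp = sample(letters[1:5], 5e5, replace=TRUE), val = rnorm(5e5))
dt[, .(mean_val = mean(val)), by = grp]    # fast grouped aggregation

# bootstrap the overall mean across cores
boots <- mclapply(1:1000, function(i) dt[sample(.N, .N, replace=TRUE), mean(val)],
                  mc.cores = 4)
</code></pre>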
I still think that R has more users than Python with Pandas, BUT the perception is that Pandas is bigger and better.<p>I started with Pandas and learned R. I find that R is just better, and if R isn't right then Julia or Clojure will do the work.<p>The tools in R are just better and more varied.
Random R gripe: it's hard to reuse code cleanly, because it lacks a nice import system like Python, Haskell, etc. Related: making a package is complicated (or was last time I looked).
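Concretely, the usual workarounds look something like this (file and package names are hypothetical):<p><pre><code> source('helpers.R')      # dumps every top-level binding into the caller's environment
library(mypackage)       # the 'proper' route, but requires building a package first

# poor man's namespace: source into a separate environment
helpers <- new.env()
sys.source('helpers.R', envir=helpers)
helpers$some_function(x)
</code></pre>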
Does anyone know of a good explanation of how plyr, dplyr, data.table and *apply functions differ? I'd love to read an in-depth analysis of each and make an informed decision on which one to use going forward.<p>My current m.o. is to use data.frames as needed and plyr if I need to do any serious manipulation (which means that every time I use plyr, I need to read the docs). There's a lot of benefit to picking one direction and sticking with it...
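Not the in-depth analysis I'm after, but for a quick flavour of the syntax differences, here is the same group-wise mean in each (toy data; plyr and dplyr mask each other's functions, hence the explicit namespacing):<p><pre><code> library(plyr); library(dplyr); library(data.table)
df <- data.frame(group = rep(c('a', 'b'), each = 5), value = rnorm(10))

tapply(df$value, df$group, mean)                                       # base *apply family
plyr::ddply(df, 'group', plyr::summarise, m = mean(value))             # plyr
df %>% dplyr::group_by(group) %>% dplyr::summarise(m = mean(value))    # dplyr
as.data.table(df)[, .(m = mean(value)), by = group]                    # data.table
</code></pre>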
I agree with most of what they say here - except using data.table's syntax. I much prefer using data.table structures with dplyr, which has a far simpler, more familiar syntax.
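i.e. something like this sketch (toy columns) -- the object is a data.table, but the verbs are dplyr's:<p><pre><code> library(data.table); library(dplyr)
dt <- data.table(col1 = 1:10, col2 = 10:1, col4 = rnorm(10))
dt %>% filter(col1 < 3, col2 > 7) %>% select(col4)   # data.table storage, dplyr verbs
</code></pre>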
Indeed data.table - the object implementation - is brilliant and should just replace data.frame ...if that were possible.
Does anyone else have issues viewing the page in Chrome 33? This is what I see: <a href="http://puu.sh/7ZAfb.png" rel="nofollow">http://puu.sh/7ZAfb.png</a>