I love R, and I think that the insight people often overlook about R's success is pretty simple: the easy things are easy. Doing hard things in R can be very hard, but the easy things are easy. E.g., loading a csv full of data and running a regression on it are two lines of code that are pretty easy to explain to people:

    $ R
    data <- read.csv(file='some_file', header=TRUE, sep=',')  # read the csv into a dataframe
    model <- lm(Y ~ COL1 + COL2 + COL3, data=data)            # fit a linear regression
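And inspecting the fit is one more line of base R:

    summary(model)  # coefficients, standard errors, p-values, R-squared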
and if you want to use glm -- logistic regression, etc -- it's a trivial change:

    model <- glm(Y ~ COL1 + COL2 + COL3, family=binomial, data=data)
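Getting predictions back out is similarly terse; a quick sketch, where new_data stands in for whatever dataframe of fresh observations you have with the same columns:

    predict(model, newdata=new_data, type='response')  # predicted probabilities from the logistic fit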
It really allows people to do quite powerful statistical analyses very simply. Note that they built a dsl for specifying regression equations -- and you don't have to bother with bullshit like quoting column names; quoting requirements are often hard to explain to new computer users.

R's other key feature is that it includes a full sql-like data manipulation language for tabular data; it's so good that just about every other stats environment copied it (pandas in Python, for one). If df is a dataframe, I can issue predicates on the rows before the comma and select columns after the comma, e.g.

    df[ df$col1 < 3 & df$col2 > 7, 'col4']

that takes my dataframe and subsets it -- row predicates before the comma -- to the rows where col1 is less than 3 and col2 is greater than 7 -- and column selection after the comma -- returning just the col4 values for those rows (a vector; select multiple columns, or pass drop=FALSE, and you get a dataframe back). It's incredibly powerful and fast.
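To make that concrete, a tiny self-contained example (col1, col2, col4 are made-up columns, as above):

    df <- data.frame(col1=c(1, 2, 5), col2=c(8, 9, 6), col4=c('a', 'b', 'c'))
    df[df$col1 < 3 & df$col2 > 7, 'col4']              # "a" "b" -- a character vector
    df[df$col1 < 3 & df$col2 > 7, 'col4', drop=FALSE]  # same subset, kept as a dataframe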