R: the good parts

157 points by urlwolf about 11 years ago

11 comments

x0x0 about 11 years ago
I love R, and I think the insight people often overlook about R's success is pretty simple: the easy things are easy. Doing hard things in R can be very hard, but the easy things are easy. E.g. loading a CSV full of data and running a regression on it are two lines of code that are pretty easy to explain to people:

    $ R
    data <- read.csv(file='some_file', header=TRUE, sep=',')
    model <- lm(Y ~ COL1 + COL2 + COL3, data=data)

and if you want to use glm -- logistic regression, etc -- it's a trivial change:

    model <- glm(Y ~ COL1 + COL2 + COL3, family=binomial, data=data)

It really allows people to do quite powerful statistical analyses very simply. Note that they built a DSL for specifying regression equations -- and you don't have to bother with bullshit like quoting column names; quoting requirements are often hard to explain to new computer users.

R's other key feature is that it includes a full SQL-like data manipulation language for tabular data; it's so good that every other language that does stats copied it. If df is a dataframe, I can issue predicates on the rows before the comma and columns after the comma, e.g.

    df[ df$col1 < 3 & df$col2 > 7, 'col4']

That takes my dataframe and subsets it: the row predicates before the comma keep the rows where col1 is less than 3 and col2 is greater than 7, and the column selection after the comma returns just col4 from that subset. It's incredibly powerful and fast.
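For anyone who wants to run the snippets above without a CSV to hand, here is a minimal sketch. It assumes the built-in mtcars dataset as a stand-in for 'some_file', so the column names (mpg, wt, hp, am, cyl) are that dataset's and not the commenter's. It also shows one detail worth knowing: selecting a single column from a data frame gives back a plain vector unless you pass drop = FALSE.

```r
# Illustrative only: mtcars (bundled with R) stands in for the CSV in the comment.
data <- mtcars

# Linear regression: mpg modelled on weight and horsepower.
model <- lm(mpg ~ wt + hp, data = data)

# Logistic regression: same formula DSL, different fitting function and family.
# am is a 0/1 column (transmission type).
logit <- glm(am ~ wt + hp, family = binomial, data = data)

# Row predicate before the comma, column selection after it.
# With a single column, base R returns a plain vector by default;
# drop = FALSE keeps the result as a one-column data frame.
as_vector <- data[data$cyl < 6 & data$mpg > 25, "hp"]
as_frame  <- data[data$cyl < 6 & data$mpg > 25, "hp", drop = FALSE]
```

Calling summary(model) or summary(logit) prints the usual coefficient table for either fit.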
urlwolf about 11 years ago
There are people who think that there's elegance in R's design. Remember that its lineage (through S, which dates from the mid-1970s) is nearly as old as C, but it still feels like a modern language (warts and all). You wouldn't compare C to, say, Clojure on language features; the gap between them shows the advances in language design over the years.

R says: 'everything is a vector, and vectors can have missing values'. This is profound. It was only recently that other matrix-oriented language extensions (say, pandas) got missing values, even though they are meat-and-potatoes for data analysis.
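The 'everything is a vector, and vectors can have missing values' point is easy to see at the console. A small sketch in base R, with made-up values:

```r
# A numeric vector with a missing value; NA is an ordinary element.
x <- c(1.5, NA, 3.0, 4.5)

# Arithmetic is vectorised, and NA propagates element by element.
doubled <- x * 2                     # 3, NA, 6, 9

# Summaries return NA unless told to skip missing values.
mean_all   <- mean(x)                # NA
mean_known <- mean(x, na.rm = TRUE)  # 3

# Even a single number is a vector of length one.
scalar_is_vector <- is.vector(42)    # TRUE
scalar_length    <- length(42)       # 1
```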
JasonCEC about 11 years ago
My startup [1] does flavor profiling and statistical quality control for beer and bourbon producers - it's a fun job!

Our entire back-end is built in R, mostly within the Hadley-verse, and we use Shiny [2] as our web framework.

Our team works a bit differently than most, I suspect; our data scientists build features and analysis directly in R, and then add the functionality to our Shiny platform. Our "real devs" are all server + DB, or Android guys. This has created a great development system where all of the "cool findings" and "awesome visualizations" are immediately implemented in our system and made available to our clients!

[1] www.Gastrograph.com
[2] http://shiny.rstudio.com/

---- EDIT ----

Edited to add: R is a great language and is 100% suitable for production systems. Its lineage (through S) is older than Python(!), and, with some experience, it can be made into high-performance code.
Fomite about 11 years ago
While I'm not as fond of ggplot2 as the author is, and actually prefer base graphics when making things for publication, I think he hits on a lot of strong points.

I'm rather fond of R as a language, and hop between it and Python as my tools of choice for a given task. I think the package ecosystem is its biggest plus: for statistical work, Python *might* have a package to do something; R almost certainly will.
sveme about 11 years ago
Great intro, just a minor nitpick to not spread confusion: Julia does actually have named arguments:

http://docs.julialang.org/en/latest/manual/functions/#keyword-arguments
minimaxir about 11 years ago
I hadn't used data.table or plyr because the native R functions were giving me good performance even at tens of thousands of rows.

But now that I'm doing analysis on hundreds of thousands of rows, aggregation takes a while. This article convinced me to give those packages a try. If the data.table and plyr aggregate functions are indeed parallelizable, that's a big deal, especially when implementing bootstrap resampling.
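For reference, the aggregation-plus-bootstrap workflow described above looks roughly like this in base R alone. This is a sketch on the built-in mtcars dataset (an assumption, not the commenter's data), and it is single-threaded; it makes no claim about how data.table or plyr parallelize.

```r
set.seed(1)  # fixed seed so the resampling is repeatable

# Base-R aggregation: mean mpg per cylinder count.
by_cyl <- aggregate(mpg ~ cyl, data = mtcars, FUN = mean)

# Bootstrap: resample rows with replacement and recompute the statistic each time.
n_boot <- 1000
boot_means <- replicate(n_boot, {
  idx <- sample(nrow(mtcars), replace = TRUE)
  mean(mtcars$mpg[idx])
})

# Percentile interval for the mean of mpg.
ci <- quantile(boot_means, c(0.025, 0.975))
```

Each bootstrap replicate is independent of the others, which is why this kind of loop is a natural candidate for parallel execution.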
baldfat about 11 years ago
I still think that R has more users than Python with Pandas BUT the perception is that Pandas is bigger and better.

I started with Pandas and learned R. I find that R is just better, and if R isn't right then Julia or Clojure will do the work.

The tools in R are just better and more varied.
Myrmornis about 11 years ago
Random R gripe: it's hard to reuse code cleanly, because it lacks a nice import system like those in Python, Haskell, etc. Related: making a package is complicated (or was last time I looked).
bernardom about 11 years ago
Does anyone know of a good explanation of how plyr, dplyr, data.table and *apply functions differ? I'd love to read an in-depth analysis of each and make an informed decision on which one to use going forward.

My current m.o. is to use data.frames as needed and plyr if I need to do any serious manipulation (which means that every time I use plyr, I need to read the docs). There's a lot of benefit to picking one direction and sticking with it...
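Not the in-depth analysis asked for, but one way to get a feel for the differences is to write the same grouped summary each way. The sketch below runs the base-R variants on the built-in mtcars dataset (an assumption chosen for illustration); the package versions are shown as comments only, since they need plyr, dplyr and data.table installed.

```r
# Mean mpg per cylinder count, three base-R ways.
via_tapply    <- tapply(mtcars$mpg, mtcars$cyl, mean)              # named 1-d array
via_sapply    <- sapply(split(mtcars$mpg, mtcars$cyl), mean)       # named numeric vector
via_aggregate <- aggregate(mpg ~ cyl, data = mtcars, FUN = mean)   # data frame

# The same summary with the packages mentioned (not run here):
# plyr::ddply(mtcars, "cyl", plyr::summarise, mean_mpg = mean(mpg))
# dplyr::summarise(dplyr::group_by(mtcars, cyl), mean_mpg = mean(mpg))
# data.table::as.data.table(mtcars)[, .(mean_mpg = mean(mpg)), by = cyl]
```

The results agree numerically; what differs is the shape of the return value and the syntax you have to remember.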
Malarkey73 about 11 years ago
I agree with most of what they say here - except using data.table. I much prefer using data.table structures with dplyr, which has a far simpler, more familiar syntax. Indeed data.table - the object implementation - is brilliant and should just replace data.frame ...if that were possible.
kachnuv_ocasek about 11 years ago
Does anyone else have issues viewing the page in Chrome 33? This is what I see: http://puu.sh/7ZAfb.png