Comparison – R vs. Python: head to head data analysis

283 points by emre over 9 years ago

30 comments

mbreese over 9 years ago
This is interesting, but not really an R vs. Python comparison. It's an R vs. Pandas/Numpy comparison. For basic (or even advanced) stats, R wins hands down. And it's really hard to beat ggplot. And CRAN is much better for finding other statistical or data analysis packages.

But when you start having to massage the data in the language (database lookups, integrating datasets, more complicated logic), Python is the better "general-purpose" language. It is a pretty steep learning curve to grok the R internal data representations and how things work.

The better part of this comparison, in my opinion, is how to perform similar tasks in each language. It would be more beneficial to have a comparison showing where Python/Pandas is good, where R is better, and how to switch between them. Another way of saying this is figuring out when something is too hard in R and it's time to flip to Python for a while...

bigtunacan over 9 years ago
R is certainly a unique language, but when it comes to statistics I haven't seen anything else that compares. Often I see this R vs Python comparison being made (not that this particular article has that slant) as a "come drink the Python Kool-Aid; it tastes better" pitch.

Yes; Python is a better general-purpose language. It is inferior though when it comes specifically to statistical analysis. Personally I don't even try to use R as a general-purpose language. I use it for data processing, statistics, and static visualizations. If I want dynamic visualizations I process in R, then typically hand off to JavaScript and use D3.

Another clear advantage of R is that it is embedded into so many other tools. Ruby, C++, Java, Postgres, SQL Server (2016); I'm sure there are others.

phillipamann over 9 years ago
R is a wonderful language if you choose to get used to it. I love it. I've even used R in production quality assurance to check for regressions in data (not the statistical regressions). I see countless R posts where people try to compare it to Python to find the one true language for working with data. Article after article, there clearly isn't a winner. People like R and Python for different reasons. I think it's actually quite intuitive to think about everything in terms of vectors with R. I like the functional aspects of R. I wish R was a bit faster but I am pretty sure the people who maintain R are working on that. You can't beat the enormous library that R has.

danso over 9 years ago
I spent a few weeks a few months ago learning R. It's not a bad language, and yes, the plotting is currently second-to-none, at least based on my limited experience with matplotlib and seaborn.

There are scant few articles on going from Python to R...and I think that has given me a lot of reason to hesitate. One of the big assets of R is Hadley Wickham...the amount and variety of work he has contributed is prodigious (not just ggplot2, but everything from data cleaning, web scraping, dev tools, time-handling a la moment.js, and books). But that's not just evidence of how generous and talented Wickham is, but of how relatively little dev support there is in R. If something breaks in ggplot2 -- or any of the many libraries he's involved in -- he's often the one to respond to the ticket. He's only one person. There are many talented developers in R but it's not quite a deep open-source ecosystem and community yet.

Also word-of-warning: ggplot2 (as of 2014[1]) is in maintenance mode and Wickham is focused on ggvis, which will be a web visualization library. I don't know if there has been much talk about non-Hadley-Wickham people taking over ggplot2 and expanding it...it seems more that people are content to follow him into ggvis, even though a static viz library is still very valuable.

[1] https://groups.google.com/forum/#!topic/ggplot2/SSxt8B8QLfo/discussion

sweezyjeezy over 9 years ago
This is just a series of incredibly generic operations on an already cleaned dataset in csv format. In reality, you probably need to retrieve and clean the dataset yourself from, say, a database, and you may well need to do something non-standard with the data, which needs an external library with good documentation. Python is better equipped in both regards. Not to mention, if you're building this into any sort of product rather than just exploring, R is a bad choice. Disclaimer: I learned R before Python, and won't go back.

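A minimal sketch of the "retrieve and clean it yourself" step described above, using only the Python standard library. The `players` table, its columns, and the messy values are hypothetical stand-ins for a real database, not anything from the article.

```python
import sqlite3
from statistics import mean

# Hypothetical in-memory table standing in for a real production database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE players (name TEXT, team TEXT, pts TEXT)")
conn.executemany(
    "INSERT INTO players VALUES (?, ?, ?)",
    [
        ("A", "BOS", "12.5"),
        ("B", "BOS", ""),        # missing value stored as an empty string
        ("C", "LAL", " 20.0 "),  # stray whitespace around the number
        ("D", "LAL", "n/a"),     # non-numeric placeholder
    ],
)


def to_float(raw):
    """Return a float, or None when the raw field cannot be parsed."""
    try:
        return float(raw.strip())
    except ValueError:
        return None


rows = conn.execute("SELECT name, team, pts FROM players").fetchall()
cleaned = [(name, team, to_float(pts)) for name, team, pts in rows]

# Average points per team, skipping rows whose value could not be parsed.
by_team = {}
for _, team, pts in cleaned:
    if pts is not None:
        by_team.setdefault(team, []).append(pts)
team_means = {team: mean(values) for team, values in by_team.items()}
print(team_means)
```

The same shape carries over to a real database driver: query, normalise each field explicitly, and only then hand the rows to the analysis code.
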
Mikeb85 over 9 years ago
The reason I like R - it just makes data exploration and analysis too damn easy.

You've got RStudio, which is one of the best environments ever for exploring data and visualisation, and it manages all your R packages, projects, and version control effortlessly.

Then you've got the plethora of packages - if you're in any of the following fields: statistics, finance, economics, bioinformatics, and probably a few others, there are packages that instantly make your life easier.

The environment is perfect for data exploration - it saves all the data in your 'environment', allows you to define multiple environments, and your project can be saved at any point, with all the global data intact.

If I want some extra speed, I can create C++ modules from within RStudio, compile and link them, as easily as simply creating a new R script. Fortran is a tiny bit more work, still easy enough however.

Want multicore or to spread tasks over a cluster? R has built-in functions that do that for you. As easy as calling mclapply, parApply, or clusterApply. Heck, you can even write your function in another language, then R handles applying that over however many cores you want.

Want to install and manage packages, update them, create them, etc...? All can be done from RStudio's interface.

Knitr can create markdown/HTML/pdf/MS Word files from R markdown, or you can simply compile everything to a 'notebook' style HTML page.

And all this is done incredibly easily, all from a single package (RStudio) which itself is easy to get and install.

Oh yeah, visualisation, nothing really beats R.

And while there are quirks to the language, for non-programmers this isn't really an obstacle, since they aren't already used to any particular paradigm.

As for Python, I'm sure it's great (I've used it a little), but I really don't see how it can compare. R's entire environment is geared towards data analysis and exploration, towards interfacing with the compiled languages most used for HPC, and running tasks over the hardware you will most likely be using.

c3534l over 9 years ago
I like Python better as a language, but Python's libraries take more work to understand and the APIs aren't very unified. R is much more regular and the documentation is better. Even complicated and obscure machine learning tasks have good support in R. *BUT* the performance for R can be very, very annoying. Assignment is slow as all hell and it can often take work to figure out how to rephrase complicated functions in a way that R can figure out how to do efficiently. I think being much more functional than Python works well for data. I mean the L in LISP stands for list! Visualizations are also easier and more intuitive in R, too, IMO. Especially since half the time you can just wrap some data in "plot" and R will figure out which one it should use.

I think the conclusion of the article is correct. R is more pleasant for mathier type stuff, while Python is the better general-purpose language. If your job involves showing people PowerPoint presentations of the mathematical analysis you've done, you'd probably want to use R. If, on the other hand, you're prototyping data-driven applications, Python would probably be better.

That said, I really like Julia, but can't justify really diving into it at this point. :\

evanpw over 9 years ago
If you only have time to learn one language, learn Python, because it's better for non-statistical purposes (I don't think that's very controversial).

If you need cutting-edge or esoteric statistics, use R. If it exists, there is an R implementation, but the major Python packages really only cover the most popular techniques.

If neither of those apply, it's mostly a matter of taste which one you use, and they interact pretty well with each other anyway.

acaloiar over 9 years ago
I have always considered R the best tool for both simple and complex analytics. But it should not go unmentioned that the features responsible for R's usability often manifest as poor performance. As a result, I have some experience rewriting the underlying C code in other languages. What one finds under the hood is not often pretty. It would be interesting to see a performance comparison between Python and R.

mojoe over 9 years ago
The one thing that sometimes gets overlooked when people decide whether to use R or Python is how robust the language and libraries are. I've programmed professionally in both, and R is really bad for production environments. The packages (and even language internals sometimes) break fairly often for certain use cases, and doing regression testing on R is not as easy as Python. If you're doing one-off analyses, R is great -- for anything else I'd recommend Python/Pandas/Scikit.

ggrothendieck over 9 years ago
For R:

(1) instead of `sapply(nba, mean, na.rm = TRUE)` use `colMeans(nba, na.rm = TRUE)`

(2) instead of `nba[, c("ast", "fg", "trb")]` use `nba[c("ast", "fg", "trb")]`

(3) instead of `sum(is.na(col)) == 0` use `!anyNA(col)`

(4) instead of `sample(1:nrow(nba), trainRowCount)` use `sample(nrow(nba), trainRowCount)`

(5) instead of tons of code use `library(XML); readHTMLTable(url, stringsAsFactors = FALSE)`

The13thDoc over 9 years ago
The "cheat sheet" comparison between R and Python is helpful. The presentation is well done.

The conclusions state what we already know: Python is object oriented; R is functional.

The **Last Word** appropriately tells us your opinion that Python is stronger in more areas.

vegabook over 9 years ago
Python's main problem is that it's moving in a CS direction and not a data science direction.

The "weekend hack" that was Python, a philosophy carried into 2.x, made it a supremely pragmatic language, which the data scientists love. They want to think algorithms and maths. The language must not get in the way.

3.x is wanting to be serious. It wants to take on Golang, JavaScript, Java. It wants to be taken seriously. Enterprise and Web. There is nothing in 3.x for data scientists other than the fig leaf of the @ operator. It's more complicated to do simple stuff in 3.x. It's more robust from a theoretical point of view, maybe, but it also imposes a cognitive overhead for those people whose minds are already FULL of their algo problems and just want to get from a -> b as easily as possible, without CS purity or implementation elegance putting up barriers to pragmatism (I give you Unicode v Ascii, print() v print, xrange v range, 01 v 1 (the first is an error in 3.x. Why exactly?), focus on concurrency not raw parallelism, the list goes on).

R wants to get things done, and is *vectors first*. Vectors are what big data typically is all about (if not matrices and tensors). It's an order of magnitude higher dimensionality in the default, canonical data structure. Applies and indexing in R, vector-wise, feel natural. Numpy makes a good effort, but must still operate in the scalar/OO world of its host language, and inconsistencies inevitably creep in, even in Pandas.

As a final point, I'll suggest that R is much closer to the vectorised future, and that even if it is tragically slow, it will train your mind in the first steps towards "thinking parallel".

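For readers who have not run into the 2.x to 3.x differences listed above, here is a small runnable sketch of three of them. On the "Why exactly?": in Python 2 a literal with a leading zero was octal (so `010` meant 8), and Python 3 requires the explicit `0o` prefix instead. The `Matrix` class is a hypothetical toy used only to show what `@` dispatches to; it is not NumPy.

```python
# Python 3 behaviours mentioned above, checked at runtime.

# 1. A leading-zero literal such as 01 no longer compiles; octal is spelled 0o1.
try:
    compile("01", "<example>", "eval")
    leading_zero_ok = True
except SyntaxError:
    leading_zero_ok = False

octal_ten = 0o10  # octal 10 is decimal 8

# 2. range() is lazy, taking over the role of Python 2's xrange().
lazy = range(10**12)  # no list of a trillion ints is built
last = lazy[-1]


# 3. The @ operator (PEP 465) dispatches to __matmul__.
class Matrix:
    """Toy matrix holding a list of rows; illustration only."""

    def __init__(self, rows):
        self.rows = rows

    def __matmul__(self, other):
        cols = list(zip(*other.rows))
        return Matrix(
            [[sum(a * b for a, b in zip(row, col)) for col in cols]
             for row in self.rows]
        )


product = Matrix([[1, 2], [3, 4]]) @ Matrix([[5, 6], [7, 8]])
print(leading_zero_ok, octal_ten, last, product.rows)
```
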
xname2 over 9 years ago
"Data analysis" means different things in R and Python. In R, it's all kinds of statistical analyses. In Python, it's basic statistical analysis plus data mining stuff. There are too many statistical analyses that only exist in R.

acomjean over 9 years ago
I work with biologists. R, which seems strange to me, they seem to take to. I think some of it is RStudio, the IDE, which shows variables in memory in the side bar, where you can click to see them. It makes everything really accessible for those that aren't programmers. It seems to replace Excel use for generating plots.

I've grown to appreciate R, especially its plotting ability (ggplot).

falicon over 9 years ago
Language comparisons are equiv. to religion comparisons...you aren't going to find a universal answer or truth, it's an individual/faith sort of thing.

That being said - all the *serious* math/data people I know love both R and Python...R for the heavy math, Python for the simplicity, glue, and organization.

zitterbewegung over 9 years ago
This is not just interesting as a comparison; it's also interesting for people who know R or Python and want to see how to go from one to the other.

fsiefken over 9 years ago
It would be nice to compare JuliaStats and Clojure-based Incanter with Python Pandas/NumPy/SciPy. http://juliastats.github.io/

willpearse over 9 years ago
Very picky, but beware constantly using "set.seed" throughout your R scripts. Always using the same random number is not necessarily helpful for stats, and makes the R code look a lot trickier than it needs to be.

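The same pitfall exists on the Python side, so here is a short sketch of it with the standard-library `random` module. The function names are hypothetical; the point is only the contrast between re-seeding before every draw and seeding one generator once.

```python
import random


def draw_with_reseed(n, seed=42):
    """Re-seed before every draw: each draw returns the same number."""
    values = []
    for _ in range(n):
        random.seed(seed)
        values.append(random.random())
    return values


def draw_seed_once(n, seed=42):
    """Seed a dedicated generator once: reproducible, but the values vary."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(n)]


reseeded = draw_with_reseed(3)
seeded_once = draw_seed_once(3)
print(len(set(reseeded)), len(set(seeded_once)))
```

Seeding once at the top of a script keeps a run reproducible without collapsing every "random" step onto the same draw.
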
wesm over 9 years ago
I hope you all know that the people who have invested most in actually building this software care the least about this discussion.
daveorzach over 9 years ago
In manufacturing, Minitab and JMP are used for data analysis (histograms, control charts, DOE analysis, etc.). They are much easier to use and provide helpful tutorials on the actual analysis.

What features or workflow does R or Pandas/Numpy offer to manufacturing that Minitab & JMP can't?

andyjgarcia over 9 years ago
The comparison is R to Python+pandas.

The equivalent comparison should be R+dplyr to Python+pandas.

Base R is quite verbose and convoluted compared to using dplyr. Likewise data analysis in Python is painful compared to using pandas.

thebelal over 9 years ago
The rvest implementation was the main thing that seemed like an R port of the Python implementation rather than the best use of rvest.

An alternate (simpler) implementation of the rvest web scraping example is at https://gist.github.com/jimhester/01087e190618cc91a213

It would be even simpler, but basketball-reference designs its tables for humans rather than for easy scraping.

xixi77 over 9 years ago
Really, syntax "nba.head(1)" is not any more "object-oriented" than "head(nba, 1)" -- it's just syntax, and the R statement is in fact an application of R's object system (there are several of them).

IMO, R's system is actually more powerful and intuitive -- e.g. it is fairly straightforward to write a generic function dosomething(x,y) that would dispatch specific code depending on classes of both x and y.

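To make the `dosomething(x, y)` point concrete for Python readers: `functools.singledispatch` in the standard library dispatches on the first argument only, so dispatching on both classes needs something extra. Below is a minimal hand-rolled sketch; the `register` decorator and `_registry` table are hypothetical names, and the lookup order is a simplification, not a reimplementation of R's method resolution.

```python
# Hypothetical registry mapping (type of x, type of y) to an implementation.
_registry = {}


def register(type_x, type_y):
    """Decorator that records an implementation for a pair of classes."""
    def decorator(func):
        _registry[(type_x, type_y)] = func
        return func
    return decorator


def dosomething(x, y):
    """Pick an implementation from the classes of both arguments."""
    # Walk both method resolution orders so subclasses fall back to parents.
    for cls_x in type(x).__mro__:
        for cls_y in type(y).__mro__:
            impl = _registry.get((cls_x, cls_y))
            if impl is not None:
                return impl(x, y)
    raise TypeError(f"no method for ({type(x).__name__}, {type(y).__name__})")


@register(int, int)
def _(x, y):
    return "int, int"


@register(int, str)
def _(x, y):
    return "int, str"


@register(object, object)
def _(x, y):
    return "fallback"


print(dosomething(1, 2), dosomething(1, "a"), dosomething(1.5, 2))
```
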
dekhn over 9 years ago
In general, if I have to choose between two languages, one of which was designed specifically for statistics, and one that was more general, I will choose the more general one.

R's value is in the implementation of its libraries, but there is no technical reason a really OCD person couldn't implement libraries of such high quality in Python.

vineet7kumar over 9 years ago
It would be nice to also have some notes about performance of both the languages for each of the tasks compared. I believe pandas would be faster due to its implementation in C. The last time I checked R was an interpreted language with its interpreter written in R.
jkyle over 9 years ago
Caret is a great package for a lot of utility functions and tuning in R. For example, the sampling example can be done using Caret's createDataPartition which maintains the relative distributions of the target classes and is more 'terse'.

```
> data(iris)
> library(caret)
> data(iris)
> idx <- caret::createDataPartition(iris$Species, p = 0.7, list = F)
> summary(iris$Species)
    setosa versicolor  virginica
        50         50         50
> summary(iris[idx,]$Species)
    setosa versicolor  virginica
        35         35         35
```

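The idea behind that call, sampling the same fraction from every class so the split keeps the class proportions, can be sketched in plain Python. This illustrates the technique only and is not caret's algorithm; `stratified_indices` and its parameters are hypothetical names, and the labels simply mirror the 50/50/50 iris counts shown above.

```python
import random
from collections import Counter, defaultdict


def stratified_indices(labels, p=0.7, seed=0):
    """Return sorted row indices that keep about a fraction p of every class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    chosen = []
    for indices in by_class.values():
        k = round(p * len(indices))  # per-class sample size
        chosen.extend(rng.sample(indices, k))
    return sorted(chosen)


labels = ["setosa"] * 50 + ["versicolor"] * 50 + ["virginica"] * 50
idx = stratified_indices(labels)
counts = Counter(labels[i] for i in idx)
print(counts)
```
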
hogu over 9 years ago
If you do your stuff in R, how do you move it into production? Or do you not need to?

Myrmornis over 9 years ago

```
python < world > csv
R < csv > analysis
```

k8tte over 9 years ago
I tried to help my wife, who uses R in school, only to get quickly lost. I also attended a ~1 hour R course at university.

To me, R was a waste of time and I really don't understand why it's so popular in academia. If you already have some programming knowledge, go with Python + SciPy instead.

EDIT: R is even more useless without RStudio, http://www.rstudio.com/. And NO, don't go build a website in R!
