R vs Python for simple interactive data analysis

42 points by wildanimal, over 13 years ago

5 comments

Wilduck, over 13 years ago
I'm fairly familiar with Python, somewhat familiar with R, and fairly familiar with Stata. The scenarios where you're doing OLS analysis can generally be broken into two categories:

1) Exploration of a given data set.

2) Automatic collection and analysis of data.

In most academic uses of data, (1) is the case. Here, if the data is already rectangular and in a nice format, I would actually prefer Stata. However, as soon as any sort of manipulation is required, or I need to graph something more difficult than a scatterplot, I shift to R.

R really shines when you need to do complicated analysis on a fixed data set. I've found that Hadley Wickham's `reshape` and `ggplot` [1] packages are invaluable. They *easily* produce graphics that are more informative and better looking than those from any other graphics package I've seen. Additionally, R has packages for essentially any statistical analysis that you could want to do.

While R is able to pull data from a database or other places, as soon as you have more dynamic data, you enter case (2). This is when it might make sense to start using Python. But even then, I've found Python is mostly useful for curating the data so that it can be used by R.

[1] http://had.co.nz/ggplot/
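To make the "Python curates, R analyses" split concrete, here is a minimal sketch of that kind of curation step using only the standard library. It is an illustration, not anything from the comment itself: the `melt` and `to_csv` helpers, the column names, and the sample records are all hypothetical. It reshapes wide records into long form (roughly what R's `reshape` package does) and writes CSV text that R could load with `read.csv`.

```python
import csv
import io


def melt(rows, id_keys, var_name="variable", value_name="value"):
    """Turn wide records into long records: one row per (id, variable) pair."""
    long_rows = []
    for row in rows:
        ids = {k: row[k] for k in id_keys}
        for key, val in row.items():
            if key in id_keys:
                continue  # identifier columns are repeated, not melted
            long_rows.append({**ids, var_name: key, value_name: val})
    return long_rows


def to_csv(rows):
    """Serialise a list of dict records to CSV text."""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf, fieldnames=list(rows[0].keys()), lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


# Hypothetical wide data: one row per subject, one column per year.
wide = [
    {"subject": "a", "y2010": 1.5, "y2011": 2.0},
    {"subject": "b", "y2010": 0.7, "y2011": 0.9},
]
long_rows = melt(wide, id_keys=["subject"])
csv_text = to_csv(long_rows)
```

The long format (one observation per row) is generally the shape that `ggplot` expects, so doing the reshaping before the hand-off keeps the R side focused on analysis and plotting.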
dlan1000, over 13 years ago
The advantages and disadvantages the author cites seem more pertinent to his own idiosyncratic preferences than to the general features one might look for when doing interactive data analysis. They also seem easy to address. For example, an hour spent building a few quick functions would address most of his complaints about Python. I've personally used Python, Matlab, R and Stata in my research and view the first three as about equally capable. In my opinion Stata is less comparable to the others, as it is more a WYSIWYG collection of tools and functions. Matlab has good support for large data sets via memory mapping, has MEX extensibility for building your own fast functions, and is very good for interactive plotting, but doesn't produce publication-quality final figures. Python is great for getting from an idea to working code quickly and without fuss, and can push data into Matlab via mlabraw. R has well-developed stats packages and a huge user base. I disagree with the author regarding documentation for R: maybe he is right for the core, but depending on the package you may have trouble finding documentation beyond a man page. Ggplot is excellent but eccentric.
sudont, over 13 years ago
I'm more interested in Python for its real-time capabilities. With Python running statistics, I can do more *user-facing* things with the data, whereas R can do more *statistical* things with the data. Plus, an Arduino-based sensor array interfacing with R sounds shaky.

And since Python does web easily: http://rapache.net/
jasondavies, over 13 years ago
LOESS is fairly simple to do in Python, or you can find an implementation via Google, e.g. http://www.koders.com/python/fid5A91A606E15507B6823DEC7A059488A6624C4832.aspx?s=sort

I'd be curious to see an updated comparison with LOESS added to the Python code!
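To give a sense of how little code that takes, here is a minimal LOESS sketch in plain Python. It is not the linked implementation or the article's code; it assumes the common textbook variant (tricube weights, a local linear fit at each point, no robustness iterations), and the `tricube` and `loess` names and the `frac` parameter are choices made for this illustration.

```python
def tricube(u):
    """Tricube kernel: (1 - |u|^3)^3 for |u| < 1, else 0."""
    u = abs(u)
    return (1.0 - u ** 3) ** 3 if u < 1.0 else 0.0


def loess(xs, ys, frac=0.5):
    """Return smoothed y values at each x, using local linear fits."""
    n = len(xs)
    k = max(2, min(n, int(round(frac * n))))  # neighbours per local fit
    fitted = []
    for x0 in xs:
        # Bandwidth: distance from x0 to its k-th nearest neighbour.
        dists = sorted(abs(x - x0) for x in xs)
        h = dists[k - 1] or 1e-12
        w = [tricube((x - x0) / h) for x in xs]
        # Weighted least squares for y = a + b * (x - x0).
        sw = sum(w)
        swx = sum(wi * (x - x0) for wi, x in zip(w, xs))
        swy = sum(wi * y for wi, y in zip(w, ys))
        swxx = sum(wi * (x - x0) ** 2 for wi, x in zip(w, xs))
        swxy = sum(wi * (x - x0) * y for wi, x, y in zip(w, xs, ys))
        det = sw * swxx - swx ** 2
        if abs(det) < 1e-12:
            # Degenerate neighbourhood: fall back to the weighted mean.
            fitted.append(swy / sw)
        else:
            b = (sw * swxy - swx * swy) / det
            a = (swy - b * swx) / sw
            fitted.append(a)  # intercept is the fitted value at x0
    return fitted


# Sanity check on exactly linear data: the smoother should reproduce it.
xs = [float(i) for i in range(10)]
ys = [2.0 * x + 1.0 for x in xs]
smoothed = loess(xs, ys, frac=0.5)
```

This version is O(n² log n) because it re-sorts the distances for every point, which is fine for interactive exploration of small data sets but would need a smarter neighbour search for large ones.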
guffwhitehill, over 13 years ago
The non-parametric stats functions in R are better than Py