TechEcho
The SAS vs. R Debate

57 points by ulam2 about 11 years ago

11 comments

jzwinck about 11 years ago
The article says this particular instance of the debate started in 2011. Things have shifted a little since then, and I think Python has won more mindshare with Pandas, SciPy, NumPy, and all the rest. I've used both Python and R, and think the next debate will be between those two, as people find that R is not a very good programming language and lacks decent libraries for things like web scraping.

Python can be a single tool that integrates with every part of your workflow. R right now still wins in the number of algorithms implemented in it (there are statistical methods available in R but not Python), and R has terser syntax, which some people like for interactive use. But for really Big Data, terse syntax and an endless variety of esoteric algorithms are not as important as, say, robust error handling and debugging (a weak area in R, but a strong one in Python).
RobinL about 11 years ago
I use both Python (pandas) and base SAS at work for UK government.

I have lots of experience in SAS, and enjoy using it. The macro language allows for very succinct solutions to difficult data manipulation problems.

However, given SAS's huge expense it's difficult for me to identify any 'killer' areas where it's significantly better than open source tools. Indeed, I find pandas faster and easier to use for many problems.

I find it hugely frustrating that the government pays so much money for SAS licences and training when most people use it for simple use cases, where they would be better off picking up transferable skills (e.g. Python, SQL, R).

My understanding is that SAS is supposed to be good at processing very large datasets because it uses RAM efficiently (only the PDV, the program data vector, is stored in RAM). But in reality, only a small minority of users are processing datasets that are too big for RAM (e.g. 16 GB+), and there are probably better tools for the job in this use case.

One user here comments that SAS is like an 'improved Excel'. In fact, I find pandas much closer to Excel than SAS because (in IPython notebook at least) you get nice visual representations of your tables, and it usually isn't difficult to translate an Excel operation into a pandas one. I especially like the multi-index and pivot table based capabilities. With a background in VBA for Excel, it's also relatively easy to pick up Python.

None of this is quite so obvious in SAS, which has quite an unusual data step and macro programming language. It's very powerful, but is quite unintuitive to begin with due to a complete reliance on the program data vector.
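The memory point above (only the current row, the program data vector, is held in RAM) can be roughly imitated in plain Python by streaming rows instead of loading the whole table. This is a minimal sketch under stated assumptions: the `region` and `amount` columns, the sample data, and the `stream_totals` name are all made up for illustration, and it says nothing about how SAS itself is implemented.

```python
import csv
import io
from collections import defaultdict


def stream_totals(lines):
    """Sum `amount` per `region`, holding one row in memory at a time.

    `lines` is any iterable of CSV text lines (e.g. an open file), so
    memory use grows with the number of distinct regions, not with the
    number of rows.
    """
    totals = defaultdict(float)
    for row in csv.DictReader(lines):  # yields one dict per row, lazily
        totals[row["region"]] += float(row["amount"])
    return dict(totals)


# Hypothetical sample data standing in for a file too large for RAM.
sample = io.StringIO("region,amount\nnorth,10\nsouth,5\nnorth,2.5\n")
print(stream_totals(sample))
```

Passing an open file handle instead of the `io.StringIO` object works the same way, since both are iterated line by line.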
JasonCEC about 11 years ago
My company uses R, Shiny, and Rserve for nearly *everything*. R is a great programming language - if you need to quickly and efficiently develop stats-based features for medium-sized data.

R excels (get it?) at creating reproducible, fault-tolerant, consistent functions that can be automated, packaged, applied to a variety of data types, and then extended later.

Our web stack is Shiny on AWS, and we call our APIs built in R (ML, images, data, etc.) from Android using Rserve.

A lot of the (programming?) criticisms of R will be 'solved' or become non-issues in the next few years. Multithreading, implicit vectorization, better memory handling, GPU functions, among other things are all in the pipe :) (That said, the syntax _is_ a little weird to get used to)

-----

* We're hiring for very senior positions in data science and more general R programmers. Contact me if you're interested (JasonCEC [at] Gastrograph.com)

[edited for spelling]
zmmmmm about 11 years ago
The problem with R is that it's just not a very good programming language. It's great for interactive analysis, but dismal for building higher level abstractions. It's like the PHP or MySQL of the data analysis world. Data types get magically converted all over the place, the global namespace is just a giant playground for every module to pollute, and it has something like 5 different object systems, all with subtle differences. All the defaults that are set for the convenience of interactive use undermine any kind of reliable use for building on as a platform (for example, the "simplification" concept where a 1 column data frame often magically turns into a vector).

I've forced myself to use R intensively for a couple of years now, but I must say it's still a relief every time I bail out and get back to a "real" programming language.
dekhn about 11 years ago
My favorite part about the SAS v R debate was the conclusion of this article: http://bits.blogs.nytimes.com/2009/02/16/sas-warms-to-open-source-one-letter-at-a-time/

"""In the article, Ms. Milley said, “I think it addresses a niche market for high-end data analysts that want free, readily available code. We have customers who build engines for aircraft. I am happy they are not using freeware when I get on a jet.”

To her credit, Ms. Milley addressed some of the critical comments head-on in a subsequent blog post."""

(Boeing uses R heavily and when you fly on their aircraft, you're flying on open source)
opensandwich about 11 years ago
Since SAS is a relatively simple language, why can't someone just write a transcompiler that supports a subset of SAS and translates it to R? That way you have the best of both worlds (sort of).

The most difficult thing about that is how you would treat "by" statements (SAS) vs the split-apply-combine approach (R).

Self-plug: I sort of made a quick hack about a month ago for SAS-Python. I'm sure someone with more programming experience than me could produce something much better (I come from a maths background).

http://nbviewer.ipython.org/gist/chappers/8747253/stan_example.ipynb
https://github.com/chappers/Stan
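To make the "by" vs split-apply-combine mismatch mentioned above concrete, here is a rough sketch in plain Python. It assumes the usual SAS BY-group pattern (input sorted by the key, with `FIRST.`/`LAST.` flags marking group boundaries) and contrasts it with grouping first and then applying a function. The data and function names are hypothetical, and this is not taken from the linked project.

```python
from collections import defaultdict
from operator import itemgetter

# Hypothetical rows: (customer, amount). Names and values are made up.
rows = [("bob", 5), ("alice", 10), ("bob", 7), ("alice", 1)]


def by_group_totals(rows):
    """BY-group style: sort by the key, then walk the rows once, using
    first/last flags (in the spirit of FIRST.customer / LAST.customer)
    to reset and emit a running total."""
    ordered = sorted(rows, key=itemgetter(0))  # BY processing expects sorted input
    out = []
    total = 0
    for i, (key, amount) in enumerate(ordered):
        first = i == 0 or ordered[i - 1][0] != key
        last = i == len(ordered) - 1 or ordered[i + 1][0] != key
        if first:
            total = 0  # start of a new group
        total += amount
        if last:
            out.append((key, total))  # emit one row per group
    return out


def split_apply_combine_totals(rows):
    """Split-apply-combine style: split rows into groups, apply a
    function to each group, then combine the per-group results."""
    groups = defaultdict(list)
    for key, amount in rows:  # split
        groups[key].append(amount)
    # apply (sum) and combine (sorted list of results)
    return sorted((key, sum(vals)) for key, vals in groups.items())


print(by_group_totals(rows))
print(split_apply_combine_totals(rows))
```

Both functions give the same totals here. The translation difficulty is that the first style is row-by-row with implicit state carried between rows, while the second works on whole groups at once.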
pistolpete20 about 11 years ago
I used to work for one of the largest U.S. insurance companies. They were always behind in transitioning to new technology (Excel 2003 could be found there in 2013). That being said, the entire statistical modeling team and research department made the switch to R and Python. Only a few clung to SAS, but they realized they would be forced to move to R, as any collaboration would need to be converted to R and not to SAS.

I believe the future will be R versus Python, and SAS will not be a part of it.
mbq about 11 years ago
Questions on StackOverflow: 49 878 R, 2 191 SAS; on Stats StackExchange: 5 524 R, 260 SAS.
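For a sense of scale, the ratios implied by those counts can be worked out directly. A small sketch using only the numbers quoted in the comment above (the variable names are illustrative):

```python
# Question counts as quoted in the comment above.
counts = {
    "StackOverflow": {"R": 49878, "SAS": 2191},
    "Stats StackExchange": {"R": 5524, "SAS": 260},
}

# R questions per SAS question on each site.
ratios = {site: c["R"] / c["SAS"] for site, c in counts.items()}
for site, ratio in ratios.items():
    print(f"{site}: about {ratio:.1f} R questions per SAS question")
```

On both sites the ratio comes out at a little over 20 to 1.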
stcredzero about 11 years ago
Where does Stata fit into all of this, and why is it never discussed on HN?
ropz about 11 years ago
These people:

http://www.teamwpc.co.uk/

produce a compiler, tools, etc. that run the language of SAS.

(disclaimer: I interviewed there last year)
Fede_V about 11 years ago
Is this even a discussion? Anyone serious about analyzing data will use either R, Python (with Pandas/SciPy, etc.), or Julia. For truly immense data sets that require pipelines, you'll use tools like Spark, Hadoop, etc., but SAS is basically a slightly improved Excel.