Python, Machine Learning, and Language Wars. A Highly Subjective Point of View

167 points by Lofkin over 9 years ago

14 comments

idunning over 9 years ago

As someone who almost exclusively uses Julia for their day-to-day work (and side projects), I think most of the author's thoughts about Julia are correct. I think the language is great, and using it makes my life better. In my opinion, some of its packages are actually better than any of their equivalents in other languages.

On the other hand, I've also got a higher tolerance for things not being perfect, I can figure things out for myself (and luckily have the time to do so), and I'm willing to code something up if it doesn't already exist (to a point). Naturally, that is not true for most people, and that's fine.

The author isn't willing to take the risk that Julia won't "survive", which is fair. It's definitely not complete yet, but it's getting there. I am confident that it will survive (and thrive), though, and keep growing its not-insubstantial community. I have a feeling the author will find their way to Julia-land eventually, in a couple of years or so.

leni536 over 9 years ago

And there is nothing wrong with C++. For linear algebra I use the Armadillo library, and it's a really nice wrapper around LAPACK and BLAS (and fast!). For some reason scientists are somewhat afraid of C++. For some reason you "have to" prototype in an "easier" language. Sure, unlike with an interpreted language, you can't use C++ as a calculator, but I see people get stuck with their computations in the prototyping language and never bring them to a faster platform.

Point being: C++ is not hard for scientific calculations.

rm999 over 9 years ago

I switched from mostly using R to Python about a year ago for gluing together my data pipeline (from data source all the way to production models and frontends/visualizations). It hasn't really affected what I'm capable of doing or my productivity, apart from the usual extra googling that comes in the first couple of years with any language.

The main reason I went for Python is purely practical: it's a language people outside my team will respect and deal with. That makes it easier for me to collaborate in many different ways: share tools with other teams, transfer ownership of my code, get help when I need it, etc. At some companies, data science has a reputation for "hack something together and throw it over the wall for someone else to deal with". In my experience, R only furthers that reputation. That's too bad, because it's really great at what it does.

misiti3780 over 9 years ago

Octave/Matlab are "great", but good luck trying to integrate them into a production web application. Since you can't really do that, avoid them unless you are fine with implementing the same algorithm twice. Matlab licenses also cost money, and the toolboxes cost extra.

R is useful because there are a lot of resources for it: it has been around for a long time and is used by a large portion of the stats community. It also has a lot of useful libraries that have not been ported to other languages yet (ggmap!!!). But you still run into the same problem: you cannot integrate R into a production web application.

I am pretty sure Hadoop streaming does not support R, Octave, or Matlab either.

geomark over 9 years ago

I just completed the Coursera data science track, which took me from a complete R newbie to being at least somewhat proficient. Having previously used Python for quite a bit of web programming, I disliked R at first, apart from its power for statistical programming. But I've since discovered a number of great R packages that make it a pleasure to use for things I would normally turn to Python for. For example, I recently discovered the rvest package for web scraping.

Data visualizations in R seem vastly superior, unless I am missing something in Python (highly likely). And putting up a slick statistics app is easy with Shiny or RStudio Presenter. But R can't really scale to a large production app, isn't that right?

So I feel I need to keep working with both Python and R.

Added: That's a nice list, Lofkin. Thanks. Also, in the article he says that Python syntax feels more natural, which I felt too. But then I started using packages like magrittr and dplyr in R, which give you nice things like pipes, and that feeling starts to ebb.
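
For readers coming from the Python side: a magrittr pipe such as `x %>% f() %>% g()` passes each result on as the first argument of the next call, so it reads left to right but computes `g(f(x))`. Below is a minimal standard-library sketch of the same idea in Python. `pipe` is a hypothetical helper written only for this illustration, and the rows are made-up toy data standing in for a data frame; in pandas, method chaining and `DataFrame.pipe` fill a similar role.

```python
from functools import reduce
from typing import Any, Callable


def pipe(value: Any, *funcs: Callable[[Any], Any]) -> Any:
    """Thread value through funcs left to right, roughly like x %>% f %>% g in R."""
    return reduce(lambda acc, func: func(acc), funcs, value)


# Hypothetical toy records standing in for a data frame.
rows = [
    {"species": "setosa", "petal_length": 1.4},
    {"species": "virginica", "petal_length": 5.1},
    {"species": "setosa", "petal_length": 1.3},
]

mean_setosa_petal = pipe(
    rows,
    lambda rs: [r for r in rs if r["species"] == "setosa"],  # roughly dplyr::filter()
    lambda rs: [r["petal_length"] for r in rs],               # roughly dplyr::pull()
    lambda xs: sum(xs) / len(xs),                              # roughly summarise(mean(...))
)
print(mean_setosa_petal)
```

Using `reduce` keeps the helper to one line. Each step is an ordinary function, so the chain stays readable without nesting calls inside out.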

a_bonobo over 9 years ago

> I think it [Perl] is still quite common in the bioinformatics field though!?

That's true: many day-to-day tasks in bioinformatics are more or less plain-text parsing [1], and Perl excels at parsing text and quickly applying regular expressions. "My" generation of bioinformaticians doing data cleanup and analysis (aged 20-30) uses Python, whether because plotting is nicer, the language is easier to get into, it's more commonly taught in universities, or for other reasons. People older than that normally use Perl.

Both BioPython and BioPerl are extremely useful.

[1] Relevant quote from Robert Edgar: "Biology = strcomp()", from https://robertedgar.wordpress.com/2010/05/04/an-unemployed-gentleman-scholar/
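
To make the "plain-text parsing plus regular expressions" point concrete, here is a rough Python counterpart to a typical small Perl script. It is a minimal sketch using only the standard library and assumes simple FASTA input: a `>` header line followed by possibly wrapped sequence lines. The functions `parse_fasta`, `find_motif`, and `gc_content` are hypothetical helpers for illustration, not BioPython APIs, and the records are made up. GAATTC is used as the motif because it is the EcoRI recognition site. For real data, BioPython's `SeqIO` module handles FASTA and many other formats.

```python
import re
from typing import Dict, Iterable, List, Optional


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA-formatted lines into a {record_id: sequence} dict."""
    records: Dict[str, str] = {}
    current_id: Optional[str] = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # New header: store the previous record, if any.
            if current_id is not None:
                records[current_id] = "".join(chunks)
            # Use the first whitespace-delimited token after '>' as the ID.
            parts = line[1:].split()
            current_id = parts[0] if parts else ""
            chunks = []
        else:
            # Sequence lines may be wrapped; normalise case and collect them.
            chunks.append(line.upper())
    if current_id is not None:
        records[current_id] = "".join(chunks)
    return records


def find_motif(sequence: str, motif: str) -> List[int]:
    """Return 0-based start positions of motif in sequence, including overlaps."""
    # A zero-width lookahead lets consecutive matches overlap.
    pattern = re.compile(f"(?={re.escape(motif)})")
    return [m.start() for m in pattern.finditer(sequence)]


def gc_content(sequence: str) -> float:
    """Fraction of G and C bases in sequence (0.0 for an empty sequence)."""
    if not sequence:
        return 0.0
    return sum(base in "GC" for base in sequence) / len(sequence)


# Made-up toy input: seq1 is wrapped over two lines, seq2 is lowercase.
fasta_text = """>seq1 toy example
ATGGAATTCCG
TTAGAATTC
>seq2
ggccatat
"""

records = parse_fasta(fasta_text.splitlines())
print(records)                                # {'seq1': 'ATGGAATTCCGTTAGAATTC', 'seq2': 'GGCCATAT'}
print(find_motif(records["seq1"], "GAATTC"))  # [3, 14]
print(gc_content(records["seq2"]))            # 0.5
```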

sampo over 9 years ago

Andrew Ng said in the Coursera Machine Learning class that, in his experience, students implement the course homework faster in Octave/Matlab than in Python.

But yes, the point of that course is to implement and play around with small numerical algorithms, whereas the linked blog post is by someone who mainly calls existing machine learning libraries from Python.

Ref. https://news.ycombinator.com/item?id=4485877

zzleeper over 9 years ago

Quite an interesting post. I feel that a lot of numerical Pythonistas are in the same spot: they tolerate most languages, but find R's syntax a bit unnatural, find Matlab lacking once they go beyond pure matrix work, and are waiting to see if Julia picks up (which it seems to be doing, from what I can tell).

Adam_O over 9 years ago

From a student's perspective, most of the good online analytics/data analysis/stats courses use R, so it is hard to get away from it while learning the material. Once you get the basic concepts down, switching to Python shouldn't be hard. I think most people still prefer ggplot2 for visualization, though. Whenever I use R I feel like a statistician; I can feel that "cold rigor" emanating from the language. But in the end I think it is an advantage to wield both languages.

Also, I really see Jupyter as a new standard for communication: your narrative and supporting code all in one place, ready for sharing.

Lofkin over 9 years ago

Personally I'm tempted to make the switch to Julia, but slow higher-order functions, high churn in the core data infrastructure, and the lack of PyMC3 are keeping me on PyData a bit longer. I have Numba to hold me over.

thanatropism over 9 years ago

One thing missing here: Matlab syntax is actually very close to modern Fortran. At least twice, in different contexts, I've written Fortran code for Monte Carlo simulations by rewriting Matlab code in place: adding types and general verbosity, fixing the syntax of do-loops, etc.

DrNuke over 9 years ago

I love the hacking approach in the post: a tool is only a means of doing something valuable, not the goal itself. Nowadays the Python ecosystem is the right tool at the right time, because of the data science explosion and the need to interact very quickly with non-specialists.