As other people have mentioned - the problem is that academia rewards producing papers, not stable software libraries.<p>For example - scikit-learn is an amazing project, led mostly by a small group at INRIA with Gael at the helm - and in terms of academic prestige, scikit-learn is probably 'worth less' on your CV than a couple of Nature papers.<p>This is of course ridiculous - scikit-learn is used by a huge number of people, and it takes an insane amount of work to run the project, yet the incentives are what they are.
The other night I was searching for Python science books and stumbled across this one titled "Python for Biologists: A complete programming course for beginners"...the homepage for the book is here: <a href="http://pythonforbiologists.com/index.php/introduction-to-python-for-biologists/" rel="nofollow">http://pythonforbiologists.com/index.php/introduction-to-pyt...</a><p>Admittedly, it only has two reviews on Amazon, but they're both five stars, and they both seem to come from biologists who are apparently thrilled at being able to leverage code for their work...the funny thing is, the book itself is not "advanced" as far as what most professional programmers would consider "advanced"...the Beginners' book ends with "Files, programs, and user input" and the Advanced book ends with Comprehensions and Exceptions...<p>I think we as programmers vastly underestimate how useful even basic programming would be to virtually anyone today. I work at Stanford and it continually astounds me when I run into non-programmers who are otherwise doing data-intensive research, who fail to see how their incredibly repetitive task could be digitized and implemented as a for-loop. It's not that they are <i>dumb</i>, it's that they've never been exposed to the opportunity. And conversely, it's not because I'm smart, but I literally can't remember what it was like <i>not</i> to break things down into computable patterns. And I've been the better for it (especially because I'm generally able to recognize when things <i>aren't</i> easy patterns).<p>Some time ago, I believe it was Stephen Hawking who speculated that the realm of human knowledge was becoming so vast that genetic engineering of intelligence might be required to continue our progression...that may be so, but I wonder if we could achieve the same growth in capacity of intellect by teaching more computational thinking (and implementation), as we do with general literacy and math.
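To make the "repetitive task as a for-loop" point concrete, here is a minimal, hypothetical sketch in Python (the data and names are invented for illustration): instead of computing each trial's average by hand in a spreadsheet, loop over all of them at once.

```python
# Hypothetical lab data: several trials, each a list of measurements.
samples = {
    "trial_a": [2.0, 4.0, 6.0],
    "trial_b": [1.0, 3.0],
    "trial_c": [5.0, 5.0, 5.0, 5.0],
}

# The "incredibly repetitive task" becomes a three-line loop.
means = {}
for name, values in samples.items():
    means[name] = sum(values) / len(values)

print(means)  # every trial's mean, computed in one pass
```

The same pattern scales from three trials to three thousand with no extra effort, which is exactly the leverage non-programmers doing data-intensive work tend to miss.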
As Garry Kasparov said, "We might not be able to change our hardware, but we can definitely upgrade our software."<p><a href="http://www.nybooks.com/articles/archives/2010/feb/11/the-chess-master-and-the-computer/" rel="nofollow">http://www.nybooks.com/articles/archives/2010/feb/11/the-che...</a>
I have noticed a general upward trend in interest in scientific programming for a few months now, and the community (most specifically Hacker News) has driven my interest in that area as well. The idea of functional programming and thinking in mathematically sound ways really appeals to me, but my lack of math and comp sci background is holding me back from going full-speed at learning and getting better at it.<p>I feel many of us are lost swimming in a sea of opinions, juggling frameworks du jour, development methods, and business strategies, and that this keeps us from focusing on improving our skills in areas that matter. This frustrates me and I've been looking for ways to get out of it. There is also this fear of another bubble, mixed with trying to keep up with the trends and hipness of the industry in order to remain gainfully employed.<p>I realize I am sort of just reiterating the author's point, so I guess what I'm saying is I agree.
Great perspective on the future; we need many more discussions like this. I love the main message of the post, i.e. "we need to grow up". I have the impression that while we are all optimistic that the future is owned by software developers, we don't realise that this won't be true for all of us. There will certainly be more segmentation in our profession, and there will be great demand for high-end developers. This requires a lot of learning, and I personally feel it's a tough challenge.<p>The post also made me realise how much we still think in terms of disciplines. E.g. we think a developer should learn more mathematics. If we were thinking in terms of problem solving, or "modelling reality" (at least in part with software), we couldn't separate these so easily. E.g. if you are writing software for vehicle condition monitoring, you use a combination of engineering, physics, mathematics, and computer science - the less you try to (or need to) separate them, the better you do.<p>I can't quite put it simply, but in my mind I can see the future "developer", who got a BSc in Physics, went on to work as a software developer for a couple of years, and then continued to learn maths, physics, and biochemistry every day, working on various projects where she could use all of these. She is neither a physicist, nor a software developer, nor a mathematician.
I started on the science/engineering side at BHRA (on campus at CIT). The only problem with technical/scientific work versus commercial work is that the pay is so poor.
Shameless plug: In January I'll be focusing on a project which deals specifically with scientific software: <a href="http://sciencetoolbox.org/" rel="nofollow">http://sciencetoolbox.org/</a> The current version is a product of a hackathon, but this month I'll be improving it and adding functionality which brings the scientific software developer and her efforts into focus. Scientific software is gaining importance, but the recognition its developers get is trailing behind - I want to raise the level of associated recognition/prestige (among other related things). Some other projects rely on data collected here, e.g. a recommendation engine for said software that enhances GitHub: <a href="http://juretriglav.si/discovery-of-scientific-software/" rel="nofollow">http://juretriglav.si/discovery-of-scientific-software/</a><p>Shameless plug continues: if you'd like to keep track of what I'm doing, I suggest you either follow the project on GitHub (<a href="https://github.com/ScienceToolbox/sciencetoolbox" rel="nofollow">https://github.com/ScienceToolbox/sciencetoolbox</a>) or Twitter (<a href="https://twitter.com/sciencetoolbox" rel="nofollow">https://twitter.com/sciencetoolbox</a>).
Computational and biological sciences will likely reach a financial equivalent of commercial software applications at the intersection of epigenetics and pharmaceuticals in the next few decades.<p>When scientists begin to discover feasible methods to cure or manage previously incurable diseases (a more recent example of this has been attempts to cure Cystic Fibrosis), or more specifically to reverse some of the diseases that our older baby boomer populations are suffering from via epigenetic methods, you can bet your bottom dollar that there will be a huge influx of capital into the sector and a subsequent increase in demand for computational biologists.<p>Of course, we could end up with a sort of partial understanding, parallel to that of quantum mechanics, and land in an epigenetic limbo, but the general feeling is one of high hopes.
Pair programming keeps it human, and transfers knowledge very well.<p>TDD is about reproducibility of results, which is very much in line with the scientific method. Benchmark tests will show you when your solutions are getting out of hand on performance.<p>The sunk cost fallacy is a big problem. Moving to a new platform like HTML5/iOS/Android gives a short reprieve, but soon those proprietary code bases will age.<p>The other big problem is that usually a smaller portion of jobs goes towards management in flatter organizations. Managers want lots of layers for job security.<p>Erik Meijer is right that small teams which are given narrow mission objectives instead of detailed requirements, and which measure their problem domain instead of guessing, will be effective.<p>I'm curious if a Fat-Tree model of management will take hold: <a href="http://en.wikipedia.org/wiki/Fat_tree" rel="nofollow">http://en.wikipedia.org/wiki/Fat_tree</a> You get a flatness that improves communication latency, lots of bandwidth, and managers are happy because there are a lot of jobs at the top.
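The TDD-as-reproducibility point can be made concrete with a minimal sketch in Python (the function and its behaviour are hypothetical, purely for illustration): the expected result is written down first, so anyone who re-runs the test reproduces the check exactly, much like replicating an experiment.

```python
def normalize(values):
    """Scale a list of numbers so they sum to 1.0."""
    total = sum(values)
    return [v / total for v in values]

# TDD-style check: the expected output is pinned down in the test,
# so the result is reproducible by anyone who runs it again.
def test_normalize():
    result = normalize([2.0, 2.0, 4.0])
    assert result == [0.25, 0.25, 0.5]
    assert abs(sum(result) - 1.0) < 1e-12

test_normalize()
```

The test doubles as an executable record of the expected behaviour, which is the "reproducibility of results" being claimed above.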
I think many in the comments are misunderstanding what the author means to say in his post. He is not talking about working in academia or making scientific software. He is talking about improving one's skills in basic science and in fields of computer science, such as A.I., which have historically been entrenched more in academia than in industry.
Mostly agree with the article. Being myself fascinated with machine learning, and in the process of refreshing mathematical knowledge I haven't used since university (too many years ago) in order to dive deeper into it, I can definitely relate.<p>However, I think the main point is not that software developers should all hone their academic math skills (that would probably be pointless for many, if not most, software developers), but rather that it would be best if software developers strove to follow the scientific <i>mindset</i> when developing software. In my experience, Occam's razor is just as important in software development (design, architecture, algorithms, testing, you name it) as it is in physics, chemistry, or other sciences, and it is this aspect (which I feel is the most basic and most important) that sometimes gets lost in the noise of software development trends and fashions.
I'm having trouble understanding the definitions of these roles. I see the chart, but the terms are all vague to me. What does a data scientist do that a mathematician or scientist doesn't do, and what does a scientific programmer do that a data scientist doesn't do?<p>My impression was that "data scientist" was a colloquialism for "statistician that knows how to program." Is a scientific programmer just a programmer that knows some statistics? Why is the direction important? The author says he/she feels that a programmer that knows statistics can make "more robust software" than the other way around, but what exactly does that mean? Do they mean "doesn't crash as much", or do they mean "gives the right answer more often?"
Here's my attempt at a TL;DR:<p>* The '90s "Access in 24 hours" programmer has been replaced by the latest anecdote-based technique/toolset preacher; e.g., TDD.<p>* Because deep learning is better than humans at finding useful patterns in data (whether concerning biochemistry or web site interaction), it is the best technique.<p>* Aesthetic (e.g., language) and social justice (e.g., feminism) issues distract from utilitarian effectiveness.<p>* Utility is only furthered by math and science (where for "science" read "patterns inferred from data"), and we should aspire to be "scientific programmers" who apply only math and science.
A post about data science and scientific programming featuring a set of graphs with no y-axis scales or labels. At my gaff, this kind of presentation of data leads to "scrap the whole analysis and start again".
Deeplearning4j and ND4J contributor here: We've created ND4J, a distributed framework for scientific computing on the JVM; it is the linear-algebra library behind Deeplearning4j, which includes ConvNets and other deep-learning algorithms. Contrary to the author, we believe Scala is the future of scientific computing. While Python is a wonderful language, the optimization and maintenance of scalable programs often happens in C++, which not a lot of people enjoy.
A look at the trends for "R programming", compared to "Python programming", is quite interesting too: <a href="http://www.google.com/trends/explore#q=R%20programming%2C%20python%20programming&cmpt=q" rel="nofollow">http://www.google.com/trends/explore#q=R%20programming%2C%20...</a><p>(Their curves are more or less parallel since 2011)
What if we defined intelligence not from an anthropomorphic view but from a systemic one, in which all systems have intelligence?<p>What is "Artificial Intelligence"? The opposite of "Natural Intelligence"?
Plug: I made a list of software that is useful for scientists: <a href="https://gist.github.com/stared/9130888" rel="nofollow">https://gist.github.com/stared/9130888</a>.
First of all, I hate this "Agile" nonsense. I've seen it kill companies. It's truly awful, because it gives legitimacy to the anti-intellectualism that has infected this industry.
It's that anti-intellectualism that, if you let it, will cause a rot of your mathematical and technical skills. Before you know it, you've spent five years reacting to Scrum tickets and haven't written any serious code, and your math has gone to the dogs as well. It's insidious and dangerous, this culture of business-driven engineering mediocrity.<p>I hope that it'll be the fakes and the brogrammers who get flushed out in the next crash. Who knows, though? Obviously I can't predict the future better than anyone else.<p>To me, Python doesn't feel like a "scientific" language. Python's a great exploratory tool, and it's got some great libraries for feeling out a concept or exploring a type of model (e.g. off-the-shelf machine learning tools). That said, science values reproducibility and precision, which brings us around to functional programming and static typing... and suddenly we're at Haskell. (Of course, for a wide variety of purposes, Python is just fine, and may be a better choice because of its library ecosystem.) I do think that, as we use more machine learning, we're going to have high demand for people who can apply rigor to the sorts of engineering that are currently done very quickly (resulting in "magic" algorithms that seem to work but that no one understands). I also agree that "deep learning" and machine learning in general carry some substance, even if 90% of what is being called "data science" is watered-down bullshit.<p>I still don't feel like I know what a "scientific programmer" is, or should be. And I'd love to see the death of business-driven engineering and "Agile" and all the mediocrity of user stories and backlog grooming meetings, but nothing has convinced me that it's imminent. Sadly, I think it may be around for a while.
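The reproducibility point doesn't have to wait for Haskell; even in Python you can get part of the way by keeping functions pure and seeding randomness explicitly. A minimal, hypothetical sketch (the function is invented for illustration, not anyone's actual pipeline):

```python
import random

def noisy_sum(values, seed):
    """Sum values with small Gaussian noise; same inputs + seed => same output."""
    rng = random.Random(seed)  # explicit, local RNG instead of hidden global state
    return sum(v + rng.gauss(0, 0.01) for v in values)

# Re-running with the same seed reproduces the result exactly -
# the reproducibility property that functional style makes the default.
a = noisy_sum([1.0, 2.0, 3.0], seed=42)
b = noisy_sum([1.0, 2.0, 3.0], seed=42)
assert a == b
```

Threading the RNG through as a parameter, rather than mutating global state, is the same discipline functional languages enforce by construction.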