To respond to a bunch of other posters here:<p>There's a fundamental difference between a data scientist and a statistician, I think. I see statistics as an academic discipline and data science as an applied discipline.<p>More concretely, the statistics approach is:
formulate question --> formulate hypothesis --> collect data in a controlled environment under a specific set of assumptions (i.e., perform an experiment) --> determine probability of the data given the hypothesis (and assumptions).<p>While the data science approach is:
hey look, we already have all this data --> generate predictions --> collect more data --> refine predictions.<p>Of course, that's an over-generalization. But I think the different emphasis on hypothesis testing vs. machine learning/data mining is fundamental.
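<p>To make the contrast concrete, here's a rough Python sketch of the two workflows. Everything in it is my own illustration, not anything from a particular library: the function names, the choice of a permutation test for the "statistics" side (which gives the probability of a difference at least as large as the observed one, assuming no real difference between groups), and the running-mean predictor for the "data science" side are all assumptions picked to keep it short.

```python
import random
from statistics import mean


def permutation_p_value(a, b, n_perm=2000, seed=0):
    """Hypothesis-testing style (illustrative): estimate how often a
    difference in means at least as large as the observed one shows up
    when group labels are shuffled, i.e. under "no real difference"."""
    rng = random.Random(seed)
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:len(a)]) - mean(pooled[len(a):]))
        if diff >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)


def running_mean_predictions(stream):
    """Prediction style (illustrative): predict each value from the data
    seen so far, then fold the new value in and refine."""
    preds, total, count = [], 0.0, 0
    for x in stream:
        preds.append(total / count if count else 0.0)  # predict before seeing x
        total += x
        count += 1
    return preds


if __name__ == "__main__":
    control = [5.1, 4.9, 5.3, 5.0, 5.2]   # made-up measurements
    treated = [6.0, 6.2, 5.9, 6.1, 6.3]   # made-up measurements
    print("p-value:", permutation_p_value(control, treated))
    print("predictions:", running_mean_predictions([2.0, 4.0, 6.0]))
```

The first function starts from a hypothesis and asks how surprising the data is; the second never states a hypothesis and just keeps updating its guess as data arrives. Same over-generalization caveat applies.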
> "I think data scientist is a sexed-up term for a statistician."<p>As a statistician-and-engineer who is currently on the job market (my graduate program finishes this spring), I feel this pain.<p>I've been referred to as a "data scientist" multiple times (that's even been my official title at work before), though I do still cringe sometimes when I hear the word, for this exact reason.<p>That said, I don't usually present myself as a statistician, even though my degree is a statistics degree. Most people who hold statistics degrees are fairly lousy engineers[0], and I don't know of any other term that (concisely) expresses that I'm equally competent as a statistician and a (backend) engineer[1].<p>Of course, this is because many of these programs haven't yet caught up to the fact that computers exist and are still teaching statistics as if we're in a pre-computation era. The perfect solution is to fix this, and thereby fix the connotation of the word "statistician".<p>It's the same reason I dislike the term "growth hacker" - really, that's just the way marketing <i>should</i> be done (ie, based on numbers and verifiable statistics). In a perfect world, all (competent) marketers would be "growth hackers". But many marketers aren't, and so we have to make up another cringe-worthy term for it.<p>Unfortunately, that's a problem that's beyond my means to solve. So I bite my tongue and add the word "data scientist" to my resume anyway.<p>[0] Usually self-proclaimed, too
[1] ie, "I could work as a backend engineer if I wanted to/needed to, but I'm looking for work involving both skillsets"
"I think data scientist is a sexed-up term for a statistician."<p>This statement, given by Silver to the annual meeting of the Joint Statistics Meetings (the main cross-organization stats conference), was guaranteed to be a crowd-pleaser for that audience.<p>Unfortunately for them, it's not really true.<p>The problem is that much of conventional academic statistics consists of proving theorems about model classes. This requires a lot of sophisticated analysis, but has turned rather vacuous. And much conventional applied statistics consists of computing diagnostics based on dubious modeling assumptions. Under pressure in the last 20 or so years from computer science, machine learning, computer vision, Moore's law, and the data avalanche, the discipline has changed, but not fast enough.<p>As a result, a lot of what <i>should</i> be taught and researched in statistics departments has been co-opted by these other disciplines. And many people with a real problem would rather work with a "machine learning" person than a "statistics" person.<p>The best summary of this state of affairs is Leo Breiman's essay (<a href="http://projecteuclid.org/DPubS/Repository/1.0/Disseminate?handle=euclid.ss/1009213726&view=body&content-type=pdf_1" rel="nofollow">http://projecteuclid.org/DPubS/Repository/1.0/Disseminate?ha...</a>). The abstract of this essay is brutal:<p>"There are two cultures in the use of statistical modeling to reach conclusions from data. One assumes that the data are generated by a given stochastic data model. The other uses algorithmic models and treats the data mechanism as unknown. The statistical community has been committed to the almost exclusive use of data models. This commitment has led to irrelevant theory, questionable conclusions, and has kept statisticians from working on a large range of interesting current problems. Algorithmic modeling, both in theory and practice, has developed rapidly in fields outside statistics. 
It can be used both on large, complex data sets and as a more accurate and informative alternative to data modeling on smaller data sets. If our goal as a field is to use data to solve problems, then we need to move away from exclusive dependence on data models and adopt a more diverse set of tools."<p>Breiman was mathematically sophisticated, so it's not that he couldn't follow the theory he critiques; it's that he wasn't snowed by the detail and could see its lack of relevance to real problems.
If you're a trained scientist, 'Data Science' sounds distinctly odd. What other kind of science is there? A friend likened it to going to a restaurant to do some 'Food Eating'.
The way I have made personal peace with this is that I consider myself a better programmer than the median statistician, and better at statistics and machine learning than the median programmer. Whether this is a useful spot to be in, I have yet to find out. I can see that, depending on the times, this can be either an asset or a liability.
We need a new term for "statistician", and "data scientist" is as good as any. For many years, the terms "statistics" and "statistician" have had negative connotations among the general public, and renaming is a great way to overcome that.
It could be argued that we shouldn't abandon the term "statistics." Data science is to statistics what mathematics is to physics, but we don't (nor do physicists) call it "number science."