This is a great little paper -- with comments and rejoinder! I presented it to a reading group back when it appeared, I enjoyed it so much. Always worth re-reading because Breiman is such a hero of useful probabilistic modeling and insight.

One should remember that it is a reflection of its time, and the dichotomy it proposed has been softened over the years.

Another paper, more recent, and re-examining some of these same trends in broader context, is by David Donoho:

https://courses.csail.mit.edu/18.337/2015/docs/50YearsDataScience.pdf

Highly recommended. Pretty good HN comments at:

https://news.ycombinator.com/item?id=10431617
This is a great paper. Very long, but worth every bit of it. BTW, here is a recent blog post about the paper: http://duboue.net/blog27.html

One of the key insights I took away was the importance of using out-of-sample predictive accuracy as a metric for regression tasks in statistics -- just like in ML. The standard best practice in STATS 101 is to compute the R^2 coefficient on the sample data, which is akin to reporting error estimates on your training data (in-sample predictive accuracy); the sketch below makes the contrast concrete.

IMHO, statistics is one of the most fascinating and useful fields of study, with countless applications. If only we could easily tell apart what is "legacy code" vs. what is fundamental... See this recent article, https://www.gwern.net/Everything, which points out the limitations of Null Hypothesis Significance Testing (NHST), another one of the pillars of STATS 101.
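To make that contrast concrete, here is a minimal sketch, assuming scikit-learn and NumPy are available; the data is synthetic and purely illustrative:

    import numpy as np
    from sklearn.linear_model import LinearRegression
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(0)
    n, p = 100, 50  # few observations, many predictors: overfitting territory
    X = rng.normal(size=(n, p))
    y = X[:, 0] + rng.normal(size=n)  # only the first predictor matters

    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
    model = LinearRegression().fit(X_train, y_train)

    # In-sample R^2 (the STATS 101 number) looks flattering...
    print("train R^2:", model.score(X_train, y_train))
    # ...while out-of-sample R^2 (the ML-style check) tells the real story.
    print("test R^2:", model.score(X_test, y_test))

With 50 mostly-noise predictors and only 75 training rows, the train R^2 comes out much higher than the test R^2 -- exactly the gap between in-sample fit and out-of-sample predictive accuracy.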
The title is a reference to this famous essay by C.P. Snow about a split between the humanities and science: https://en.wikipedia.org/wiki/The_Two_Cultures
See also "50 years of Data Science" by David Donoho (2015), which discusses the question of whether there's any difference between "statistics" and "data science".<p><a href="http://courses.csail.mit.edu/18.337/2015/docs/50YearsDataScience.pdf" rel="nofollow">http://courses.csail.mit.edu/18.337/2015/docs/50YearsDataSci...</a>
Two? Two?!

There is a classic post here, https://news.ycombinator.com/item?id=10954508:

"
The Geneticists: Use evolutionary principles to have a model organize itself

The Bayesians: Pick good priors and use Bayesian statistics

The Symbolists: Use top-down approaches to modeling cognition, using symbols and hand-crafted features

The Conspirators: Hinton, LeCun, Bengio et al. End-to-end deep learning without manual feature engineering

The Swiss School: Schmidhuber et al. LSTMs as a path to general AI.

The Russians: Use Support Vector Machines and their strong theoretical foundation

The Competitors: Only care about performance and generalization robustness. Not shy to build extremely slow and complex models.

The Speed Freaks: Care about fast convergence, simplicity, online learning, ease of use, scalability.

The Tree Huggers: Use mostly tree-based models, like Random Forests and Gradient Boosted Decision Trees

The Compressors: View cognition as compression. Compressed sensing, approximate matrix factorization

The Kitchen-Sinkers: View learning as brute-force computation. Throw lots of feature transforms and random models and kernels at a problem

The Reinforcement Learners: Look for feedback loops to add to the problem definition. The environment of the model is important.

The Complexities: Use methods and approaches from physics, dynamical systems and complexity/information theory.

The Theorists: Will not use a method if there is no clear theory to explain it

The Pragmatists: Will use an effective method to show that there needs to be a theory to explain it

The Cognitive Scientists: Build machine learning models to better understand (human) cognition

The Doom-Sayers: ML practitioners who worry about the singularity and care about beating human performance

The Socialists: View machine learning as a possible danger to society. Study algorithmic bias.

The Engineers: Worry about implementation, pipeline jungles, drift, data quality.

The Combiners: Try to use the strengths of different approaches, while eliminating their weaknesses.

The PAC Learners: Search for the best hypothesis that is both accurate and computationally tractable.
"
God, I hate this paper. Perhaps it was relevant in its time, but that was 18 years ago. The described dichotomy between the "two cultures" isn't nearly as pronounced today, if it even exists. There are few statisticians today who adhere entirely to the "data modeling culture" as Breiman describes it.

I'm surprised how often this paper continues to get trotted out. In my experience it seems to be a favorite of non-statisticians who use it as evidence that statistics is a dying dinosaur of a field, to be superseded by X (usually machine learning). Perhaps they think that if it's repeated enough, it will be spoken into existence?
Here is a previous discussion: https://news.ycombinator.com/item?id=10635631
I am not an expert and am still reading through the article, but why is it such a strong dichotomy?
Don't all predictive algorithms also assume a data model? For example, don't hidden Markov models, by assuming constant transition probabilities, make a data assumption?

To my ears (eyes?), this discussion resembles the transition from linear, Euclidean geometry into the fractal realm.
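To make that concrete, here is a minimal NumPy sketch (my own illustration, not from the paper) of the data assumption baked into a vanilla HMM -- a transition matrix that is held constant over time:

    import numpy as np

    rng = np.random.default_rng(0)

    # The baked-in assumption: P(state_t | state_{t-1}) is the same for
    # every t (time homogeneity). This matrix *is* a piece of data model.
    transition = np.array([[0.9, 0.1],
                           [0.3, 0.7]])

    def sample_states(n_steps, start=0):
        # Simulate a state sequence under the constant-transition assumption.
        states = [start]
        for _ in range(n_steps - 1):
            states.append(rng.choice(2, p=transition[states[-1]]))
        return np.array(states)

    print(sample_states(20))

Breiman's answer, roughly, is that the algorithmic culture judges such assumptions by predictive accuracy on held-out data, not by whether the assumed mechanism is literally true.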
If people liked this paper, I suggest reading "The Two Cultures" by C.P. Snow, which is not as technical but is more expansive, cultural, and philosophical.
> Interpretability is a way of getting information. But a model does not have to be simple to provide reliable information about the relation between predictor and response variables; neither does it have to be a data model.
The goal is not interpretability, but accurate information.
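Breiman's own running example of this is the random forest: the fitted model is a black box, yet it still yields reliable information about which predictors matter, via variable importance. A minimal sketch of that idea, assuming scikit-learn (synthetic data, mine rather than the paper's):

    import numpy as np
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.inspection import permutation_importance

    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 5))
    y = 3 * X[:, 0] + X[:, 1] + rng.normal(size=500)  # only x0, x1 matter

    forest = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)

    # Permutation importance (in the spirit of Breiman's random-forest
    # importance measure): how far accuracy falls when each predictor is
    # scrambled. Information about the predictor/response relation,
    # extracted from a model that is anything but simple.
    result = permutation_importance(forest, X, y, n_repeats=10, random_state=0)
    for i, drop in enumerate(result.importances_mean):
        print(f"x{i}: {drop:.3f}")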