A less popular but perhaps more influential phenomenon is Stein's Paradox [1]. Here's a provocative example often given to illustrate it: say you have a baseball player, a soccer player, and a football player, and you wish to estimate the true mean number of home runs, goals, and touchdowns each scores per year. If you have the last ten seasons' worth of data for each, the obvious thing to do is estimate each player's true yearly mean by their average over those ten years. (E.g., the baseball player hit an average of 20 home runs a year, so estimate their true mean yearly home runs as 20.) Stein's Paradox says that you can actually do a lot better than this.

Even crazier, the James-Stein Estimator, which does this, actually uses data about the football player and the soccer player to make predictions about the baseball player (and vice versa). This is deeply unintuitive to most people, since the players aren't related to each other at all. The phenomenon only holds with at least three players; it doesn't work for two.

(More generally, Stein's Paradox is the fact that if you have p >= 3 independent Gaussians with known variance, you can do better, in total squared error, at estimating their p-dimensional mean than just using the sample means.)

I've spent a bunch of time trying to understand why this actually works [2]; to be honest I still don't deeply understand it. But the consensus is that the same shrinkage phenomenon is what drives the improved performance of a variety of high-dimensional estimators (e.g., lasso or ridge regression), making the paradox very influential.

[1] https://en.wikipedia.org/wiki/James%E2%80%93Stein_estimator
[2] https://www.naftaliharris.com/blog/steinviz/
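A rough way to see the effect numerically (my own sketch, not from the comment): draw p true means, observe one noisy sample per coordinate, and compare the total squared error of the raw observations against the James-Stein shrinkage estimate.

```python
import numpy as np

# Sketch: compare total squared error of the plain sample means vs. the
# James-Stein estimator for p >= 3 unit-variance Gaussians.
rng = np.random.default_rng(0)
p, trials = 10, 100_000

theta = rng.normal(0, 2, size=p)                 # the unknown true means
x = theta + rng.normal(0, 1, size=(trials, p))   # one noisy observation per coordinate

# James-Stein: shrink each observation toward 0 by a data-dependent factor.
shrink = 1 - (p - 2) / np.sum(x**2, axis=1, keepdims=True)
js = shrink * x

mse_mle = np.mean(np.sum((x - theta) ** 2, axis=1))
mse_js = np.mean(np.sum((js - theta) ** 2, axis=1))
print(f"MLE risk: {mse_mle:.3f}, James-Stein risk: {mse_js:.3f}")  # JS comes out lower
```

On this setup the James-Stein risk lands below the MLE's (which is exactly p here), and the gap grows if the true means sit closer to the shrinkage target.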
My favorite probability paradox has always been the Monty Hall problem [1]:

"Suppose you're on a game show, and you're given the choice of three doors: behind one door is a car; behind the others, goats. You pick a door, say No. 1, and the host, who knows what's behind the doors, opens another door, say No. 3, which has a goat. He then says to you, 'Do you want to pick door No. 2?' Is it to your advantage to switch your choice?"

[1] https://en.wikipedia.org/wiki/Monty_Hall_problem
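The unintuitive part is that switching wins 2/3 of the time. A tiny simulation (mine, not the commenter's) makes the 1/3 vs. 2/3 split easy to check, assuming the standard rules where the host always opens a goat door you didn't pick:

```python
import random

# Sketch of the standard Monty Hall setup.
def play(switch: bool) -> bool:
    doors = [0, 1, 2]
    car = random.choice(doors)
    pick = random.choice(doors)
    # Host opens a door that is neither the player's pick nor the car.
    opened = random.choice([d for d in doors if d != pick and d != car])
    if switch:
        pick = next(d for d in doors if d != pick and d != opened)
    return pick == car

n = 100_000
print("stay:  ", sum(play(False) for _ in range(n)) / n)  # ~0.33
print("switch:", sum(play(True) for _ in range(n)) / n)   # ~0.67
```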
No need to look at fancy paradoxes, just think about the following.

What does it mean to say that tossing a fair coin has a 50% probability of showing heads?

If you think you know the answer, you are probably wrong.

EDIT: Instead of just voting this down, try to give an answer. If you think it is easy, you have not thought about it carefully enough.
By far the most unintuitive paradox for me personally is the one presented here: https://youtu.be/go3xtDdsNQM?t=3m27s

"Mr. Jones has 2 children. What is the probability he has a girl if he has a boy born on Tuesday?" Somehow knowing the day of the week the boy was born changes the result. It's completely bizarre.
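Under the usual reading (among two-child families with at least one boy born on a Tuesday, what fraction also have a girl?), the answer moves from 2/3 to 14/27 ≈ 0.52. A brute-force enumeration (my sketch, assuming independent, uniformly distributed sexes and birth days) confirms it:

```python
from itertools import product

# Enumerate all equally likely (sex, weekday) combinations for two children.
children = list(product(["boy", "girl"], range(7)))  # 14 possibilities per child
families = list(product(children, children))         # 196 equally likely families

# Condition: at least one child is a boy born on Tuesday (call Tuesday day 2).
cond = [f for f in families if ("boy", 2) in f]
with_girl = [f for f in cond if any(sex == "girl" for sex, _ in f)]

print(len(with_girl), "/", len(cond))  # 14 / 27, about 0.52
```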
I think there is a whole class of statistical "strangeness" around using p-values for hypothesis testing. For instance, a result with p = 0.05 can easily carry a ~30% chance of being a false positive under reasonable prior odds [1], which is far from what intuition tells us.

[1] http://www.nature.com/news/scientific-method-statistical-errors-1.14700
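One way to arrive at numbers like that (my own sketch, using the well-known Sellke-Bayarri-Berger lower bound on the Bayes factor and even 1:1 prior odds; I believe this is roughly the calculation the Nature piece draws on, but treat the attribution loosely):

```python
import math

# Sketch: lower-bound the chance a "significant" result is a false positive,
# using the bound BF(H0) >= -e * p * ln(p), valid for p < 1/e,
# combined with 1:1 prior odds for the null vs. the alternative.
def min_false_positive_prob(p: float, prior_odds_null: float = 1.0) -> float:
    bayes_factor_null = -math.e * p * math.log(p)   # minimum evidence left for H0
    posterior_odds_null = prior_odds_null * bayes_factor_null
    return posterior_odds_null / (1 + posterior_odds_null)

for p in (0.05, 0.01, 0.001):
    print(f"p = {p}: false-positive probability >= {min_false_positive_prob(p):.0%}")
# p = 0.05 -> ~29%, p = 0.01 -> ~11%, in line with the figures quoted in the article.
```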
If you throw a six-sided die two times, there's a 1/6 * 1/6 ≈ 3% chance of hitting six both times. But once you've thrown one six, there's now a ~17% chance of hitting six again ...
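That's the gap between the joint probability (1/36) and the conditional probability (1/6), which a quick simulation (mine, not the commenter's) makes concrete:

```python
import random

# Sketch: joint probability of two sixes vs. probability of a second six
# once the first throw already came up six.
rng = random.Random(0)
rolls = [(rng.randint(1, 6), rng.randint(1, 6)) for _ in range(1_000_000)]

both_six = sum(a == 6 and b == 6 for a, b in rolls) / len(rolls)
first_six = [(a, b) for a, b in rolls if a == 6]
second_given_first = sum(b == 6 for _, b in first_six) / len(first_six)

print(f"P(two sixes)            ~= {both_six:.3f}")           # ~0.028 (about 3%)
print(f"P(second six | one six) ~= {second_given_first:.3f}")  # ~0.167 (about 17%)
```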
My favorite data science paradox is the "curse of dimensionality":

https://en.wikipedia.org/wiki/Curse_of_dimensionality
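One concrete face of the curse (my sketch, not from the comment): as dimension grows, the nearest and farthest points in a random sample end up at almost the same distance, so notions like "nearest neighbor" lose their meaning.

```python
import numpy as np

# Sketch: in high dimensions, pairwise distances between uniform random points
# concentrate, so the nearest point is barely closer than the farthest one.
rng = np.random.default_rng(0)

for d in (2, 10, 100, 1000):
    points = rng.random((500, d))   # 500 points in the unit cube [0, 1]^d
    query = rng.random(d)
    dists = np.linalg.norm(points - query, axis=1)
    print(f"d={d:5d}: min/max distance ratio = {dists.min() / dists.max():.2f}")
# The ratio climbs toward 1 as d grows: "nearest" and "farthest" become nearly the same.
```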