The Most Dangerous Equation [pdf]

184 points by jfriedly over 12 years ago

15 comments

guylhem over 12 years ago
Absolutely true.<p>Ignore the laws of statistics at your own peril.<p>Knowledge of statistics is the best investment one can make with one's time. A/B testing is the tip of the iceberg - a simple practical application, which can be exploited much better with statistics.<p>Learn about statistical distributions - Bernoulli, binomial, normal, Poisson and hypergeometric at least, then the chi-squared distribution to grasp <i>real</i> tests like chi-squared goodness of fit, two-way tables and confidence intervals - and you will see so many possible daily applications and avoid so many pitfalls (such as the ones outlined in the article).<p>BTW I saw in another comment something about significance. De Moivre's equation is not directly related to significance - it only says that if you know the population standard deviation, the standard deviation of the mean of any sample drawn from the population depends on the size of that sample - i.e. the means of smaller samples land below and above the expected value much more often.<p>Consequently, if you try to estimate the SD of a population from a sample, the bigger sample will give the best results (i.e. narrower, more precise intervals).
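The dependence on sample size described in this comment can be checked with a small simulation. This is only a sketch: it assumes a normal population with standard deviation 1, and the function name `sd_of_sample_means`, the trial count and the seed are illustrative choices, not anything taken from the article.

```python
import random
import statistics


def sd_of_sample_means(n, trials=2000, mu=0.0, sigma=1.0, seed=0):
    """Empirical standard deviation of the mean of n draws, over many trials."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    means = [
        statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
        for _ in range(trials)
    ]
    return statistics.stdev(means)


if __name__ == "__main__":
    sigma = 1.0  # assumed population standard deviation
    for n in (4, 16, 64):
        observed = sd_of_sample_means(n, sigma=sigma)
        predicted = sigma / n ** 0.5  # de Moivre's equation: sigma / sqrt(n)
        print(f"n={n:3d}  observed={observed:.3f}  predicted={predicted:.3f}")
```

Each fourfold increase in sample size should roughly halve the spread of the sample means, which is the square-root behaviour the comment refers to.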
JDDunn9 over 12 years ago
While statistics are certainly abused and misunderstood, I don't think that small sample sizes are the worst problem. The media usually reports margins of error, and most people know small samples may not represent the population.<p>I think a much larger problem is in the underlying assumptions that are made. For instance, assuming that an experiment on animals can be applied to humans (sometimes it can, sometimes it can't). These can be more nuanced and much harder to detect than a simple math error.<p>Also, the importance of truly random sampling is not emphasized enough. Even medical researchers are guilty of using international cluster sampling to make generalizations about the population, overlooking sources of bias like geography, culture, lifestyle differences, etc.
grannyg00se over 12 years ago
Excellent. Reminds me of the lecture describing that "the greatest shortcoming of the human race is our inability to understand the exponential function"<p><a href="http://youtu.be/F-QA2rkpBSY" rel="nofollow">http://youtu.be/F-QA2rkpBSY</a>
confluence over 12 years ago
Ah hello law of small numbers, survivorship bias and fundamental attribution error - we meet again. If you understand these 3 biases - you'll wonder what kind of world you have actually been living in all this time.<p>I'm going to repost my comment about this very concept as related to startups from a while ago because I believe HNers will appreciate it - it's from an article called "Startup School And Survivor Bias" (hope that's ok :)<p>Source: <a href="http://news.ycombinator.com/item?id=4685042" rel="nofollow">http://news.ycombinator.com/item?id=4685042</a><p>============================================================<p>Startups: never have so many understood so little about the statistics of variance present in the outcomes of small samples.<p>People like to speak of 10x productivity, non-stop work and geniuses - but the reality is much less interesting. A large number of small teams working on many different problems will by definition have a great variance in outcomes just by random extraneous factors (also known as the law of small numbers and insensitivity to sample size).<p><i>&#62; A certain town is served by two hospitals. In the larger hospital about 45 babies are born each day, and in the smaller hospital about 15 babies are born each day. As you know, about 50% of all babies are boys. However, the exact percentage varies from day to day. Sometimes it may be higher than 50%, sometimes lower.<p>For a period of 1 year, each hospital recorded the days on which more than 60% of the babies born were boys. Which hospital do you think recorded more such days?<p>1) The larger hospital<p>2) The smaller hospital<p>3) About the same (that is, within 5% of each other)<p>56% of subjects chose option 3, and 22% of subjects respectively chose options 1 or 2. 
However, according to sampling theory the larger hospital is much more likely to report a sex ratio close to 50% on a given day than the smaller hospital.<p>Relative neglect of sample size were obtained in a different study of statistically sophisticated psychologists</i><p>-- <a href="http://en.wikipedia.org/wiki/Insensitivity_to_sample_size" rel="nofollow">http://en.wikipedia.org/wiki/Insensitivity_to_sample_size</a><p><i>&#62; A deviation of 10% or more from the population proportion is much more likely when the sample size is small. Kahneman and Tversky concluded that "the notion that sampling variance decreases in proportion to sample size is apparently not part of man's repertoire of intuitions. For anyone who would wish to view man as a reasonable intuitive statistician such results are discouraging."</i><p>-- <a href="http://www.decisionresearch.org/pdf/dr36.pdf" rel="nofollow">http://www.decisionresearch.org/pdf/dr36.pdf</a><p>Taking lessons as gospel from these "10x" events is by definition foolhardy and merely an extension of the bullshit pushed by the entire "Good To Great" Jim Collins business book industry.<p>It's like taking lessons from survivors of the Titanic on how to survive the sinking of a ship. It's quite simple - be a young female child with a life vest and rich parents (or in startup land - a young upper-middle class male living in California during a venture bubble, a cyclical investment in the Valley with a convergence of secondary technologies, above average intelligence and a college degree from a reputable university).<p>I have a personal rule with any kind of advice or explanation coming out of anyone working in a "soft" industry - if it's vague - it's bullshit. All of the advice given at these events are bullshit by this definition. So are many other things - and yeah it doesn't preclude me from spouting it. 
Or using the advice at my discretion.<p>But honestly - startup founders literally have no idea why things take off and they have no idea why they win. That's why they have to keep pivoting - it increases their luck surface area and their ability to gain traction - after which they simply must hold on tight while surfing the wave.<p>YouTube was a dating site - didn't work - pivot - video traction - venture up - ride.<p>PayPal was a Palm Pilot app - didn't work - pivot - traction - venture up - ride.<p>Google sold corporate search - didn't work - pivot - copy PPC from Overture - lever up - traction hits - ride.<p>Instagram - started with a location check-in HTML5 app 2 years too early - pivot - copy PicPlz and Hipstamatic - hit traction - lever up - ride.<p>Angry Birds - fail to land a hit with nearly every game in the past decade - pivot - take a shot at the iPhone - hits traction - lever up - ride.<p>Of the startups that didn't pivot - they either skipped the pivot thanks to previous side projects/companies or already had traction - and all they had to do was lever up and ride.<p>I'm going to make this clear - there is absolutely, positively nothing wrong with this - not at all - it is merely reality and not particularly unfair.<p>People stating pointless platitudes that success is due to things like "Be 10x more productive", "Commitment" and "People, product, and philosophy" are simply wasting their breath and other people's time, and confusing what actually happens. These things may or may not be actionable, predictive or sufficient for success.<p>Here's my list of startup advice:<p>Be alive. Be male. Be young. Don't have health issues. Be born in America or move there. Enter the cycle after a recession. Speak English. Enter a growing/new field where the level of competition is low and so is the sophistication of your competition. Surf cost trends down from expensive to mass consumer markets. Work bottom up - on small things. Be of above average intelligence. 
Have family support. Have a college degree.<p>Oh and most importantly of all: Get fucking lucky.<p>The hindsight/survivorship biases in combination with faulty causality and the narrative fallacy will completely hose your thinking - so be careful.<p>More interesting stuff:<p><a href="http://en.wikipedia.org/wiki/List_of_biases_in_judgment_and_decision_making" rel="nofollow">http://en.wikipedia.org/wiki/List_of_biases_in_judgment_and_...</a><p><a href="http://en.wikipedia.org/wiki/Black_swan_theory" rel="nofollow">http://en.wikipedia.org/wiki/Black_swan_theory</a><p><a href="http://en.wikipedia.org/wiki/List_of_fallacies" rel="nofollow">http://en.wikipedia.org/wiki/List_of_fallacies</a><p><a href="http://en.wikipedia.org/wiki/List_of_memory_biases" rel="nofollow">http://en.wikipedia.org/wiki/List_of_memory_biases</a><p><a href="http://www.econ.yale.edu/~shiller/behfin/2000-05/rabin.pdf" rel="nofollow">http://www.econ.yale.edu/~shiller/behfin/2000-05/rabin.pdf</a><p>Disclaimer: Biases rule your thoughts and mine - this post is also subject to both bullshit and biases (mostly bullshit - I do love that word). Think for yourself.
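The hospital question quoted in this comment can be worked out exactly rather than by intuition. The sketch below assumes exactly 15 and 45 births per day (the quote says "about"), independent births, and a 50% chance of a boy; the function name `prob_more_than_60pct_boys` is made up for illustration.

```python
from math import comb


def prob_more_than_60pct_boys(births, p_boy=0.5):
    """Exact P(more than 60% of `births` are boys) under a binomial model."""
    return sum(
        comb(births, k) * p_boy**k * (1 - p_boy) ** (births - k)
        for k in range(births + 1)
        if 5 * k > 3 * births  # k / births > 0.6, kept in integer arithmetic
    )


if __name__ == "__main__":
    for name, births in (("smaller", 15), ("larger", 45)):
        p = prob_more_than_60pct_boys(births)
        print(f"{name} hospital ({births}/day): P = {p:.3f}, "
              f"about {365 * p:.0f} such days per year")
```

For 15 births the sum is 4944/32768, roughly 0.151, and the value for 45 births comes out smaller, so under these assumptions the smaller hospital records more such days.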
6ren over 12 years ago
The last section on sex differences is interesting. It explains boys having greater variation in ability than girls by boys having only one X chromosome (XY) while girls have two (XX).<p>This would be a neat theory, if girls somehow used an average of the two X's... which seems compellingly logical, though the (current) theory is that only one X is used, chosen at random. <a href="http://en.wikipedia.org/wiki/Barr_body" rel="nofollow">http://en.wikipedia.org/wiki/Barr_body</a>
timmclean over 12 years ago
I'm fascinated by the theory of increased variability in males being caused by brain-related genes in the X chromosome. I'd highly recommend checking out pages 18 and 19.
tokenadult over 12 years ago
This book chapter is an interesting read. It illustrates the importance of considering sample size, especially when looking at preliminary research findings.<p>After looking up the book from which this chapter is excerpted, I followed other recommendations from Amazon to another very useful book,<p><a href="http://www.amazon.com/When-Can-You-Trust-Experts/dp/1118130278/" rel="nofollow">http://www.amazon.com/When-Can-You-Trust-Experts/dp/11181302...</a><p>When Can You Trust the Experts: How to Tell Good Science from Bad in Education by Daniel T. Willingham, a very astute psychologist with an interest in education policy.
FrojoS over 12 years ago
quote: "Obviously, they assumed that variability decreased proportionally to the number of coins and not to its square root."<p>Why is this so important? The fact that variability increases with smaller sample size was ignored completely by the protagonists in the provided examples. Realizing whether this inverse effect is linear or not doesn't seem to be the main problem in people's intuition.<p>disclaimer: I have poor understanding of statistics.
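One way to see why the square root matters in the quoted sentence: if each coin's weight varies independently, the spread of a batch total grows with the square root of the batch size, not with the batch size itself. The sketch below compares the two rules; the batch size of 100, the per-coin tolerance of 1.0 and the function name `batch_tolerance` are illustrative assumptions, not figures taken from the article.

```python
import math


def batch_tolerance(per_coin_tolerance, n_coins, rule="sqrt"):
    """Allowed deviation of a batch's total weight under a given scaling rule."""
    if rule == "sqrt":
        # Independent errors partly cancel, so spread grows like sqrt(n).
        return per_coin_tolerance * math.sqrt(n_coins)
    if rule == "linear":
        # Naive rule: multiply the single-coin tolerance by the coin count.
        return per_coin_tolerance * n_coins
    raise ValueError(f"unknown rule: {rule}")


if __name__ == "__main__":
    n = 100  # hypothetical batch size
    linear = batch_tolerance(1.0, n, "linear")
    sqrt_rule = batch_tolerance(1.0, n, "sqrt")
    print(f"linear rule: {linear}, sqrt rule: {sqrt_rule}, "
          f"too loose by a factor of {linear / sqrt_rule}")
```

Under these assumptions the linear rule is too generous by a factor of sqrt(n), so getting the functional form wrong changes what counts as a suspicious batch by an order of magnitude.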
mturmon over 12 years ago
Nice exposition and examples. Some of the most subtle and surprising phenomena I've seen in looking at stochastic data have been due to sampling effects.
Kynlyn over 12 years ago
I have no affiliation with Code School, but I saw that they recently offered a free course on R, which is a programming language built around statistics.
hn-miw-i over 12 years ago
Really interesting paper but the use of Comic Sans on the axis labels is a turn-off. Why Comic Sans?? Why? It's a crime against fontology.
j2kun over 12 years ago
Equations aren't dangerous. People who make policy decisions about things they don't understand are.
K2h over 12 years ago
So is the conclusion that an adequate sample size is as important as the 'result' when measuring standard deviation?<p><a href="http://en.wikipedia.org/wiki/Sample_size_determination" rel="nofollow">http://en.wikipedia.org/wiki/Sample_size_determination</a>
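To make the link to sample size determination concrete, here is a minimal sketch of the textbook calculation for estimating a mean to within a chosen margin. It assumes the population standard deviation is known and that the normal approximation applies; the function name `required_sample_size` and the example numbers are hypothetical.

```python
import math
from statistics import NormalDist


def required_sample_size(sigma, margin, confidence=0.95):
    """Smallest n whose confidence interval for the mean has half-width <= margin."""
    # Two-sided critical value, about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return math.ceil((z * sigma / margin) ** 2)


if __name__ == "__main__":
    print(required_sample_size(sigma=15, margin=3))    # 97
    print(required_sample_size(sigma=15, margin=1.5))  # 385
```

Halving the margin roughly quadruples the required sample, which is the same square-root relationship seen from the other direction.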
turbulents over 12 years ago
I just started skipping through it once I saw the Comic Sans.
tlarkworthy over 12 years ago
solution: quasi random sampling