Beware the Mean

94 points, by sjwhitworth, over 5 years ago

9 comments

kqr, over 5 years ago
Other, related points:

- With heavy tails, the sample mean (i.e. the number you can see) is very likely to underestimate the population mean.

- With heavy enough tails, higher moments like variance (and therefore standard deviation) do not exist at all; they're infinite.

- Critically: with heavy tails, the central limit theorem breaks down in practice. Sums of heavy-tailed samples converge to a normal distribution so slowly that it might never realistically happen with your finite data. Any computation you do that explicitly or implicitly relies on the CLT will give you junk results!
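The first point above can be checked with a short simulation. This is a minimal sketch, not something from the comment itself: it assumes a Pareto distribution with tail index 1.1 and minimum value 1 (so the population mean is finite but the variance is not), and the function name, sample size, and trial count are all illustrative choices.

```python
import random

ALPHA = 1.1                      # assumed tail index; any value in (1, 2) gives infinite variance
TRUE_MEAN = ALPHA / (ALPHA - 1)  # population mean of a Pareto with minimum 1, valid for ALPHA > 1


def fraction_below_true_mean(trials=300, n=1000, seed=0):
    """Share of sample means (n draws each) that fall below the population mean."""
    rng = random.Random(seed)    # seeded so the run is repeatable
    below = 0
    for _ in range(trials):
        sample_mean = sum(rng.paretovariate(ALPHA) for _ in range(n)) / n
        if sample_mean < TRUE_MEAN:
            below += 1
    return below / trials


if __name__ == "__main__":
    print(f"population mean: {TRUE_MEAN:.1f}")
    print(f"share of sample means below it: {fraction_below_true_mean():.0%}")
```

Most runs land below the population mean because the mean is carried by rare, enormous draws that a typical sample of 1000 does not contain.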
ImaCake, over 5 years ago
If the author is seeing this thread: I couldn't find an RSS feed for your site. I don't know if they are difficult to set up, but if it's very little effort, I would appreciate seeing what you post next :)

As for the wariness about the mean: a lot of people are much further behind than thinking about different distributions. Even something you assume is normally distributed needs a mean *and* a variance! As for visualising, histograms are incredibly underrated tools. You can infer a lot of information by just looking at a distribution.
EliRivers, over 5 years ago
The mean is misleading. The median is misleading. The mode is misleading. Any reduction of a range of data to a single representative datum is misleading.

However, the pushback against providing something a bit more meaningful than a single value can sometimes be quite strong.

I try hard to provide software estimates as probability distributions, but when someone sees a line with a probability peak somewhere around two days (could be really simple), and then a wide hump somewhere around two weeks (if it's not simple, it will mean a significant rewrite), with a very low line between them and then a long, long tail off to several months, it is not well received.

I can see their point; they're trying to plan things, and the whole system is set up to work with single numbers. If everyone provided probability graphs for their estimates, and we had a tool that could then combine them and deliver the net probability graph of the combined pieces, I expect they'd be a lot more amenable.
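One way such a combining tool might work is Monte Carlo sampling: draw a duration for each task from its own distribution, add them up, and repeat many times. The sketch below is only an illustration under stated assumptions. The two-hump estimate is modelled as a mixture of two lognormal distributions, the 60/40 split and all parameters are invented, and `sample_task` and `combined_percentiles` are hypothetical names.

```python
import random


def sample_task(rng):
    """One draw, in working days, from a hypothetical two-hump estimate."""
    if rng.random() < 0.6:                    # assumed: 60% chance the task is simple
        return rng.lognormvariate(0.7, 0.3)   # median about 2 days
    return rng.lognormvariate(2.3, 0.6)       # median about 10 working days, long right tail


def combined_percentiles(num_tasks=5, draws=20_000, seed=1):
    """Percentiles of the total duration of num_tasks independent tasks."""
    rng = random.Random(seed)
    totals = sorted(
        sum(sample_task(rng) for _ in range(num_tasks)) for _ in range(draws)
    )

    def pick(p):
        # Simple index-based percentile on the sorted totals.
        return totals[min(draws - 1, int(p * draws))]

    return {"p50": pick(0.50), "p90": pick(0.90), "p99": pick(0.99)}


if __name__ == "__main__":
    for name, days in combined_percentiles().items():
        print(f"{name}: {days:.1f} working days")
```

The sketch treats tasks as independent, which real projects often are not; correlated overruns would widen the combined tail further.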
teodorlu, over 5 years ago
Nassim Taleb greatly expands on this point in *Antifragile*. For a freely available, technical argument, check out *Doing Statistics Under Fat Tails* [1].

[1]: https://www.fooledbyrandomness.com/FatTails.html
theophrastus, over 5 years ago
This is a worthy posting, particularly as so much becomes iterative statistics in "A.I." clothing. The two old (slightly hackneyed) counter-examples which are popular in lectures about measures of the *central tendency* are:

- One is trying to get a sense of the common sort of income in a room and then Bill Gates wanders in. Suddenly the average income becomes an amount which *no one* experiences.

- What is the average number of testicles in the human population? That computed central tendency is quite rare.
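The first counter-example is easy to work through with numbers. The incomes below are made up purely for illustration, including the figure used for the billionaire.

```python
from statistics import mean, median

room = [40_000, 45_000, 50_000, 55_000, 60_000]   # made-up incomes
room_with_guest = room + [1_000_000_000]          # made-up figure for the newcomer

print(mean(room), median(room))                   # both are 50000 before the guest arrives
print(round(mean(room_with_guest)))               # about 166.7 million afterwards
print(median(room_with_guest))                    # 52500.0, still close to a typical income
```

After the guest arrives, the mean is thousands of times larger than five of the incomes and six times smaller than the sixth, so it describes nobody in the room, while the median barely moves.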
mlyle, over 5 years ago
I would quibble some here. When we look at revenue, I agree: ignore the mean. If there's a whole bunch of people not paying you anything, that's OK... Look at the 50th and 90th percentile.

But *profit*, and similarly *costs*? Your mean customer better be profitable, or you won't be. How much the people on the left of the graph *cost* you is *important*.

Part of this is definitional, too. Do you include that far left part of the graph where people are not really paying you as a "customer"?
PaulHoule, over 5 years ago
The mean is not so bad for many purposes because it is an expectation value.

If you add up your revenue, subtract your expenses, and divide by the number of customers, that gives you a real profit number (conditional on how you define revenue and expenses). Whether that number is negative or positive is meaningful.

The median, on the other hand, has a different set of problems. If you are running a game like Fate Grand Order you'd better cultivate the guy who spends $70k because he has to "catch them all". The median player probably pays little or nothing, but the guy who sells ero comics at Comiket complains about what it costs to get (say) Saber Bride, yet it is worth more to him than it is to the median player.

Mean and median are terrible numbers to use for latency; what drives you nuts when your computer is unresponsive is not the median latency, but the 99th-percentile latency.
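The latency point can be made concrete with a small synthetic example. The numbers are invented for illustration (980 fast requests and 20 very slow ones), and `percentile` is a hypothetical helper using the nearest-rank definition.

```python
def percentile(sorted_values, p):
    """Nearest-rank percentile for an integer p in 1..100 on pre-sorted data."""
    n = len(sorted_values)
    rank = (p * n + 99) // 100        # integer ceiling of p * n / 100
    return sorted_values[rank - 1]


# Invented data: 980 requests at 20 ms, 20 requests stalled at 2000 ms.
latencies_ms = sorted([20] * 980 + [2000] * 20)

mean_ms = sum(latencies_ms) / len(latencies_ms)
median_ms = percentile(latencies_ms, 50)
p99_ms = percentile(latencies_ms, 99)

print(f"mean:   {mean_ms} ms")        # 59.6 ms, a latency no request actually had
print(f"median: {median_ms} ms")      # 20 ms, hides the stalls entirely
print(f"p99:    {p99_ms} ms")         # 2000 ms, the stalls a user remembers
```

In this data set neither the mean nor the median reveals that one request in fifty takes two full seconds; only the tail percentile does.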
pototo666, over 5 years ago
I have come across too many people who value the mean far too much in their analysis. Some of them made mistakes and the project died. Hypothesis: heavy reliance on the mean increases the probability of failure in the internet industry. This reminds me of PG's essay *Mean People Fail*: http://www.paulgraham.com/mean.html

Pun intended :)
fmajid, over 5 years ago
The Iranian civilization can draw continuity to Susa, circa 3000BC, further than China. The Mesopotamian and Indian civilizations are older still but broke continuity.