Modes, Medians and Means: A Unifying Perspective

332 points by meribold, more than 7 years ago

20 comments

bowaggoner, more than 7 years ago

There is a much wider generalization here, studied under the name "property elicitation" in computer science, machine learning, and statistics.

The generic question is: given a loss function, what "property" of the distribution minimizes average loss; and, given a "property", characterize all loss functions that elicit it. For example, Bregman divergences are (essentially) all the losses that "elicit" the mean of a distribution. If you have any monotone continuous function g(), then |g(x) - g(s)| also elicits the median, and these are essentially all the losses that do.

Apologies for self-promotion, but you can read more at the references on this page (disclaimer: I'm one of the researchers who posted it): https://sites.google.com/site/informationelicitation/

or tutorials on this subject at my blog: http://bowaggoner.com/blog/series.html#convexity-elicitation
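
A quick numerical check of the median claim, as a rough sketch rather than a proof. The data set, the candidate grid, and the two choices of g (log and cube) are my own assumptions for illustration; none of them come from the comment above.

```python
import math
from statistics import median

def total_loss(data, s, g):
    """Sum of |g(x) - g(s)| over the data for a candidate summary s."""
    return sum(abs(g(x) - g(s)) for x in data)

# Made-up positive data with an odd count, so the median is unique.
data = [1, 2, 3, 7, 20]

# Brute-force search over a grid of candidates 0.01, 0.02, ..., 25.00.
candidates = [i / 100 for i in range(1, 2501)]

# Two strictly increasing, continuous choices of g (hypothetical picks).
best_log = min(candidates, key=lambda s: total_loss(data, s, math.log))
best_cube = min(candidates, key=lambda s: total_loss(data, s, lambda v: v ** 3))

print("minimizer with g = log :", best_log)
print("minimizer with g = cube:", best_cube)
print("median of the data     :", median(data))
```

Both minimizers land on the median even though the two losses weight distances very differently, which is what the comment's characterization predicts.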

FabHK, more than 7 years ago

One small extension:

* L_0 -> mode
* L_1 -> median
* L_2 -> mean
* L_infinity -> midrange, I think

that is, (smallest observation + largest observation)/2

(BTW, the author is also a big contributor to the wonderful Julia language, I believe)
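
The four cases in the list above can be checked by brute force. This is only a sketch: the data set and the candidate grid are made up, and "L_0" is taken to mean counting mismatches and "L_infinity" the largest absolute deviation.

```python
from statistics import mean, median

def l0(data, s):
    """Number of data points not equal to s (the 'L_0' discrepancy)."""
    return sum(1 for x in data if x != s)

def l1(data, s):
    """Sum of absolute deviations."""
    return sum(abs(x - s) for x in data)

def l2(data, s):
    """Sum of squared deviations."""
    return sum((x - s) ** 2 for x in data)

def linf(data, s):
    """Largest absolute deviation."""
    return max(abs(x - s) for x in data)

def best_summary(data, loss, candidates):
    """Candidate s with the smallest loss (first one wins on ties)."""
    return min(candidates, key=lambda s: loss(data, s))

data = [1, 2, 2, 3, 10]                        # made-up example data
candidates = [i / 100 for i in range(1001)]    # grid 0.00, 0.01, ..., 10.00

print("L_0   ->", best_summary(data, l0, candidates), "(mode is 2)")
print("L_1   ->", best_summary(data, l1, candidates), "(median is", median(data), ")")
print("L_2   ->", best_summary(data, l2, candidates), "(mean is", mean(data), ")")
print("L_inf ->", best_summary(data, linf, candidates),
      "(midrange is", (min(data) + max(data)) / 2, ")")
```

A grid search is crude, but it avoids assuming anything about the shape of each loss, which matters for L_0 since it is not continuous.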

Scene_Cast2, more than 7 years ago

I should note that square-error loss is not the only one that gives the average. Log-loss is another one, for example (you can prove it by taking the derivative of your minimization, setting it to zero and solving)
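
For reference, here is the derivative argument written out. It is a sketch under an assumption the comment does not state: the observations x_i are binary (0 or 1) and the summary p is a predicted probability strictly between 0 and 1.

```latex
% Log-loss of a single prediction p over binary observations x_1, ..., x_n
L(p) = -\sum_{i=1}^{n} \bigl[ x_i \log p + (1 - x_i) \log(1 - p) \bigr]

% Differentiate with respect to p and set the derivative to zero
\frac{dL}{dp} = -\frac{1}{p} \sum_{i=1}^{n} x_i
              + \frac{1}{1 - p} \sum_{i=1}^{n} (1 - x_i) = 0

% Multiply through by p(1 - p) and solve
(1 - p) \sum_{i=1}^{n} x_i = p \Bigl( n - \sum_{i=1}^{n} x_i \Bigr)
\quad\Longrightarrow\quad
p = \frac{1}{n} \sum_{i=1}^{n} x_i
```

So under that assumption the minimizer is the arithmetic mean of the observations, the same summary that squared error gives.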

jdonaldson, more than 7 years ago

These relationships are pretty clear once you see the other distributional metrics. Going further there's skewness and kurtosis for the third and fourth statistical moments, resp.

enriquto, more than 7 years ago

There is so much unexplained beauty in this text!

Notice that the unifying perspective depends on a continuous parameter p, which is 2 for the mean, 1 for the median and -infinity for the mode. Thus, there is a continuous family of statistics that interpolates between these three things!

He does not mention it either, but this means that you can define modes of a continuous variable (without resorting to histogram bins).

saurik, more than 7 years ago

For mode to fit into this unifying framework, this article assumes that 0^0 is not indeterminate and is instead simply 0 (instead of the more usual assumption of 1).

https://en.m.wikipedia.org/wiki/Zero_to_the_power_of_zero

http://mathforum.org/dr.math/faq/faq.0.to.0.power.html

FabHK, more than 7 years ago

Mods, could you maybe put (2013) in the title? Content is timeless, but just as a heads up that one might have come across it before.

ganonm, more than 7 years ago

I remember introducing someone to the concept of variance in a set of data, and I used a very similar approach. Variance seems like an arbitrary (but obvious) definition, but in fact it can be derived from first principles by looking for the simplest possible function that, first, has some dependency on the difference between the values and the arithmetic mean; second, is independent of whether the differences are positive or negative (to the right or left of the mean); and third, does not depend on the size of the data set (i.e. duplicating each member of a data set would leave the variance unaffected). When you consider each of these, the equation for variance arises very naturally.

Arithmetic difference satisfies the first property

Squaring each difference satisfies the second property

Taking the arithmetic mean satisfies the third property

Var(X) = E[(X - mean)^2]
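
The second and third properties are easy to check numerically. A minimal sketch with made-up data; the helper name `variance` is just for illustration and computes the population variance exactly as in the formula above.

```python
def variance(xs):
    """Population variance: the mean of squared differences from the mean."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

data = [2, 4, 4, 4, 5, 5, 7, 9]          # made-up example, mean is 5

# Third property: duplicating every member leaves the variance unchanged.
duplicated = data * 2

# Second property: mirroring each value around the mean flips the sign of
# every difference, and the variance does not notice.
m = sum(data) / len(data)
mirrored = [2 * m - x for x in data]

print(variance(data), variance(duplicated), variance(mirrored))
```

All three printed values agree, whereas a plain sum of squared differences (without the final averaging step) would double for the duplicated list.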

joshgel, more than 7 years ago

While I loved this post, it could also have been helpful to include more info about Pythagorean means (geometric mean and harmonic mean). Worth looking into depending on the type of variability seen in your data.

EDIT: just saw this was mentioned in one of the comments...

arnioxux, more than 7 years ago

The Generalized mean[1] linked to in the blog comments was similarly insightful.

It unifies the inequalities:

max >= root mean square >= arithmetic mean >= geometric mean >= harmonic mean >= min

that I remember from high school math competitions.[2]

[1] http://en.wikipedia.org/wiki/Power_mean#Special_cases

[2] https://artofproblemsolving.com/wiki/index.php?title=Root-Mean_Square-Arithmetic_Mean-Geometric_Mean-Harmonic_mean_Inequality
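
A small sketch of the generalized (power) mean that reproduces the chain on one made-up list of positive numbers. The function name `power_mean` and the special-casing of p = 0 and p = +/-infinity as limits are my own choices for illustration.

```python
import math

def power_mean(xs, p):
    """Generalized mean with exponent p, for positive numbers xs."""
    if p == math.inf:
        return max(xs)                      # limit as p -> +infinity
    if p == -math.inf:
        return min(xs)                      # limit as p -> -infinity
    if p == 0:
        # limit as p -> 0 is the geometric mean
        return math.exp(sum(math.log(x) for x in xs) / len(xs))
    return (sum(x ** p for x in xs) / len(xs)) ** (1 / p)

xs = [1.0, 2.0, 4.0, 8.0]                   # made-up positive data
labels = ["max", "root mean square", "arithmetic", "geometric", "harmonic", "min"]
exponents = [math.inf, 2, 1, 0, -1, -math.inf]
means = [power_mean(xs, p) for p in exponents]

for label, value in zip(labels, means):
    print(f"{label:>16}: {value:.4f}")
```

The printed values decrease from top to bottom, matching the inequality chain; they would all coincide if every entry of `xs` were equal.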

RoboTeddy, more than 7 years ago

Is there any fundamental reason to measure discrepancy by abs(s - x_i)^2 rather than say by abs(s - x_i)^1.5? Is something special about 2 in this context, or is it just a social convention that seems to work pretty well?

mr_toad, more than 7 years ago

If your loss function is an actual financial $ loss (or revenue), then the arithmetic mean times n gives the best estimate of total/long-run expected loss.

If the distribution of losses is skewed or has outliers, then estimates other than the mean (median, trimmed means, etc.) often underestimate total losses.

Underestimating total losses in the long run could be very bad for business.
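
A toy illustration of the point, using invented loss figures with a single large outlier; the numbers are assumptions, not real data.

```python
from statistics import mean, median

# Hypothetical per-event losses in dollars; one event is far larger than the rest.
losses = [100, 120, 90, 110, 5000]
n = len(losses)

actual_total = sum(losses)
total_from_mean = mean(losses) * n       # scales back to the true total
total_from_median = median(losses) * n   # ignores the size of the outlier

print("actual total       :", actual_total)
print("mean * n           :", total_from_mean)
print("median * n         :", total_from_median)
```

On this sample the mean-based figure recovers the total exactly, while the median-based figure is roughly a tenth of it, because the median is insensitive to how large the outlier is.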

jaddood, more than 7 years ago

In case anyone wants to dig into the follow-up on Lp norms, here it is: http://www.johnmyleswhite.com/notebook/2013/03/22/using-norms-to-understand-linear-regression/

known, more than 7 years ago

Painfully, American families are learning the difference between median and mean

https://qz.com/260269/painfully-american-families-are-learning-the-difference-between-median-and-mean/

alvis, more than 7 years ago

The beauty of math is often missed. That's why we say maths is an art!

sytelus, more than 7 years ago

Is there a similar generalization for the geometric mean and geometric median?

kensai, more than 7 years ago

"To sum up, we’ve just seen that the three most famous single number summaries of a data set are very closely related: they all minimize the average discrepancy between s and the numbers being summarized. They only differ in the type of discrepancy being considered:

    The mode minimizes the number of times that one of the numbers in our summarized list is not equal to the summary that we use.
    The median minimizes the average distance between each number and our summary.
    The mean minimizes the average squared distance between each number and our summary."

ycmbntrthrwaway, more than 7 years ago

Math does not render unless I allow cloudflare.com to execute scripts. Why can't we just self-host scripts, is it that hard?

moomin, more than 7 years ago

On my phone, this entire article reads “blah blah blah [Math processing error] blah blah blah [Math processing error] blah blah blah [Math processing error] blah blah blah [Math processing error]”

vorg, more than 7 years ago

To get the median of an even number of values, you must calculate the mean of the middle two values. Therefore the definition of the median relies on the mean already being defined when working with a discrete number of values, which isn't really explained in the post.

In fact, there's a whole spectrum of averages defined with mean and median on each end, depending on how many outliers you eliminate. For example, if you have eight numbers, you can define a spectrum of four averages:

    2,3,5,7,11,13,17,19   // mean, here 9.6250
    3,5,7,11,13,17        // mean with outlier on each side stripped, here 9.3333
    5,7,11,13             // mean of central two quartiles, here 9.0000
    7,11                  // median (i.e. mean of center two numbers), here 9.0000

You could then repeat the process on that spectrum of averages to get a shorter spectrum, here [9.2396 (mean), 9.1667 (median)], recursively until you have one "mean-median" left, here 9.2031.

I wonder how this fits in with the explanation in the post.
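
The procedure described above can be sketched in a few lines. The function names `trimmed_spectrum` and `mean_median` are hypothetical, and the stopping rule (keep trimming until one or two values remain) is my reading of the example rather than something the comment spells out.

```python
def trimmed_spectrum(values):
    """Means obtained by repeatedly stripping the smallest and largest value.

    The first entry is the plain mean; the last entry is the median.
    """
    xs = sorted(values)
    spectrum = []
    while True:
        spectrum.append(sum(xs) / len(xs))
        if len(xs) <= 2:          # one or two values left: this mean is the median
            return spectrum
        xs = xs[1:-1]             # strip one outlier on each side

def mean_median(values):
    """Apply trimmed_spectrum to its own output until one number is left."""
    current = list(values)
    while len(current) > 1:
        current = trimmed_spectrum(current)
    return current[0]

data = [2, 3, 5, 7, 11, 13, 17, 19]
print([round(v, 4) for v in trimmed_spectrum(data)])
print(round(mean_median(data), 4))
```

Each pass roughly halves the number of values, so the recursion always terminates. On the eight numbers from the comment it reproduces the spectrum 9.6250, 9.3333, 9.0000, 9.0000 and the final value 9.2031.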
