
Gaussian Distributions Are Soap Bubbles

140 points by lebek over 7 years ago

14 comments

cs702 over 7 years ago
Of course. As others here point out, the hypervolume inside an n-dimensional hypersphere grows as the nth power of a linear increase in radius. In high dimensions, tiny increases in radius cause hypervolume to grow by more than 100%. The concentration of hypervolume is always highest at the edge.[0]

The theoretical tools (and intuitions) we have today for making sense of the distribution of data, developed over the past three centuries, *break down* in high dimensions. The fact that in high dimensions Gaussian distributions are not "clouds" but actually "soap bubbles" is a perfect example of this breakdown. Can you imagine trying to model a cloud of high-dimensional points lying on or near a lower-dimensional manifold with soap bubbles?

If the data is not only high-dimensional but also non-linearly entangled, we don't yet have "mental tools" for reasoning about it:

* https://medium.com/intuitionmachine/why-probability-theory-should-be-thrown-under-the-bus-36e5d69a34c9

* https://news.ycombinator.com/item?id=15620794

[0] See kgwgk's comment below.
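
A quick Python sketch of that volume growth (illustrative only; the dimensions and the 1% shell are arbitrary choices): since the volume of an n-ball scales as r^n, the fraction lying outside radius 0.99 is 1 - 0.99^n.

    # Fraction of an n-ball's volume lying within 1% of its surface:
    # the inner ball of radius 0.99 holds 0.99**n of the total volume.
    for n in [2, 10, 100, 1000]:
        outer_shell = 1 - 0.99**n
        print(f"n = {n:4d}: {outer_shell:.5f} of the volume is in the outer 1% shell")

For n = 1000 this prints roughly 0.99996: essentially all of the volume hugs the edge.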

woopwoop over 7 years ago
I was going to comment that what's going on here doesn't have much to do with the Gaussian distribution. In high dimensions, almost all of the volume of the unit ball is concentrated near the unit sphere. In the first comment, Frank Morgan makes the same remark, pointing out that you get the same effect with the uniform distribution on the unit cube in high dimensions.

High dimensions are weird.
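
A rough numerical check of Frank Morgan's cube remark (a sketch; the sample count and the 0.01 threshold are arbitrary):

    import numpy as np

    rng = np.random.default_rng(0)
    for n in [2, 10, 100, 1000]:
        x = rng.uniform(0.0, 1.0, size=(10_000, n))
        # Distance from each point to the nearest face of the unit cube.
        dist = np.minimum(x, 1.0 - x).min(axis=1)
        frac = (dist < 0.01).mean()
        print(f"n = {n:4d}: {frac:.3f} of samples lie within 0.01 of a face")

By n = 1000 the fraction is indistinguishable from 1: almost every uniform sample sits next to the boundary.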

smallnamespace over 7 years ago
Isn't the unsuitability of the high-dimensional Gaussian intimately related to the fact that for most realistic problem spaces, we actually believe there are far fewer than the N >> 1 measured dimensions?

A uniform Gaussian presupposes that the variates are either linearly orthogonal, or all have the same linear interaction with each other (in the case of fixed positive correlation).

If your actual problem has dimension 20, but you've measured it with N dimensions, then there are strong interactions between your measured variates, and moreover the intervariate interactions do not have a single *fixed* interaction strength (like a single Gaussian correlation), but probably vary like a random matrix.

This might be related to the Tracy-Widom[1] distribution somehow. Perhaps the distribution you use to replace the Gaussian should really be something like: first generate a random positive semi-definite matrix C, then generate random data based on different random choices of C.

[1] https://en.wikipedia.org/wiki/Tracy%E2%80%93Widom_distribution
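
One minimal way to realize the "random covariance" idea above (a sketch, not a claim about the right prior: C = A·Aᵀ with Gaussian A is a Wishart draw, just one convenient way to get a random positive semi-definite matrix):

    import numpy as np

    rng = np.random.default_rng(0)
    n = 20
    A = rng.standard_normal((n, n))
    C = A @ A.T                      # random symmetric positive semi-definite matrix
    # Data whose intervariate interactions vary like a random matrix,
    # rather than having one fixed correlation strength.
    data = rng.multivariate_normal(mean=np.zeros(n), cov=C, size=1000)
    print(data.shape)                # (1000, 20)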

tgb over 7 years ago
I won't dispute the main point of the article, but a couple of minor errors bug me. First, he kept referring to a Gaussian distribution as being the *unit* sphere, when of course the radius depends upon the parameters of the Gaussian (the standard deviation). If it didn't, the claim wouldn't be invariant under your choice of units. A bizarre mistake to repeat many times throughout the article.

Less importantly, the last paragraph says that the probability that two samples are orthogonal is "very high". Being precisely orthogonal is technically a probability-zero event. The author means "very close to orthogonal."

There was a good discussion of this problem in the context of Monte Carlo simulations in (1).

(1) https://arxiv.org/abs/1701.02434
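
The "very close to orthogonal" reading is easy to check numerically (a small sketch; the dimensions are arbitrary):

    import numpy as np

    rng = np.random.default_rng(0)
    for n in [3, 100, 10_000]:
        x, y = rng.standard_normal(n), rng.standard_normal(n)
        cos = x @ y / (np.linalg.norm(x) * np.linalg.norm(y))
        print(f"n = {n:6d}: cosine between two samples = {cos:+.4f}")

The cosine concentrates around 0 at a rate of roughly 1/sqrt(n), so two samples are nearly, but never exactly, orthogonal.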

Bromskloss over 7 years ago
Those images [0], inputs that were optimised to maximise a certain classification response, were cool! Instead of going to this peak of the response function, is there a way to explore the shell where the actual images reside? Would such images look, to our eyes, more like real input than the optimised input? I suspect they won't, but I still would like to see what is between the dogs!

[0] http://www.inference.vc/content/images/2017/11/Screen-Shot-2017-11-09-at-2.12.44-PM.png

snippyhollow over 7 years ago
Compulsory "Spikey Spheres" notebook: http://nbviewer.jupyter.org/urls/gist.githubusercontent.com/syhw/9025964/raw/441645b476a2a997f27f5993e4da2988febe1ef3/SpikeySpheres

amluto over 7 years ago
In information theory, there's a related concept of a "typical set": the set of sequences of samples from a distribution whose probability (or probability density) is very close to the expected probability. If you draw a sequence of samples, you are overwhelmingly likely to get a typical outcome as opposed to, say, anything resembling the most likely outcome.

As a concrete example, if you have a coin that gets heads 99% of the time and you flip it 1M times, you are overwhelmingly likely to get around 10k tails, even though the individual sequences with many fewer tails are each far likelier than the typical sequences.
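
The coin example takes a few lines to verify (a sketch; 1000 runs is an arbitrary choice):

    import numpy as np

    rng = np.random.default_rng(0)
    # Number of tails in 1000 independent runs of 1M flips with P(tails) = 0.01.
    tails = rng.binomial(n=1_000_000, p=0.01, size=1000)
    print(tails.min(), int(np.median(tails)), tails.max())

Every run lands within a few hundred of 10,000 tails, even though the single most likely sequence (all heads, probability 0.99^1000000, about 10^-4365) is far likelier than any one typical sequence; the typical sequences simply outnumber it overwhelmingly.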

Scene_Cast2 over 7 years ago
So I have a feeling that he's looking at the wrong histogram. If you plot the distribution of the vector magnitudes, you'll get a spike around some large number, with a very sharp falloff to the right and left.

However, it's not a "bubble" in the intuitive sense. He's looking at the magnitude distribution of dots over the entire space, implicitly using the Cartesian coordinate system (discarding angle, looking at just magnitude).

If you look instead at the distribution of dots per unit volume (or R^N hyper-volume, rather), you'll still find the highest concentration in the center, with no "bubble".
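
That distinction shows up directly if you compute both quantities (a sketch; 10,000 samples in 1000 dimensions, both arbitrary):

    import numpy as np

    rng = np.random.default_rng(0)
    n = 1000
    x = rng.standard_normal((10_000, n))
    r = np.linalg.norm(x, axis=1)
    print(r.mean(), r.std())   # about 31.6 (~ sqrt(1000)) and 0.71: a sharp spike

The magnitude histogram spikes because the volume element grows as r^(n-1); the density p(x) itself is still maximal at the origin, which is the "no bubble per unit volume" point above.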

strainer over 7 years ago
Maybe it goes without saying, but what I found distinctive about the Gaussian distribution in multiple dimensions is that it seems to be the only distribution which produces a smooth radial pattern when plotted co-linearly (yet not radially). All other distributions I tested exhibit a bias along the main axes when a number of (variateA, variateB) pairs are plotted. The Gaussian seems to be the only one, fundamentally, which shows no sign of the orientation of the axes it is plotted along.

This comes in handy for plotting a radially smooth 'star cluster' without doing polar coordinates and trig: just plot a load of (x = a_gauss, y = another_gauss, z = another_gauss) points and you have a radially smooth object. I don't think any other distribution can do that; it seems to me there is something mathematically profound about it, which I'm sure some mathemagicians have a proper grasp of.

The 'co-linear' distortions of other distributions can be seen in some plots on the test page for my random distribution lib: http://strainer.github.io/Fdrandom.js/
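
This observation is in fact a theorem (often attributed to Herschel and Maxwell): the Gaussian is the only distribution whose independent per-axis draws produce a rotationally symmetric cloud. A minimal 2-D sketch of the star-cluster trick:

    import numpy as np
    import matplotlib.pyplot as plt

    rng = np.random.default_rng(0)
    # One independent Gaussian draw per axis; no polar coordinates or trig.
    x, y = rng.standard_normal(10_000), rng.standard_normal(10_000)
    plt.scatter(x, y, s=1, alpha=0.3)
    plt.gca().set_aspect("equal")
    plt.show()

Swap in, say, rng.uniform(-1, 1, 10_000) per axis and the square orientation of the axes shows through immediately.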

bglazer over 7 years ago
I was recently reading the section on importance sampling in David MacKay's "Information Theory, Inference, and Learning Algorithms", pages 373-376 in the linked PDF (http://www.inference.org.uk/itprnn/book.pdf).

He shows that importance sampling will likely fail in high dimensions precisely because samples from a high-dimensional Gaussian can be *very different* from those from a uniform distribution on the unit sphere.

Consider the ratio between the densities of a 1000-D Gaussian and a 1000-D uniform distribution over a sphere, evaluated at the same sample point. If you sample enough times, the median ratio and the largest ratio will differ by a factor of about 10^19. Basically, most samples from the Gaussian will be fairly similar to the uniform ones; a few will be wildly different.

Perhaps I'm misunderstanding both the post and MacKay's book. I'd be happy to be corrected.
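
A rough illustration of the weight-spread problem (not MacKay's exact setup): even the density of a 1000-D Gaussian evaluated at its own samples varies over tens of orders of magnitude, so importance weights built from such ratios end up dominated by a handful of samples.

    import numpy as np

    rng = np.random.default_rng(0)
    n, m = 1000, 10_000
    x = rng.standard_normal((m, n))
    # Log-density of each sample under the N(0, I_n) it was drawn from.
    logp = -0.5 * (x**2).sum(axis=1) - 0.5 * n * np.log(2 * np.pi)
    orders = (logp.max() - np.median(logp)) / np.log(10)
    print(f"max/median density ratio ~ 10^{orders:.0f}")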

srs70187 over 7 years ago
This is an interesting take, and kudos to the author for relaying a helpful way to think about high-dimensional distributions.

I really like and often come back to this talk by Michael Betancourt, where the theme is quite similar: https://youtu.be/pHsuIaPbNbY

andrewflnr over 7 years ago
This reminds me of the story summed up in this quote:

    There was no such thing as an average pilot. If you've designed a
    cockpit to fit the average pilot, you've actually designed it to
    fit no one.

Good enough source here: http://wmbriggs.com/post/18291/

Humans form a very high-dimensional space. I'm not sure what to make of the point about orthogonality in that regard.

brianjoseff over 7 years ago
I'm having trouble understanding a lot of the specifics of this, though the broader concepts are grok-able. Before I go blindly googling around to get up to speed: any recommended foundational texts to begin with? Any recommended learning trajectory to get to where this is understandable?

m3kw9 over 7 years ago
Can we derive some type of optimization algorithm from this?