TechEcho

10 comments

lukego almost 8 years ago

What a beautiful presentation!

Tangentially: I am really enjoying the book "All of Statistics" as a reference for better understanding things like histograms, kernel density functions, etc., and their parameters.

https://www.amazon.com/All-Statistics-Statistical-Inference-Springer/dp/0387402721

vanderZwan almost 8 years ago

If you're interested in histograms, I highly recommend "Expressing complex data aggregations with Histogrammar" by Jim Pivarski, where he talks about how for decades histograms have been used in unique ways to do amazing things in high energy physics (HEP, arguably the original Big Data field in compsci):

https://www.youtube.com/watch?v=mB4Chl0ly-g

One thing that comes up early on is that HEP treats a histogram as a kind of accumulator that can stream in data (because the amount of data processed was typically too big to load into RAM all at once), instead of as a chart. From that starting point you can add, divide, and multiply histograms with histograms to build crazy things.

The results are no longer really histograms, of course, but it's fun to see how something that we just think of as a chart can be (ab)used like that.
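To make the accumulator idea concrete, here is a minimal sketch in plain Python. It is not Histogrammar's API; the class name `StreamingHistogram` and its methods are hypothetical, made up for this illustration. It assumes fixed, equal-width bins, fills one value at a time so the data never has to fit in RAM, and supports bin-wise addition and division of two histograms with identical binning.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Optional


@dataclass
class StreamingHistogram:
    """Hypothetical fixed-bin histogram that is filled one value at a time."""
    low: float
    high: float
    num_bins: int
    counts: List[int] = field(default_factory=list)
    underflow: int = 0  # values below `low`
    overflow: int = 0   # values at or above `high`

    def __post_init__(self) -> None:
        if not self.counts:
            self.counts = [0] * self.num_bins

    def fill(self, value: float) -> None:
        """Accumulate a single value; only the counters are kept in memory."""
        if value < self.low:
            self.underflow += 1
        elif value >= self.high:
            self.overflow += 1
        else:
            width = (self.high - self.low) / self.num_bins
            index = min(int((value - self.low) / width), self.num_bins - 1)
            self.counts[index] += 1

    def fill_all(self, values: Iterable[float]) -> None:
        for value in values:
            self.fill(value)

    def _check_same_binning(self, other: "StreamingHistogram") -> None:
        if (self.low, self.high, self.num_bins) != (other.low, other.high, other.num_bins):
            raise ValueError("histograms must share the same binning")

    def __add__(self, other: "StreamingHistogram") -> "StreamingHistogram":
        """Merge two partial histograms, e.g. from two chunks of a large file."""
        self._check_same_binning(other)
        return StreamingHistogram(
            self.low, self.high, self.num_bins,
            [a + b for a, b in zip(self.counts, other.counts)],
            self.underflow + other.underflow,
            self.overflow + other.overflow,
        )

    def ratio(self, other: "StreamingHistogram") -> List[Optional[float]]:
        """Bin-wise division; bins with an empty denominator give None."""
        self._check_same_binning(other)
        return [a / b if b else None for a, b in zip(self.counts, other.counts)]


# Two chunks of data processed separately, then combined.
chunk_a = StreamingHistogram(0.0, 10.0, 5)
chunk_a.fill_all([0.5, 1.5, 2.5, 9.9, 10.0, -1.0])
chunk_b = StreamingHistogram(0.0, 10.0, 5)
chunk_b.fill_all([3.0, 3.9])
total = chunk_a + chunk_b
```

Because addition only needs the counters, each chunk can be filled on a different machine and merged afterwards. As the comment notes, the output of `ratio` is no longer a histogram in the strict sense, just a per-bin quantity derived from two of them.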

jtxx000 almost 8 years ago

Kernel density plots should be preferred to histograms in nearly all cases. A histogram can be seen as a kernel density plot with a uniform kernel that has been sampled. Since a kernel density plot with a uniform kernel has unbounded frequency content, this sampling introduces aliasing, which is why you get all of these strange effects when adjusting the bin width and offset. In fact, if the distribution of your data happens to be a sine wave, then the histogram will also be a sine wave, but, due to aliasing, it may have a different frequency and phase.

For a kernel density plot with a Gaussian kernel, the kernel size does affect the result, but the situation is much better than with histograms for two reasons:

1. The kernel density plot varies smoothly as the kernel size changes, and so there is greater confidence that you have seen the whole story by only looking at a few kernel sizes.

2. You can construct a kernel density plot with a larger kernel given only a kernel density plot with a smaller kernel. Since the convolution of two Gaussians produces a new Gaussian with a variance equal to the sum of the input variances, you only have to convolve the small-kernel plot with another Gaussian to produce the large-kernel plot. This, again, means that you have more confidence that you've seen the whole story by looking at only a few kernel sizes.

As a side note, there is technically a 1:1 relationship between 1D datasets and kernel density plots with a Gaussian kernel, and so in theory you don't lose any information by constructing the kernel density plot. In practice, however, you do lose information due to limited precision.
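The second point can be checked numerically. The sketch below is only an illustration under stated assumptions: the four data points, the bandwidths 0.3 and 0.4, and the evaluation grid are all made up, and the convolution is approximated by a plain Riemann sum on that grid. It builds a Gaussian kernel density estimate with a small bandwidth, smooths it with a second Gaussian, and compares the result with a density estimate computed directly at bandwidth sqrt(0.3² + 0.4²) = 0.5.

```python
import math


def gaussian(x: float, sigma: float) -> float:
    """Normal density with mean 0 and standard deviation sigma."""
    return math.exp(-0.5 * (x / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def gaussian_kde(data, bandwidth: float, x: float) -> float:
    """Kernel density estimate at x: the average of one Gaussian per data point."""
    return sum(gaussian(x - d, bandwidth) for d in data) / len(data)


def smooth(grid, values, sigma: float):
    """Convolve sampled values with a Gaussian, using a Riemann sum on the grid."""
    step = grid[1] - grid[0]
    return [
        sum(v * gaussian(x - t, sigma) for t, v in zip(grid, values)) * step
        for x in grid
    ]


# Made-up example data and bandwidths (assumptions for this sketch).
data = [1.0, 2.0, 2.5, 4.0]
small, extra = 0.3, 0.4
large = math.hypot(small, extra)  # variances add: 0.3**2 + 0.4**2 = 0.5**2

# The grid extends well past the data so the tails are negligible at the edges.
grid = [-4.0 + 0.05 * i for i in range(261)]

kde_small = [gaussian_kde(data, small, x) for x in grid]
kde_from_small = smooth(grid, kde_small, extra)   # built only from the small-kernel plot
kde_direct = [gaussian_kde(data, large, x) for x in grid]  # built from the raw data

max_error = max(abs(a - b) for a, b in zip(kde_from_small, kde_direct))
print(f"largest difference between the two curves: {max_error:.2e}")
```

The two curves should agree up to numerical error, which is the sense in which the large-kernel plot contains nothing that the small-kernel plot does not already determine.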

svara almost 8 years ago

When you think you want to plot a histogram, it's often a better idea to plot an (empirical) cumulative distribution [0] instead. You don't have to worry about how to select your bin limits, and you can usually put several in the same plot for comparison without making it unreadable due to overlap.

[0] https://en.wikipedia.org/wiki/Empirical_distribution_function
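For reference, an empirical cumulative distribution function needs no binning parameters at all: F(x) is simply the fraction of observations less than or equal to x. A minimal sketch in plain Python, where the helper name `ecdf` is made up for this illustration:

```python
from bisect import bisect_right
from typing import Callable, Sequence


def ecdf(sample: Sequence[float]) -> Callable[[float], float]:
    """Return F, where F(x) is the fraction of the sample that is <= x."""
    if not sample:
        raise ValueError("sample must not be empty")
    ordered = sorted(sample)
    n = len(ordered)

    def F(x: float) -> float:
        # bisect_right counts how many sorted values are <= x.
        return bisect_right(ordered, x) / n

    return F


F = ecdf([3, 1, 2, 2])
print(F(0), F(1), F(2), F(3))  # prints 0.0 0.25 0.75 1.0
```

To draw it, plot the sorted values against i/n for i = 1..n as a step function. Several samples can share one set of axes because each curve is a single monotone line rather than a filled region.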

wodenokoto almost 8 years ago

Is there a way to read this decently on mobile?

I've tried Firefox reading mode as well as Pocket, but they both cut off large parts of the text.

acbart almost 8 years ago

In my introductory programming class, we teach a few basic forms of chart visualization. By far, students struggle the most with histograms. Even more frustrating, they love line plots and attempt to use them everywhere, despite my explanations that you can almost always use histograms and almost never use line plots! Yet they go with what they find more intuitive...

agumonkey almost 8 years ago

Got me curious about non-1D histograms: https://www.r-bloggers.com/5-ways-to-do-2d-histograms-in-r/

ablaba almost 8 years ago

The History of Histograms (VLDB paper): http://www.vldb.org/conf/2003/papers/S02P01.pdf

SeanLuke almost 8 years ago

Unfortunate that they're talking about distributions and yet the very first example they use ("The paintings of Bob Ross") isn't a distribution.

RodericDay almost 8 years ago

> We notice that you're not using the Google Chrome browser. You're welcome to try continuing—but if some parts of the essay are rendering or behaving strangely, please try Chrome instead.

what a world

What's so hard about histograms?