Histogram vs. ECDF

69 points by r4um, over 2 years ago

7 comments

TTPrograms, over 2 years ago
I think there's an issue with the histogram rendering in this post. The rapid descent from the spike on the left is not consistent with high ECDF impact and the apparent binning resolution visible in the piecewise line-segments. In general histograms should not be visualized with connected line-graphs in this way - the standard bar graph depiction makes the bin-width apparent and resolves some of the issues the article needs the ECDF for (e.g. relative impact can be assessed visually by comparing the relative areas of the associated bars). The bar visualization also makes it possible to use varying bin sizes, which is extremely useful with any distribution that has tails.
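The point about comparing bar areas with varying bin sizes can be sketched as follows. This is a minimal illustration, not the article's code: the function name `density_histogram`, the sample values, and the bin edges are all made up. Each bar's height is the count divided by (n × bin width), so the bar's area equals the fraction of samples in that bin.

```python
from bisect import bisect_right

def density_histogram(samples, edges):
    """Return per-bin densities so that bar area (density * width) equals the
    fraction of samples in the bin. `edges` must be strictly increasing."""
    counts = [0] * (len(edges) - 1)
    for x in samples:
        if edges[0] <= x <= edges[-1]:
            # x == edges[i] goes into bin i; the top edge is clamped into the last bin.
            i = min(bisect_right(edges, x) - 1, len(counts) - 1)
            counts[i] += 1
    n = len(samples)  # samples outside the edges still count towards n
    return [c / (n * (hi - lo)) for c, lo, hi in zip(counts, edges, edges[1:])]

# Hypothetical latencies (ms) with a long tail; bins widen towards the tail.
latencies = [1, 1, 2, 2, 2, 3, 4, 8, 15, 90]
edges = [0, 2, 4, 16, 128]
densities = density_histogram(latencies, edges)
areas = [d * (hi - lo) for d, lo, hi in zip(densities, edges, edges[1:])]
```

Here `areas` recovers the per-bin fractions, which is what makes relative impact readable from a bar chart even when the bins differ in width.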
aquafox, over 2 years ago
The ECDF is particularly useful for comparing two distributions. It also has a nice connection to the Kolmogorov-Smirnov test for whether two distributions differ: its test statistic is the maximum vertical distance between the two ECDFs.
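A minimal sketch of that statistic in plain Python, assuming two small made-up samples; the helper names `ecdf` and `ks_statistic` are illustrative, and this computes only the distance, not a p-value.

```python
from bisect import bisect_right

def ecdf(sorted_sample, x):
    """Fraction of observations <= x; `sorted_sample` must be sorted ascending."""
    return bisect_right(sorted_sample, x) / len(sorted_sample)

def ks_statistic(a, b):
    """Two-sample KS statistic: the largest vertical gap between the two ECDFs."""
    a, b = sorted(a), sorted(b)
    # Both ECDFs are step functions that only change at observed values,
    # so the largest gap is attained at one of those values.
    return max(abs(ecdf(a, x) - ecdf(b, x)) for x in a + b)

d = ks_statistic([1, 2, 3, 4], [3, 4, 5, 6])  # hypothetical samples
```

For these two samples the largest gap is 0.5, reached for example at x = 2, where the first ECDF is at 0.5 and the second is still at 0.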
uluyol, over 2 years ago
This is a nice article, but one thing that's not quite right: you can in fact go from a histogram to an eCDF (basically, view the bucketing as a loss in measurement precision).

I mention this because histograms, especially HDR histograms, are a very compact way of measuring distributions, and it's nice that you can keep those benefits and still convert to an eCDF.
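The conversion described here amounts to a running sum over the bin counts. A minimal sketch, assuming plain lists of bin edges and counts (the function name and the numbers are made up, and this is not an HDR histogram API):

```python
from itertools import accumulate

def ecdf_from_histogram(edges, counts):
    """Return (x, F) points: the cumulative fraction at each upper bin edge.

    These values match the true ECDF at the edges, up to the convention for
    samples that sit exactly on a bin boundary. Positions inside a bin are lost.
    """
    total = sum(counts)
    cumulative = accumulate(counts)
    return [(edge, c / total) for edge, c in zip(edges[1:], cumulative)]

# Hypothetical histogram: 4 bins of unequal width holding 10 samples.
points = ecdf_from_histogram([0, 2, 4, 16, 128], [2, 4, 3, 1])
```

The resulting points can be joined with steps or straight lines; which one is appropriate depends on what is assumed about the samples inside each bin.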
ttpphd, over 2 years ago
I'm a behavioral scientist and I find both are useful. If you never look at a histogram, it's surprisingly easy to fool yourself about what exactly the ECDF is telling you in certain situations, particularly when comparing distributions.
dafelst, over 2 years ago
While this is nice, it seems like without bucketing you would run into complexity issues with large amounts of data, right? I.e., to plot a true eCDF you need a sorted list of all the collected datapoints. I guess for actual plotting you have to effectively bucketize based on the number of pixels in your plot, but that seems fairly arbitrary.

Histograms are nice in that they effectively compress non-trivial datasets (at least those that have a reasonable bounded domain) to something quite manageable.

I guess there is nothing stopping you from doing the same thing here, but it does kind of discount the author's claim of not being able to go between histogram and eCDF.

Am I missing something?
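The sort-then-thin approach raised in this comment might look like the sketch below. The function `ecdf_points` and its `max_points` parameter are hypothetical; thinning by every k-th order statistic is one possible choice, not something the article prescribes.

```python
def ecdf_points(samples, max_points=None):
    """Return (x, F(x)) pairs for the ECDF, optionally thinned to about max_points."""
    xs = sorted(samples)  # O(n log n): the cost of the "sorted list of all datapoints"
    n = len(xs)
    if n == 0:
        return []
    step = 1 if not max_points else max(1, n // max_points)
    # Keep every `step`-th order statistic, always including the maximum.
    idx = list(range(step - 1, n, step))
    if idx[-1] != n - 1:
        idx.append(n - 1)
    # With tied values a kept point may sit on the vertical riser of a step.
    return [(xs[i], (i + 1) / n) for i in idx]

# Hypothetical data: a shuffled permutation of 0..99, thinned to 10 plot points.
samples = [(i * 37) % 100 for i in range(100)]
thinned = ecdf_points(samples, max_points=10)
```

Unlike fixed-width bucketing, this keeps points at evenly spaced quantiles, so every retained point lies exactly on the true ECDF when the values are distinct.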
mike-the-mikado, over 2 years ago
I recommend Kernel Density Estimation as an alternative to histograms if you are specifically interested in the density, e.g. which values are particularly likely to occur (perhaps for multimodal distributions).

https://en.wikipedia.org/wiki/Kernel_density_estimation
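As a rough illustration of the idea, here is a minimal Gaussian kernel density estimate in plain Python. The data and the bandwidth of 0.5 are made up and hand-picked; a real analysis would normally use a library routine and a principled bandwidth choice.

```python
import math

def gaussian_kde(samples, bandwidth):
    """Return a function estimating the density as an average of Gaussian bumps."""
    n = len(samples)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)

    return density

# Hypothetical bimodal sample; the bandwidth 0.5 is hand-picked, not tuned.
data = [1.0, 1.2, 0.9, 1.1, 5.0, 5.2, 4.8, 5.1]
f = gaussian_kde(data, bandwidth=0.5)
```

Evaluating `f` on a grid shows two peaks near 1 and 5 with almost no mass around 3, which is the multimodal structure the comment refers to.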
chrsig, over 2 years ago
For anyone finding themselves doing a bit of analysis using an eCDF, seaborn[0] has a plot for it.

[0] https://seaborn.pydata.org/generated/seaborn.ecdfplot.html