I think there's an issue with the histogram rendering in this post. The rapid descent from the spike on the left is inconsistent with both its large impact on the ECDF and the binning resolution apparent from the piecewise line segments. In general, histograms should not be drawn as connected line graphs like this: the standard bar depiction makes the bin width apparent and addresses some of the problems the article turns to the ECDF for (e.g. relative impact can be judged visually by comparing the areas of the corresponding bars). Bars also make it possible to use varying bin sizes, which is extremely useful for any distribution with tails.
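A minimal sketch of the varying-bin-size point, assuming numpy and a made-up lognormal sample with hand-picked edges (the edges and data are purely illustrative). With `density=True`, `numpy.histogram` divides each count by the total count times that bin's width, so each bar's area, not its height, is the fraction of samples in the bin:

```python
import numpy as np

def variable_width_histogram(samples, edges):
    """Bar heights, left edges and widths for a histogram with unequal bins.

    density=True makes height * width equal the fraction of (in-range)
    samples falling in each bin, so bars can be compared by area.
    """
    heights, edges = np.histogram(samples, bins=edges, density=True)
    widths = np.diff(edges)
    return heights, edges[:-1], widths

rng = np.random.default_rng(0)
samples = rng.lognormal(mean=0.0, sigma=1.0, size=10_000)

# Hypothetical layout: narrow bins near the bulk, wide bins in the right tail.
edges = np.array([0, 0.25, 0.5, 0.75, 1, 1.5, 2, 3, 5, 10, 50])
heights, lefts, widths = variable_width_histogram(samples, edges)
areas = heights * widths  # fraction of samples per bin
```

To draw it as bars, something like matplotlib's `plt.bar(lefts, heights, width=widths, align="edge")` should work.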
The ECDF is particularly useful for comparing two distributions. It also has a nice connection to the Kolmogorov-Smirnov test for whether two distributions differ: its test statistic is the maximum distance between the two ECDFs.
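A small sketch of that statistic computed directly from two ECDFs, assuming numpy (the function names here are my own, not from any library). Both ECDFs are right-continuous step functions that only jump at observed values, so the largest gap is attained at one of the pooled sample points:

```python
import numpy as np

def ecdf(sample, x):
    """Fraction of `sample` values <= each point in `x`."""
    s = np.sort(np.asarray(sample, dtype=float))
    return np.searchsorted(s, x, side="right") / s.size

def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: max vertical gap between ECDFs."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # The difference of the two step functions is constant between consecutive
    # pooled points, so checking the pooled points is enough.
    grid = np.concatenate([a, b])
    return float(np.max(np.abs(ecdf(a, grid) - ecdf(b, grid))))

d = ks_statistic([1, 2, 3, 4], [3, 4, 5, 6])  # 0.5
```

In practice `scipy.stats.ks_2samp` reports the same statistic along with a p-value.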
This is a nice article, but one thing that's not quite right is the claim that you can't go from a histogram to an eCDF. You can (basically, view the bucketing as a loss in measurement precision).<p>I mention this because histograms, especially HDR histograms, are a very compact way of measuring distributions, and it's nice that you can keep those benefits and still convert to an eCDF.
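A rough sketch of the conversion, assuming numpy and plain counts-per-bucket (the helper names are hypothetical). The cumulative sum of bucket counts gives the ECDF exactly at the bucket edges; between edges the positions are lost, and the linear interpolation below simply assumes samples are spread evenly within each bucket. Nothing depends on the buckets being equal width:

```python
import numpy as np

def ecdf_from_histogram(counts, edges):
    """Cumulative fraction of samples at each bucket edge."""
    counts = np.asarray(counts, dtype=float)
    cum = np.concatenate([[0.0], np.cumsum(counts)]) / counts.sum()
    return np.asarray(edges, dtype=float), cum

def evaluate(x, edges, cum):
    # Assumption: uniform spread inside each bucket (this is the precision loss).
    return np.interp(x, edges, cum)

data = np.array([0.5, 1.5, 1.7, 3.2])
counts, edges = np.histogram(data, bins=[0, 1, 2, 3, 4])
edges_out, cum = ecdf_from_histogram(counts, edges)
# cum -> [0.0, 0.25, 0.75, 0.75, 1.0]
```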
I'm a behavioral scientist and I find both are useful. If you never look at a histogram it's surprisingly easy to fool yourself about what exactly the ecdf is telling you in certain situations, particularly when comparing distributions.
While this is nice, it seems like without bucketing you would run into scaling issues (memory and sort time) with large amounts of data, right? I.e., to plot a true eCDF you need a sorted list of all the collected datapoints. I guess for actual plotting you effectively have to bucketize based on the number of pixels in your plot, but that seems fairly arbitrary.<p>Histograms are nice in that they effectively compress non-trivial datasets (at least those with a reasonably bounded domain) into something quite manageable.<p>I guess there is nothing stopping you from doing the same thing here, but it does somewhat undercut the author's claim that you can't go between a histogram and an eCDF.<p>Am I missing something?
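One way to do the "bucketize by pixels" step, sketched under the assumption that you can sort the data once (numpy; `thin_ecdf` and the default of 500 points are hypothetical choices). It keeps at most `k` points, evenly spaced in probability, and each kept pair lies on the ECDF staircase:

```python
import numpy as np

def thin_ecdf(samples, k=500):
    """Keep at most k points of the ECDF, evenly spaced in probability."""
    s = np.sort(np.asarray(samples, dtype=float))
    n = s.size
    # Indices into the sorted data; rounding can create duplicates, so dedupe.
    idx = np.unique(np.linspace(0, n - 1, num=min(k, n)).round().astype(int))
    return s[idx], (idx + 1) / n

x, y = thin_ecdf(np.arange(1000), k=11)
```

This still needs the full sorted data once; for bounded memory you would bucket as you collect, which is what a histogram does.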
I recommend Kernel Density Estimation as an alternative to histograms if you are specifically interested in the density - e.g. which values are particularly likely to occur (perhaps for multimodal distributions).<p><a href="https://en.wikipedia.org/wiki/Kernel_density_estimation" rel="nofollow">https://en.wikipedia.org/wiki/Kernel_density_estimation</a>
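A bare-bones Gaussian KDE, assuming numpy and an illustrative two-mode sample; the bandwidth uses the common normal-reference rule of thumb h = 1.06·σ·n^(-1/5), which is an assumption you would tune in practice (`scipy.stats.gaussian_kde` is the usual library route). Each sample contributes one Gaussian bump centred on it, and the estimate is their average:

```python
import numpy as np

def gaussian_kde(samples, grid, bandwidth=None):
    """Kernel density estimate with a Gaussian kernel, evaluated on `grid`."""
    x = np.asarray(samples, dtype=float)
    grid = np.asarray(grid, dtype=float)
    n = x.size
    if bandwidth is None:
        # Normal-reference rule of thumb (assumed default).
        bandwidth = 1.06 * x.std(ddof=1) * n ** (-1 / 5)
    z = (grid[:, None] - x[None, :]) / bandwidth
    return np.exp(-0.5 * z**2).sum(axis=1) / (n * bandwidth * np.sqrt(2 * np.pi))

rng = np.random.default_rng(1)
samples = np.concatenate([rng.normal(-2, 1, 500), rng.normal(2, 1, 500)])
grid = np.linspace(-6, 6, 601)
density = gaussian_kde(samples, grid)
dx = grid[1] - grid[0]
```

Unlike a histogram, the result has no bin edges to choose, though the bandwidth plays a similar smoothing role.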
for anyone finding themselves doing a bit of analysis with an eCDF, seaborn[0] has a plot for it<p><a href="https://seaborn.pydata.org/generated/seaborn.ecdfplot.html" rel="nofollow">https://seaborn.pydata.org/generated/seaborn.ecdfplot.html</a>