27 points by vvanirudh about 1 year ago

3 comments

bagrow about 1 year ago

The best way to compute the empirical CDF (ECDF) is by sorting the data:

  import numpy as np
  import matplotlib.pyplot as plt

  N = len(data)
  X = sorted(data)
  Y = np.arange(N)/N
  plt.plot(X,Y)

Technically, you should plot this with `plt.step`.
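A minimal sketch of the sort-based ECDF drawn as a step function, as the comment suggests. The helper names `ecdf` and `plot_ecdf` are hypothetical, and this version uses heights (i + 1)/N so the curve reaches 1 at the largest observation, which is one common convention and differs slightly from `np.arange(N)/N`.

```python
import numpy as np

def ecdf(data):
    """Return sorted values x and ECDF heights y, with y[i] = (i + 1) / N."""
    x = np.sort(np.asarray(data, dtype=float))
    n = x.size
    y = np.arange(1, n + 1) / n
    return x, y

def plot_ecdf(data, ax=None):
    """Draw the ECDF as a right-continuous step function."""
    import matplotlib.pyplot as plt  # imported lazily so ecdf() works without it
    x, y = ecdf(data)
    if ax is None:
        _, ax = plt.subplots()
    # where="post" holds y[i] on [x[i], x[i+1]), matching the ECDF definition
    ax.step(x, y, where="post")
    ax.set_xlabel("x")
    ax.set_ylabel("ECDF")
    return ax
```

Calling `plot_ecdf(data)` followed by `plt.show()` should display the curve. The initial jump from 0 at the smallest observation is not drawn unless a starting point is prepended.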


sobriquet9 about 1 year ago

Why estimate the PDF through a histogram and then convert it to a CDF, when one can estimate the CDF directly? Doing so also avoids having to choose a bin width, which can have a substantial impact on the result.
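To illustrate the point about bin width, here is a rough sketch that compares a CDF built from histogram counts with the ECDF computed directly. The function names, the synthetic normal sample, and the bin counts are all assumptions made for illustration. The two estimates agree at the bin edges; between edges the histogram route has to interpolate, so how close it gets depends on the bins chosen.

```python
import numpy as np

def ecdf_at(data, points):
    """Fraction of observations <= each point (the ECDF evaluated at points)."""
    x = np.sort(np.asarray(data, dtype=float))
    return np.searchsorted(x, points, side="right") / x.size

def histogram_cdf(data, bins):
    """CDF implied by a histogram: cumulative bin mass at each bin edge."""
    counts, edges = np.histogram(data, bins=bins)
    heights = np.concatenate(([0.0], np.cumsum(counts) / counts.sum()))
    return edges, heights

def max_gap(data, bins):
    """Largest gap between the interpolated histogram CDF and the ECDF."""
    edges, heights = histogram_cdf(data, bins)
    grid = np.sort(np.asarray(data, dtype=float))
    approx = np.interp(grid, edges, heights)  # linear interpolation between edges
    return float(np.max(np.abs(approx - ecdf_at(data, grid))))

rng = np.random.default_rng(0)  # synthetic data, for illustration only
sample = rng.normal(size=200)

for bins in (5, 20, 80):
    print(f"{bins:3d} bins: max gap to ECDF = {max_gap(sample, bins):.3f}")
```

The ECDF needs no tuning parameter at all, which is the appeal of estimating the CDF directly.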


Bostonian about 1 year ago

If the data is continuous, use kernel density estimation (KDE) instead of histograms to visualize the probability density, since KDE will give a smoother fit. A similar idea is to fit a mixture of normals -- there are numerous R packages for this, and sklearn.mixture.GaussianMixture in scikit-learn.
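A sketch of both ideas, under stated assumptions: `gaussian_kde_1d` is a hand-rolled Gaussian KDE with a normal-reference rule-of-thumb bandwidth (a hypothetical helper, written out only to show what KDE computes), and `mixture_density` wraps `sklearn.mixture.GaussianMixture`. The bimodal sample and the choice of two components are illustrative, not taken from the article.

```python
import numpy as np

def gaussian_kde_1d(data, grid, bandwidth=None):
    """Average of Gaussian bumps centred on each observation."""
    data = np.asarray(data, dtype=float)
    grid = np.asarray(grid, dtype=float)
    n = data.size
    if bandwidth is None:
        # normal-reference rule of thumb; an assumption, other choices exist
        bandwidth = 1.06 * data.std(ddof=1) * n ** (-1 / 5)
    z = (grid[:, None] - data[None, :]) / bandwidth
    kernel = np.exp(-0.5 * z**2) / np.sqrt(2 * np.pi)
    return kernel.sum(axis=1) / (n * bandwidth)

def mixture_density(data, grid, n_components=2):
    """Density of a fitted Gaussian mixture, evaluated on grid."""
    from sklearn.mixture import GaussianMixture  # scikit-learn, imported lazily
    model = GaussianMixture(n_components=n_components, random_state=0)
    model.fit(np.asarray(data, dtype=float).reshape(-1, 1))
    # score_samples returns the log-density of each point
    log_density = model.score_samples(np.asarray(grid, dtype=float).reshape(-1, 1))
    return np.exp(log_density)

rng = np.random.default_rng(0)  # synthetic bimodal data, for illustration only
sample = np.concatenate([rng.normal(-1.0, 0.5, 150), rng.normal(2.0, 1.0, 150)])
grid = np.linspace(-10.0, 10.0, 2001)
density = gaussian_kde_1d(sample, grid)
```

In practice a library routine such as `scipy.stats.gaussian_kde` would normally replace the hand-written KDE. As with histogram bin width, the bandwidth (or the number of mixture components) is still a choice that shapes the result.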


Histograms for Probability Density Estimation: A Primer
