Relating t-statistics and the relative width of confidence intervals

50 points by luu, about 1 year ago

3 comments

mjburgess, about 1 year ago
One important caveat to all these methods is that the central limit theorem must hold for the sample means, and this is an *empirical* condition, not something you can know statistically.

Another important caveat: many of the things we want to measure are not distributed in a way that allows the CLT to hold. If it doesn't, the bulk of statistical methods don't work and the results are bunk.

Many quantities follow power-law distributions, which would require trillions or more data points for the CLT to do its magic, i.e., for the sample means of set A to be statistically significantly different from those of set B would require 10^BIG data points if the property measured in A/B is power-law distributed.

Now, even worse: many areas of "science" study phenomena that are almost certainly power-law distributed, and use these methods to do so.
Comment #39672220 not loaded
Comment #39670814 not loaded
Comment #39670689 not loaded
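
A minimal simulation sketch of the point above, assuming Python with NumPy (not part of the thread; the Pareto tail index and sample sizes are illustrative). It compares how the sample mean behaves for well-behaved data versus heavy-tailed data, where the CLT converges extremely slowly:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_means(draw, n_per_sample=1_000, n_samples=10_000):
    """Distribution of the sample mean for a given sampler draw(n)."""
    return np.array([draw(n_per_sample).mean() for _ in range(n_samples)])

# Finite-variance case: exponential data, the CLT kicks in quickly.
exp_means = sample_means(lambda n: rng.exponential(1.0, n))

# Heavy-tailed case: Pareto with tail index 1.1 (finite mean, infinite
# variance); sample means remain wildly dispersed at this sample size.
pareto_means = sample_means(lambda n: rng.pareto(1.1, n))

for name, m in [("exponential", exp_means), ("pareto(1.1)", pareto_means)]:
    print(f"{name:12s}  sd of sample means = {m.std():10.3f}  "
          f"max/median of sample means = {m.max() / np.median(m):8.1f}")
```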
FabHK, about 1 year ago
I was a bit confused by the article initially:

> Perhaps most simply, with a t-statistic of 2, your 95% confidence intervals will nearly touch 0.

Your 95% CI *will* include 0, unless you have more than 50 or so data points, in which case there's no point in using Student's t-distribution; you might as well use the Gaussian, which the author seems to assume, and which I thought gave rise to the z-score (in my mind, t-statistic = t-distribution, z-score = normal distribution).

But then, looking things up, it turns out that the difference is that the z-score is computed with the population mean and sd, while the t-statistic is computed with the sample mean and sd. So, yeah, practically you'll use the t-statistic (and it will be t-distributed if the population is normally distributed), unless you already know the population mean and sd, in which case you can compute the z-score (which will approach the normal distribution by the CLT under certain conditions with large enough samples, but is otherwise not predicated on normality in any way).

Then all the author was pointing out is that if we take a +/- 2 standard error CI, then if your statistic is 2, the CI goes from 0 to 4, giving rise to a 100% "half-width" of the CI, while if your statistic is 4, say, the CI goes from 2 to 6, giving rise to just a 50% half-width.
Comment #39672312 not loaded
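
A quick arithmetic sketch of FabHK's last paragraph, assuming Python (not part of the thread): for a +/- 2 standard error interval, the relative half-width is simply 2 / t.

```python
# For a +/- 2 standard error CI around an estimate with t = estimate / SE,
# the relative half-width is 2*SE / estimate = 2 / t.
for t in (2, 4, 6):
    estimate, se = float(t), 1.0              # only the ratio estimate/SE matters
    lo, hi = estimate - 2 * se, estimate + 2 * se
    rel_half_width = (hi - lo) / 2 / estimate
    print(f"t = {t}: CI = ({lo:g}, {hi:g}), relative half-width = {rel_half_width:.0%}")
```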
nerdponx, about 1 year ago
Great little demo.

> It is only when the statistical evidence against the null is overwhelming — "six sigma" overwhelming or more — that you're also getting tight confidence intervals in relative terms. Among other things, this highlights that if you need to use your estimates quantitatively, rather than just to reject the null, default power analysis is going to be overoptimistic.

This, I think, will be a real head-scratcher for a lot of students, who are often taught to construct confidence intervals by no method apart from "inverting" a hypothesis test. It illustrates one of the many challenges (and dangers!) of teaching statistics.
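
A back-of-the-envelope sketch of the "overoptimistic power analysis" point, assuming Python with SciPy and the standard normal-approximation power formula (not part of the thread): a study sized for 80% power at alpha = 0.05 has an expected test statistic of only about 2.8, so its expected relative CI half-width is roughly 70%, far from tight.

```python
from scipy.stats import norm

# Normal-approximation sample-size formula: for effect size d (in sd units),
# 80% power at two-sided alpha = 0.05 needs n ~ ((z_0.975 + z_0.80) / d)^2,
# so the *expected* test statistic d*sqrt(n) is z_0.975 + z_0.80, regardless of d.
z_alpha = norm.ppf(0.975)   # ~1.96
z_power = norm.ppf(0.80)    # ~0.84

expected_stat = z_alpha + z_power              # ~2.8, well short of "six sigma"
rel_half_width = z_alpha / expected_stat       # ~95% CI half-width / estimate

print(f"expected statistic at 80% power: {expected_stat:.2f}")
print(f"expected relative CI half-width: {rel_half_width:.0%}")   # ~70%
```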