Statistical significance on a shoestring budget

71 points by AlexeyMK over 1 year ago

5 comments

grega5 over 1 year ago
First, you really should move away from frequentist statistical testing and use Bayesian statistics instead. It is a good fit for occasions like this, where you want to update your beliefs about which UX is best based on empirical data. As you collect data, you increase confidence in your decision rather than trying to meet an arbitrary p-value threshold.

Second, the “run-in-parallel” approach has a well-defined name in experimental design: a factorial design. The diagram shown is an example of a full factorial design, in which each level of each factor is combined with each level of all other factors. The advantage of such a design is that interactions between factors can be tested as well. If there are good reasons to believe that there are no interactions between the different factors, you could use a partial (fractional) factorial design, which has fewer total combinations of levels while still enabling estimation of the effects of individual factors.
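To make the full versus partial factorial distinction concrete, here is a minimal sketch. The three factor names are hypothetical stand-ins for the article's three ideas, and the half fraction shown uses the common defining relation I = ABC, which is one possible choice and not something the article specifies.

```python
from itertools import product

# Hypothetical factors: three UX ideas, each either off (-1) or on (+1).
factors = ["new_headline", "short_form", "green_button"]

# Full factorial: every level of every factor combined with every other.
full = list(product([-1, 1], repeat=len(factors)))

# Half fraction (2^(3-1)): keep only runs where the product of the levels
# is +1 (defining relation I = ABC). Each main effect is then aliased with
# a two-factor interaction, so this only works if interactions are
# assumed to be negligible.
half = [run for run in full if run[0] * run[1] * run[2] == 1]

print(len(full), "cells in the full design")
print(len(half), "cells in the half fraction")
for run in half:
    print(dict(zip(factors, run)))
```

The half fraction stays balanced (each factor is on in half of the cells), which is what keeps the individual effects estimable with fewer cells.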
jvans over 1 year ago
Building your own Bayesian model with something like pymc3 is also a very reasonable approach to take with small data, or with data that has too much variance to detect effects in a timely manner. This also forces you to think about the underlying distributions that generate your data, which is an exercise in itself that can yield interesting insights.
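As a rough illustration of the idea, here is a sketch that skips pymc3 entirely and uses only the Python standard library. It assumes the simplest possible model: binary conversions with a uniform Beta(1, 1) prior on each variant's rate, so the posterior is also a Beta distribution. The function name and the counts are made up for illustration.

```python
import random

def prob_b_beats_a(conv_a, n_a, conv_b, n_b, draws=20000, seed=0):
    """Monte Carlo estimate of P(rate_B > rate_A) under Beta(1, 1) priors."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(draws):
        # With a uniform prior, the posterior for a conversion rate is
        # Beta(1 + conversions, 1 + non-conversions).
        a = rng.betavariate(1 + conv_a, 1 + n_a - conv_a)
        b = rng.betavariate(1 + conv_b, 1 + n_b - conv_b)
        wins += b > a
    return wins / draws

# Made-up counts for illustration only.
print(prob_b_beats_a(conv_a=12, n_a=200, conv_b=21, n_b=200))
```

The output is a direct statement of belief ("how likely is B better than A, given this data and this prior"), which is the kind of quantity a small team can act on without waiting for a fixed significance threshold. A real model would need a prior and likelihood chosen to match how the data is actually generated.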
charlierguo over 1 year ago
> Gut Check: Especially if you’re off by quite a bit, this is a chance to take a step back and ask whether the company has reached growth scale or not. It could be that there are plenty of obvious 0-1 tactics left. Not everything has to be an experiment.

This is a key point, imo. I have a sneaking suspicion that a lot of companies are running "growth teams" that don't have the scale where it actually makes sense to do so.
Fomite over 1 year ago
There's an argument to be made that, so long as your testing fully encompasses all visitors to your site, you aren't sampling the population, you're fully observing it, and statistical significance is irrelevant.
malf over 1 year ago
“Using modern experiment frameworks, all 3 of ideas can be safely tested at once, using parallel A/B tests (see chart).”

Nooo! First, if one actually works, you’ve massively increased the “noise” for the other experiments, so your significance calculation is now off. Second, xkcd 882.
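The xkcd reference is presumably about the multiple-comparisons problem: run enough tests and one will look significant by chance. A small sketch of the arithmetic, assuming independent tests with no real effects, and using the Bonferroni correction as one simple (conservative) remedy. The function names are illustrative.

```python
def family_wise_error_rate(alpha, num_tests):
    """Chance of at least one false positive across independent null tests."""
    return 1 - (1 - alpha) ** num_tests

def bonferroni_threshold(alpha, num_tests):
    """Per-test p-value threshold that keeps the family-wise rate at or below alpha."""
    return alpha / num_tests

for k in (1, 3, 20):
    print(
        k,
        round(family_wise_error_rate(0.05, k), 3),
        bonferroni_threshold(0.05, k),
    )
```

With three parallel tests at a 0.05 threshold, the chance of at least one spurious "win" is already about 14%; with twenty it is roughly 64%.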