TE
科技回声
首页24小时热榜最新最佳问答展示工作
GitHubTwitter
首页

科技回声

基于 Next.js 构建的科技新闻平台,提供全球科技新闻和讨论内容。

GitHubTwitter

首页

首页最新最佳问答展示工作

资源链接

HackerNews API原版 HackerNewsNext.js

© 2025 科技回声. 版权所有。

Statistical significance on a shoestring budget

71 点作者 AlexeyMK超过 1 年前

5 条评论

grega5超过 1 年前
First, you really should move away from frequentist statistical testing and use Bayesian statistics instead. It is perfect for such occasions where you want to adjust your beliefs in what UX is best based on empirical data to support your decision. With collecting data you are increasing confidence in your decision rather than trying to meet an arbitrary criterion of a specific p-value.<p>Second, the “run-in-parallel” approach has a well defined name in experimental design, called a factorial design. The diagram shown is an example of full factorial design in which each level of each factor is combined with each level of all other factors. The advantage of such design is that interactions between factors can be tested as well. If there are good reasons to believe that there are no interactions between the different factors then you could use a partial factorial design that, which has the advantage of having less total combinations of levels while still enabling estimation of effects of individual factors.
评论 #37506734 未加载
评论 #37506680 未加载
jvans超过 1 年前
Building your own bayesian model with something like pymc3 is also a very reasonable approach to take with small data or data with too much variance to detect effects in a timely manner. This also forces you to think about the underlying distributions that generate your data which is an exercise in itself that can yield interesting insights.
评论 #37492253 未加载
评论 #37492478 未加载
评论 #37509520 未加载
charlierguo超过 1 年前
&gt; Gut Check: Especially if you’re off by quite a bit, this is a chance to take a step back and ask whether the company has reached growth scale or not. It could be that there are plenty of obvious 0-1 tactics left. Not everything has to be an experiment.<p>This is a key point, imo. I have a sneaking suspicion that a lot of companies are running &quot;growth teams&quot; that don&#x27;t have the scale where it actually makes sense to do so.
评论 #37508064 未加载
Fomite超过 1 年前
There&#x27;s an argument to be made that, so long as your testing fully encompasses all visitors to your site, you aren&#x27;t sampling the population, you&#x27;re fully observing it, and statistical significance is irrelevant.
评论 #37508006 未加载
评论 #37508027 未加载
malf超过 1 年前
“Using modern experiment frameworks, all 3 of ideas can be safely tested at once, using parallel A&#x2F;B tests (see chart).”<p>Nooo! First, if one actually works, you’ve massively increased the “noise” for the other experiments, so your significance calculation is now off. Second, xkcd 882.
评论 #37493395 未加载