Interpreting A/B test results: false positives and statistical significance

76 points by ciprian_craciun, over 3 years ago

3 comments

palae, over 3 years ago
It's probably a good idea to remind (or inform) people that at least in scientific research, null hypothesis statistical testing and "statistical significance" in particular have come under fire [1,2]. From the American Statistical Association (ASA) in 2019 [2]:

"We conclude, based on our review of the articles in this special issue and the broader literature, that it is time to stop using the term “statistically significant” entirely. Nor should variants such as “significantly different,” “p < 0.05,” and “nonsignificant” survive, whether expressed in words, by asterisks in a table, or in some other way.

Regardless of whether it was ever useful, a declaration of “statistical significance” has today become meaningless."

[1] The ASA Statement on p-Values: Context, Process, and Purpose - https://www.tandfonline.com/doi/full/10.1080/00031305.2016.1154108

[2] Moving to a World Beyond “p < 0.05” - https://www.tandfonline.com/doi/full/10.1080/00031305.2019.1583913
dmitriid, over 3 years ago
Before interpreting A/B results, the main question that needs to be asked: "what is it that you're A/B testing?"

For too many companies, it's testing "engagement" which leads to hiding functionality (more clicks is more engagement), reducing info density (more time spent is more engagement) etc.

And coming from Netflix... I don't think there's a single person who likes that when you browse Netflix it autoplays random videos (not even trailers) with audio at full volume. But yeah, A/B tests something something. So I wish Netflix learned from their own teachings.
jonathanbentz, over 3 years ago
I am interested to see what they will be testing in some of the upcoming posts in this series. It would be fun to be scrolling Netflix and have the transparency to know that I'm seeing the 'B' test.