
A/B testing pitfalls and lessons learned

15 points by samps, over 14 years ago

2 comments

btilly, over 14 years ago
The advice is mixed in quality.

The worst piece of advice is to only use one metric, which is some complicated mix of other metrics. The basic reason they want this is to give a clear go/no go signal that everyone agrees on. Perhaps if you have to deal with the politics of a larger organization, that's a good idea. But if you're a small company, the extra detail you get about how your product is used from tracking multiple metrics is very good for helping clarify what you're trying to do, and how you want to do it.

Furthermore the act of creating a complex weighted measure is pushing the argument elsewhere. And when you're still trying to figure out how your site is actually performing, you don't have the context to know what measure to use. Furthermore you won't be able to use the obvious chi-square test (or its better relative, the g-test). There is no need to over-complicate the statistics.

The idea of having a hashing function to do test assignment is one that I had not considered. I've always suggested the obvious rand() at assignment time approach, which accomplishes the same thing but with more overhead at run time. I'd caution people who try the hashing approach to use a standard library, because it would be really, really easy to have the website think that assignment is done one way while your analysis assumes that it is done in another.

The minimum duration point is interesting...and somewhat useless. When I was preparing my presentation a few years ago I found out that, even if you know exactly how much better A is than B, you can't predict to within an order of magnitude how quickly your experiment will show it. My attitude is the much simpler, "The test takes however long it will take, and you can't really know how long that will be in advance." After you've done a few tests, people will have a good enough idea for a back of the envelope estimate.

The other advice seemed good, and mostly was obvious to me. But I have more experience with A/B testing than most do.
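
For readers who want to see the two techniques mentioned above in concrete form (hash-based test assignment and the g-test), here is a minimal Python sketch using only the standard library. The function names, the choice of SHA-256, the "experiment:user_id" key format, and the sample counts are all assumptions made for illustration; none of them come from the article or from this comment.

```python
import hashlib
import math


def assign_variant(user_id, experiment, variants=("A", "B")):
    """Deterministically map a user to a variant by hashing.

    Hypothetical helper: the key format and SHA-256 are illustrative choices.
    Sharing one function like this between the site and the analysis code
    avoids the two disagreeing about how assignment was done.
    """
    key = f"{experiment}:{user_id}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    # Interpret the first 8 bytes as an integer and reduce to a bucket index.
    bucket = int.from_bytes(digest[:8], "big") % len(variants)
    return variants[bucket]


def g_test_2x2(conv_a, total_a, conv_b, total_b):
    """G statistic and p-value for a 2x2 table (converted vs. not, A vs. B)."""
    observed = [
        [conv_a, total_a - conv_a],
        [conv_b, total_b - conv_b],
    ]
    row_totals = [total_a, total_b]
    col_totals = [conv_a + conv_b, (total_a - conv_a) + (total_b - conv_b)]
    grand_total = total_a + total_b

    g = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            if o > 0:  # a zero cell contributes nothing to the sum
                expected = row_totals[i] * col_totals[j] / grand_total
                g += 2.0 * o * math.log(o / expected)
    g = max(g, 0.0)  # guard against tiny negative rounding error

    # With 1 degree of freedom, P(chi2 > g) = erfc(sqrt(g / 2)).
    p_value = math.erfc(math.sqrt(g / 2.0))
    return g, p_value


if __name__ == "__main__":
    # Made-up numbers, purely for illustration.
    print(assign_variant("user-42", "signup-button"))
    print(g_test_2x2(100, 1000, 150, 1000))
```

Because the assignment depends only on the experiment name and user id, nothing has to be stored at assignment time, and the analysis can recompute each user's variant by calling the same function.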
mwexler, over 14 years ago
The MS site (http://exp-platform.com/) has more of Kohavi's papers on how MS uses experimentation across their systems. Worth a perusal if you want more depth than this overview document.