The advice is mixed in quality.

The worst piece of advice is to use only one metric, some complicated blend of other metrics. The basic reason for wanting this is to give a clear go/no-go signal that everyone agrees on. Perhaps that's a good idea if you have to deal with the politics of a larger organization. But if you're a small company, the extra detail you get about how your product is used from tracking multiple metrics is very good for clarifying what you're trying to do and how you want to do it.

Furthermore, creating a complex weighted measure just relocates the argument: instead of debating which metric matters, you debate the weights. And when you're still trying to figure out how your site is actually performing, you don't have the context to know what measure to use. Worse, you won't be able to use the obvious chi-square test (or its better relative, the G-test; a small worked example is at the end of this comment). There is no need to over-complicate the statistics.

The idea of using a hashing function to do test assignment is one that I had not considered. I've always suggested the obvious rand()-at-assignment-time approach, which accomplishes the same thing but with more overhead at run time. I'd caution people who try the hashing approach to use a standard library, because it would be really, really easy to have the website think that assignment is done one way while your analysis assumes it is done another way. (A minimal sketch is also at the end of this comment.)

The minimum duration point is interesting... and somewhat useless. When I was preparing my presentation a few years ago I found that, even if you know exactly how much better A is than B, you can't predict to within an order of magnitude how quickly your experiment will show it. My attitude is the much simpler "The test takes however long it takes, and you can't really know how long that will be in advance." After you've done a few tests, people will have a good enough sense for a back-of-the-envelope estimate.

The other advice seemed good, and was mostly obvious to me. But I have more experience with A/B testing than most people do.
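
To make the G-test point concrete, here's a minimal sketch of running it on a 2x2 table of A/B conversion counts. The counts are invented for illustration, and it leans on SciPy: passing lambda_="log-likelihood" to chi2_contingency gives the G-test, while leaving that argument off gives the ordinary chi-square test.

    # G-test on a 2x2 table of A/B conversion counts (counts are made up).
    import numpy as np
    from scipy.stats import chi2_contingency

    # Rows: variant A, variant B. Columns: converted, did not convert.
    observed = np.array([[310, 9690],
                         [365, 9635]])

    # correction=False skips the Yates continuity correction so this is the
    # textbook G statistic; lambda_="log-likelihood" is what makes it a G-test.
    g_stat, p_value, dof, expected = chi2_contingency(
        observed, correction=False, lambda_="log-likelihood")
    print(f"G = {g_stat:.3f}, p = {p_value:.4f}, dof = {dof}")

Either test answers the same simple question, whether conversion differed between the arms per metric, which is exactly the kind of straightforward check a composite score takes away from you.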
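And to make the hashing caution concrete, here's a sketch of deterministic bucketing using only Python's standard-library hashlib. The experiment name, the user id format, and the 50/50 split are all assumptions for illustration.

    import hashlib

    def assign_variant(user_id: str, experiment: str, variants=("A", "B")) -> str:
        """Deterministically assign a user to a variant; nothing needs to be stored."""
        # Salt with the experiment name so the same users don't always end up
        # bucketed together across different experiments.
        digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).hexdigest()
        return variants[int(digest, 16) % len(variants)]

    # The website and the analysis code must both use this exact function;
    # if they hash differently, the analysis silently compares the wrong groups.
    print(assign_variant("user-12345", "checkout-button-color"))

The rand()-at-assignment-time approach requires you to record each assignment somewhere; the hash needs nothing stored, but it has to be computed byte-for-byte identically everywhere it's used.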
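On the duration point, here's the kind of back-of-the-envelope estimate I mean, using the standard two-proportion sample-size approximation. The 3% baseline conversion rate and the candidate lifts are assumptions; the thing to notice is how violently the answer swings with the true lift, which is why you can't predict duration to within an order of magnitude up front.

    from scipy.stats import norm

    def n_per_variant(p_base, rel_lift, alpha=0.05, power=0.80):
        """Approximate visitors needed per variant to detect a relative lift."""
        p_test = p_base * (1 + rel_lift)
        z_a = norm.ppf(1 - alpha / 2)   # two-sided significance threshold
        z_b = norm.ppf(power)           # desired power
        variance = p_base * (1 - p_base) + p_test * (1 - p_test)
        return (z_a + z_b) ** 2 * variance / (p_base - p_test) ** 2

    # Assumed 3% baseline conversion rate; vary the true relative lift.
    for lift in (0.20, 0.10, 0.05, 0.02):
        print(f"{lift:4.0%} lift: ~{n_per_variant(0.03, lift):,.0f} visitors per variant")

Going from a 20% lift to a 2% lift takes you from roughly fourteen thousand visitors per arm to over a million, so a small error in your guess about the effect size is a huge error in your guess about the calendar.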