
Conservation of Intent: why A/B tests aren’t as effective as they look

108 points, by dedalus, almost 7 years ago

11 comments

snovv_crash, almost 7 years ago

A/B tests tell you about short-term gains, but they don't tell you about the long-term issues you may be accumulating through things like dark patterns, clickbait headlines, shoddy article topics and more. A/B tests don't take into account the loss of prestige or reputation that the options carry.

I've seen this repeatedly with Ars Technica, which has devolved into so much political and clickbait material that I don't even really visit anymore. Yes, I'm guilty of clicking on those articles myself when I do visit, but at a certain point I've found that Ars doesn't have the news I'm after, so I turn elsewhere, and now Ars has one less viewer.
birken, almost 7 years ago

I could not disagree with this more. I remember vividly having this "low-intent" vs "high-intent" debate at Thumbtack, when we rolled out changes that A/B tests showed increased conversion (by a lot), but some people in the company thought the changes were ugly and "off-brand" and argued they brought in the wrong type of customers. So we ran again the test that we knew raised conversion by a lot, and then followed the two cohorts of customers and watched their behavior: the control group vs the 10% more from whatever the test was that increased conversion. They behaved exactly the same. They came back at the same rates. They made the same amount of profit (per customer). Their response rates to emails were the same. They closed jobs at the same rates. As far as we could tell, they were identical.

I have to admit I was a little surprised too, but for our business this "high-intent" vs "low-intent" distinction didn't seem to exist. And with that out of the way we continued to optimize conversion rates, and our revenue continued to go up.

Every company is different, so I don't want to generalize too much, but if somebody tells me they ran an A/B test that said some key flow went up 10%, yet afterwards the traffic/revenue/whatever didn't go up 10%, I think the most likely culprit is bad test design. Humans are really good at rigging A/B tests to produce wrong results in their favor. I guarantee every company that isn't maniacal about A/B testing does at least one of the following:

- Uses a tool to grade A/B tests that isn't statistically sound

- Lets people check tests too often and stop a test the moment it hits a good result (the sketch below simulates this one)

- Runs a test with a lot of similar variations and cherry-picks the best one

- Doesn't plan for enough traffic to detect the size of change the test is likely to produce

All of these create the potential for the perceived gains of an A/B test not to match up with real-world results.

I'm not saying the distinction between "low-intent" and "high-intent" customers doesn't exist, but it is fairly easy to test for. Run that test for your business and see whether the distinction exists. But don't use it as some magical explanation for why your A/B tests aren't producing the results you want, as this article suggests.
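To make the peeking pitfall concrete, here is a minimal A/A simulation sketch (Python, standard library only; the traffic numbers and conversion rate are illustrative assumptions, not from this thread). Both arms share the same true conversion rate, yet stopping the first time a daily significance check fires inflates the false-positive rate far beyond the nominal 5%:

```python
import math
import random

def z_test_p(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value for a two-proportion z-test."""
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0
    z = (conv_b / n_b - conv_a / n_a) / se
    # Normal tail probability via the complementary error function.
    return math.erfc(abs(z) / math.sqrt(2))

def run_experiment(p_true=0.05, n_per_day=500, days=20, peek=True):
    """Simulate an A/A test (no real effect between arms).
    Returns True if the test is (wrongly) declared significant."""
    ca = cb = na = nb = 0
    for _ in range(days):
        ca += sum(random.random() < p_true for _ in range(n_per_day))
        cb += sum(random.random() < p_true for _ in range(n_per_day))
        na += n_per_day
        nb += n_per_day
        if peek and z_test_p(ca, na, cb, nb) < 0.05:
            return True  # stopped early on a "good" result
    return z_test_p(ca, na, cb, nb) < 0.05  # single fixed-horizon check

random.seed(0)
trials = 400
fp_peek = sum(run_experiment(peek=True) for _ in range(trials)) / trials
fp_fixed = sum(run_experiment(peek=False) for _ in range(trials)) / trials
print(f"false-positive rate with daily peeking:  {fp_peek:.1%}")
print(f"false-positive rate with fixed horizon:  {fp_fixed:.1%}")
```

The usual remedies are a pre-registered sample size from a power calculation, or sequential-testing corrections that explicitly account for repeated looks.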
gfodor, almost 7 years ago

The title is misleading relative to the article's content. Surely, as the author points out, A/B tests can sometimes be misleading, especially if you ignore longer-term cohort analysis, etc.

But often, if you fix an obviously broken part of your funnel, particularly in the early acquisition stages, you're fixing things that universally lift the number of people who ultimately are able to engage with your brand and product to the point where they can even form intent. The reality is that most people are only willing to give you a tiny bit of their time during their first one or two engagements with your brand, so at that stage you're trying to sell them on your product and build intent. A/B testing helps reduce the friction needed to get them through the core of your sales pitch.

It's easy to come up with a thought experiment showing that A/B testing can sometimes be as simple as you'd imagine: just break the site. Your conversion drops to 0%; now split-test the fix. Like magic, your control stays at 0% and your variant returns to normal. Nothing about "intent" in this scenario; this is pure friction resolution. It's just a thought experiment, but it shows there are plenty of places where pure A/B testing and removing friction is a net positive, without any fretting over this "conservation of intent" issue.
foobaw, almost 7 years ago

Slightly relevant but useful: use mediation modeling (https://eng.uber.com/mediation-modeling/)
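For readers unfamiliar with the term: mediation modeling decomposes a treatment's total effect on an outcome into a direct effect plus an indirect effect that flows through an intermediate variable. Below is a minimal linear sketch (Python with numpy; the variable names, data, and coefficients are invented for illustration, and the linked Uber post describes a considerably richer setup):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Synthetic data: treatment -> engagement (mediator) -> revenue (outcome),
# plus a smaller direct treatment -> revenue path.
treatment = rng.integers(0, 2, n)                    # 0 = control, 1 = variant
engagement = 2.0 * treatment + rng.normal(0, 1, n)   # mediator
revenue = 1.5 * engagement + 0.5 * treatment + rng.normal(0, 1, n)

def ols(y, *xs):
    """Least-squares coefficients for y ~ 1 + xs."""
    X = np.column_stack([np.ones(len(y)), *xs])
    return np.linalg.lstsq(X, y, rcond=None)[0]

total = ols(revenue, treatment)[1]        # total effect of treatment
a = ols(engagement, treatment)[1]         # treatment -> mediator
coefs = ols(revenue, treatment, engagement)
direct, b = coefs[1], coefs[2]            # direct effect; mediator -> outcome

print(f"total effect:   {total:.2f}")
print(f"direct effect:  {direct:.2f}")
print(f"indirect (a*b): {a * b:.2f}  (should be ~ total - direct)")
```

In the A/B-testing context, this kind of decomposition helps diagnose whether a lift in an intermediate metric (engagement here) actually propagates to the metric the business cares about (revenue).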
smueller1234, almost 7 years ago

What the article largely discusses seems to be a problem with the metrics one chooses as a proxy.

Let's say you're actually trying to optimize total transaction value on the site, or total number of transactions, or something like the overall fraction of users with at least one transaction within a certain window of time. Then, as the article rightly observes, getting users not to bounce on a particular page is a TERRIBLE proxy for what you're optimizing for. If that's not clear to you, you have no business running A/B tests without supervision.

Source: co-designed one iteration of the experimentation framework for Booking.com many years ago. Indirectly managed the team of much more qualified people that took it a world further.
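A toy simulation of that proxy trap (Python, standard library; the funnel shape and all rates are invented): the variant "rescues" some bouncing visitors, so the bounce-rate proxy improves by several points, but because the rescued visitors had no purchase intent, the transaction rate the business actually optimizes for doesn't move:

```python
import random

random.seed(1)

def simulate_user(variant: bool):
    """Hypothetical funnel: land on page -> maybe bounce -> maybe transact.
    The variant stops some visitors from bouncing, but those rescued
    visitors had no purchase intent, so transactions don't change."""
    intent = random.random() < 0.10  # 10% of visitors arrive with intent
    bounced = (not intent) and random.random() < (0.50 if variant else 0.60)
    transacted = intent and not bounced and random.random() < 0.5
    return bounced, transacted

def arm_metrics(variant, n=200_000):
    rows = [simulate_user(variant) for _ in range(n)]
    bounce = sum(b for b, _ in rows) / n
    txn = sum(t for _, t in rows) / n
    return bounce, txn

b0, t0 = arm_metrics(False)
b1, t1 = arm_metrics(True)
print(f"control: bounce={b0:.1%}  transaction rate={t0:.1%}")
print(f"variant: bounce={b1:.1%}  transaction rate={t1:.1%}")
```

In this setup the bounce rate drops from roughly 54% to roughly 45% while the transaction rate stays near 5% in both arms: a large "win" on the proxy, zero change on the target.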
User23, almost 7 years ago
One of my coworkers is a trained particle physicist and he informs me he almost never sees properly designed experiments used by our A&#x2F;B testers. The result is that the testers almost always find what they are looking to find.
ben509, almost 7 years ago

I think the idea of "high intent" is the same fallacy as the notion of "affordable." We say something is "affordable" because we have "enough" money to buy it, but that's not how people make decisions in aggregate.

The reason economists talk about opportunity cost is that people are constantly optimizing decisions based on new information. (Humans may not deal with prices and numbers very well, but they're pretty well evolved to break time into chunks and work out plans to solve problems.)

If you talk to an individual, they might say "I can't afford it," or you may talk to someone who didn't click through and they might say, "I was just browsing." The fallacy behind both is that you're creating archetypes and assuming they represent the modes of the population.

And even if you talk to the individuals you based those archetypes on, there is a whole history behind how they arrived at "I can't afford it." Those changing circumstances are why the aggregate behavior doesn't show some arbitrary level of "affordability," and why instead you see a smooth curve of consumer demand.

Likewise, the opportunity cost of continuing to view a web page will not have neatly quantized levels of intent; rather, individuals have a broad array of competing interests.
MaxBarraclough, almost 7 years ago

> You ship an experiment that's +10% in your conversion funnel. Then your revenue/installs/whatever goes up by +10% right? Wrong :( Turns out usually it goes up a little bit, or maybe not at all.

Never mind "the difference between high- and low-intent users"; this could be explained in terms of regression toward the mean, a phenomenon mentioned in neither the article nor the discussion here.

Have 1000 students take an IQ test. Pick the top 20 students. Have them take another IQ test next week. Their mean score the second time round will almost certainly be lower than their mean score the first time round. The reason they made the top 20 the first time round was a combination of having a high true IQ and being lucky on the day. The second time round, they aren't 'defined to be lucky', as it were.

It's the reason movie sequels tend to be worse than the originals. The reason the sequel was made was that the original movie was far more successful than the average movie, on account of both unusually skillful creators and unusually good luck. The second time round, you can't count on the luck component again.
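That selection effect is easy to simulate (Python, standard library; the score distribution and test noise are illustrative assumptions): the top 20 first-test scores reflect both high true IQ and good luck, and only the first component survives the retest:

```python
import random

random.seed(0)

# True IQ ~ N(100, 15); each test adds independent N(0, 10) measurement noise.
students = [random.gauss(100, 15) for _ in range(1000)]
test1 = [iq + random.gauss(0, 10) for iq in students]

# Select the top 20 by their first score, then retest them.
top20 = sorted(range(1000), key=lambda i: test1[i], reverse=True)[:20]
test2 = [students[i] + random.gauss(0, 10) for i in top20]

mean1 = sum(test1[i] for i in top20) / 20
mean2 = sum(test2) / 20
print(f"top-20 mean, first test:  {mean1:.1f}")
print(f"top-20 mean, second test: {mean2:.1f}  (lower: the luck doesn't repeat)")
```

The same mechanism applies to shipping the best-looking experiment variant: part of its measured lift was noise that happened to point upward, so the post-launch effect comes in smaller.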
baybal2, almost 7 years ago

Totally true. I've lost count of the people from the web/startup-y scene who have A/B tested their companies/business units into insolvency.
a-dub, almost 7 years ago
Frustration is not linear. Film at 11.
jbob2000, almost 7 years ago

Wait, so you're telling me the laziest form of scientific analysis, the A/B test, doesn't produce accurate results? Colour me shocked.

A/B tests routinely leave out important observations, have far too small a scope, use uncontrolled populations... I could go on. They run the gamut of anti-patterns.