Whom the Gods Would Destroy, They First Give Real-Time Analytics (2013)

291 points by irfansharif over 7 years ago

20 comments

nerfhammer over 7 years ago

> The turnaround time also imposes a welcome pressure on experimental design. People are more likely to think carefully about how their controls work and how they set up their measurements when there's no promise of immediate feedback.

This seems like a cranky rationalization of the lack of a fairly ordinary system.

Sure, you shouldn't draw conclusions for potentially small effects on < 24 hours of data. But of course if you've done any real world AB testing, much less any statistics training, you should already know that.

What this means is you can't tell whether an experiment launch has gone badly wrong. Small effect size experiments are one thing, but you can surely tell if you've badly broken something in short order.

Contrary to encouraging people to be careful, it can make people risk averse for fear of breaking something. And it slows down the process of running experiments a lot. Every time you want to launch something, you probably have to launch it at a very small % of traffic, then you have to wait a full 24-36 hours to know whether you've broken anything, then increase the experiment size. Versus some semi-realtime system: launch, wait 30 minutes, did we break anything? No? OK, let's crank up the group sizes...

Without semi-realtime, you have to basically add two full days times 1 + the probability of doing something wrong and requiring relaunch (compounding of course) to the development time of everything you want to try. Plus, if you have the confidence that you haven't broken anything you can run much larger experiment sizes so you get significant results much faster.
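
To put rough numbers on the relaunch arithmetic in this comment, here is a minimal sketch. It assumes each launch breaks independently with some probability and that every attempt costs one full feedback wait; the function name, the 20% break rate, and the geometric-retry model are illustrative assumptions, not something stated in the thread.

```python
def expected_wait_days(feedback_delay_days: float, p_break: float) -> float:
    """Expected total wait before a launch is confirmed healthy.

    Assumes each attempt breaks independently with probability p_break and
    every attempt (including relaunches) costs one full feedback delay.
    The number of attempts is then geometric with mean 1 / (1 - p_break),
    which is the "compounding" sum 1 + p + p^2 + ...
    """
    if not 0.0 <= p_break < 1.0:
        raise ValueError("p_break must be in [0, 1)")
    return feedback_delay_days / (1.0 - p_break)


if __name__ == "__main__":
    # Hypothetical comparison: a two-day batch delay versus a 30-minute check.
    for label, delay_days in [("daily batch", 2.0), ("semi-realtime", 0.5 / 24)]:
        print(label, round(expected_wait_days(delay_days, p_break=0.2), 3))
```

Under these assumptions the overhead scales linearly with the feedback delay, so shortening the delay helps far more than shaving the break rate.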

jakobegger over 7 years ago

When I started my business, I looked at sales numbers every day.

I got my hopes up when I sold 15 copies of my app on a good day, only to feel completely devastated when I sold only 4 copies the next day.

After some time I stopped looking at daily sales numbers, and switched to weekly numbers instead. But even that was too often.

Now I fetch my numbers roughly every other week, and don't worry at all about individual numbers. Only by looking at trends over a longer timeframe can you make sensible decisions... The numbers just fluctuate too much from one week to the next.
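
A quick way to see why swings like 15 versus 4 are unremarkable is sketched below. It assumes daily sales follow a Poisson distribution (so the variance equals the mean) and uses the average of the two quoted days as a stand-in for the true rate; both are assumptions made for illustration only.

```python
import math


def poisson_z_score(observed: int, mean_rate: float) -> float:
    """How many standard deviations `observed` lies from `mean_rate`,
    assuming Poisson-distributed daily counts (variance equals the mean)."""
    if mean_rate <= 0:
        raise ValueError("mean_rate must be positive")
    return (observed - mean_rate) / math.sqrt(mean_rate)


if __name__ == "__main__":
    # Hypothetical average rate, taken from the two days quoted in the comment.
    mean_rate = (15 + 4) / 2
    for sales in (15, 4):
        print(sales, round(poisson_z_score(sales, mean_rate), 2))
```

With that assumed rate, both days fall within two standard deviations of the mean, which is the kind of fluctuation you would expect from noise alone.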

ACow_Adonis over 7 years ago

"But unless the intention is to make decisions with this data, one might wonder what the purpose of such a system could possibly be."

Speaking as an analyst, you'd be amazed (or not) at how much analytics is done either for resume padding, because one guy thinks it's "cool", or to enable marketing or executives to chase their own tails.

Indeed, with the general phenomenon of bullshit jobs, infinite, instantaneous, and always changing information is a welcome smokescreen, because it always allows people to "justify doing something". With such numbers, there is always something for them to do :p

tzury over 7 years ago

The last sentence is worth reading the entire article for!

> Real-time web analytics is a seductive concept. It appeals to our desire for instant gratification. But the truth is that there are very few product decisions that can be made in real time, if there are any at all. Analysis is difficult enough already, without attempting to do it at speed.

ChuckMcM over 7 years ago

This is so true for so many situations. One of the hardest things to understand on the 'other' side of a browser is the dimensionality of the stuff you are measuring. I recall an A/B test we did at Blekko that simply uncovered the presence of a 'click bot' that was always clicking on like the 15th link on a page.

sskates over 7 years ago

Firstly, I'm super biased as I'm the CEO of a product analytics company where one of the value props is getting the data in real time. I agree with this post that people will look at single data points out of context and weight the evidence much more strongly than they should. Analytics should be one of many tools you use that informs your understanding of how customers are using your product. I also agree that I see a lot of early stage startups invest way too much in building out real-time analytics stores without thinking about what value they get out of it. That's not something you should do until you're in the 100+ engineer range.

That said, this is a post-hoc justification of why having real-time analytics is bad. Delaying the data by 24 hours doesn't automatically make significance testing better or force people to incorporate more context when interpreting their data. If that's the issue, fix that problem, don't blame the tools.

There's a ton of positive value to having real time data. Just off the top of my head:

1) If you've instrumented something incorrectly you can see that and fix it right away.
2) Even worse, if you've accidentally messed up a feature with a release you can know about it right away. This happened to one of our customers recently and without a real-time analytics system they wouldn't have caught it quickly (and this is a tech startup that's well regarded for their engineering that everyone here would know the name of).
3) You can observe significant changes to your user base right away, e.g. during a launch or if you're getting a lot of new users from a specific channel.
4) It allows you to have more confidence in deploying multiple times a day. It's a little crazy to me that the deployment of a product is faster than our ability to measure it.

I think the real issue is that we're in the early days of analytics and people don't have a great understanding of how to leverage their tools properly. It's like web search before Google or file sync before Dropbox. People will look at single data points and conclude totally crazy things. They won't have an understanding of basic things like the amount of fluctuation on a week to week basis. The largest analytics provider in the world (Google) gives away their product as an ancillary service to drive you to purchase more ads, not to have a better understanding of how your customers are using your product. I'm hopeful this will change over the next 5 years though (and for us to be a part of that!)

habosa over 7 years ago

The conclusions of the article are probably correct, but it's important for analytics systems to have real time capabilities for debugging and iteration. When you're wiring up some new events or funnels you need to be able to click through and sanity check that you're collecting the data you expect.

Maybe some of the desire for whole systems to be real time comes from this frustration.

paulsutter over 7 years ago

Real-time data gives that nice warm confirmation that the change you just deployed is working as expected. Code change, site failover, any change at all. It drives me batty that the Google Analytics update interval is unpredictable, so I look at real-time numbers.

Certainly it's foolish to jump to conclusions from a too-small sample. Increasing latency does nothing to solve this. I'd recommend an introductory stats course instead.

jwatte over 7 years ago

Some real time analytics are useful. For example, pricing of virtual currency (how deep to run an online "sale") can be much more profitable if you have a real time measurement of price elasticity. I've built tools like that for people who actually need them, and it works in those cases.

However if you don't do real time business, you don't need real time analytics. What would you do with it? Have new wireframes generated and implemented in real time every five minutes? Switch database engines five times a day?

I feel the article makes the second argument well, but perhaps hasn't seen the case where the first argument holds. Try telling a currency trader that they don't need real time analytics and see how far that takes you!

dang over 7 years ago

Discussed at the time: https://news.ycombinator.com/item?id=5032588

alexandercrohde over 7 years ago

Generally really good points. However one great use of realtime analytics is as a way to immediately catch production errors (particularly when connected to alarms, e.g. Splunk).

My thought is that realtime amazing systems as discussed can be an exceedingly difficult engineering problem or a trivial business problem (use Google or another hosted solution).

Personally, any engineer who would try to reinvent the wheel on this one (rather than leverage any of the many incredibly refined existing technologies) should not serve in a decision-making capacity.

pizza over 7 years ago

Relevant: The empirical distribution is ... not empirical (5:58) https://www.youtube.com/watch?v=0iw9oqnhVKQ

jasonkester over 7 years ago

Indeed, "real-time" is one of the most requested features for S3stat, even though there's really not much you'd be able to do with faster data. In our case, Amazon doesn't even deliver their logfiles until 8 hours or so after the fact, so "real-time" reports would just be a pretty moving picture of the past.

I toy around with the idea of building it out as a feature anyway, just so that I can charge a premium to customers who want to turn it on.

PaulHoule over 7 years ago

Early stopping is just fine if you are Bayesian. Not fine if you are doing null-hypothesis significance testing.

Multi-armed bandits beat the heck out of A/B testing because they dynamically balance making money now with taking risks that might mean you make more money later.

https://en.wikipedia.org/wiki/Multi-armed_bandit

I'd think they should be more popular among quantitative marketing types than they are.
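
For readers who have not seen a bandit in code, below is a minimal sketch of one common strategy, Thompson sampling with a Beta-Bernoulli model. The commenter does not name a specific algorithm, so the choice of Thompson sampling, the function name, and the simulated conversion rates are all assumptions for illustration.

```python
import random


def thompson_sampling(true_rates, rounds, seed=0):
    """Simulate Beta-Bernoulli Thompson sampling; return pulls per arm.

    true_rates holds the (normally unknown) conversion rate of each variant;
    it is only used here to simulate user responses.
    """
    rng = random.Random(seed)
    successes = [0] * len(true_rates)
    failures = [0] * len(true_rates)
    pulls = [0] * len(true_rates)
    for _ in range(rounds):
        # Draw a plausible rate for each arm from its Beta posterior
        # (uniform Beta(1, 1) prior), then play the arm with the highest draw.
        draws = [rng.betavariate(s + 1, f + 1) for s, f in zip(successes, failures)]
        arm = max(range(len(draws)), key=draws.__getitem__)
        pulls[arm] += 1
        # Simulate the user's response and update that arm's counts.
        if rng.random() < true_rates[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return pulls


if __name__ == "__main__":
    # Hypothetical variants: the second converts far better than the first.
    print(thompson_sampling([0.1, 0.6], rounds=2000))
```

The balance the comment describes shows up in the posterior draws: an arm with little data has a wide posterior and still gets tried occasionally, while traffic drifts toward the arm that is currently converting best.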

dreamfactored over 7 years ago

Real-time data is crucial to commercial publishing, and ads, and financial services, and costing serverless, and supply chain, and probably a whole bunch of other domains.

qq66 over 7 years ago

In my experience, 99% of the value of real-time analytics has been in identifying service disruptions that monitoring tools don't find, and 1% of the value has been in informing business decisions. If some change badly broke your HTML rendering, but didn't throw any errors, you may not see it in service monitoring but you will definitely see it in the signups per day.

PeachPlum over 7 years ago

> Accuracy (how precise the data is). Everything should be accurate.

Accurate and precise are different things.

You can be precise: We averaged 1 sale per hour today ± 0.01
You can be accurate: We sold 8 things today
You can be precise but inaccurate: We averaged 100 sales per hour today ± 0.01
You can be imprecise and accurate: We sold 8 items today ± 10
And you can be both: We sold 8 items today ± 0.01
And you can be neither: We sold 100 items today ± 10
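
The distinction can be made concrete by treating accuracy as bias (how far the average measurement sits from the truth) and precision as spread (how much repeated measurements disagree). The sketch below uses that common framing; the function name and the sample readings are made up for illustration.

```python
import statistics


def bias_and_spread(measurements, true_value):
    """Return (bias, spread) for repeated measurements of one quantity.

    bias   = mean measurement minus the true value (a measure of accuracy)
    spread = sample standard deviation (a measure of precision)
    """
    bias = statistics.mean(measurements) - true_value
    spread = statistics.stdev(measurements)
    return bias, spread


if __name__ == "__main__":
    true_sales = 8  # hypothetical true value
    # Tightly clustered but far from the truth: precise, not accurate.
    print(bias_and_spread([100.0, 100.01, 99.99], true_sales))
    # Centered on the truth but widely scattered: accurate, not precise.
    print(bias_and_spread([0, 8, 16], true_sales))
```

A small bias with a large spread corresponds to "imprecise and accurate" in the list, and a large bias with a small spread to "precise but inaccurate".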

pedasmith over 7 years ago

"No sampling" is just another way of saying "the queries will be so slow that you will never do any ad-hoc queries". It's also short for "I don't understand statistics".

j7ake over 7 years ago

Are there theoretical arguments for why random sampling is bad and one should instead use the entire dataset?
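
One piece of background for this question: for a simple random sample from a large population, the uncertainty of an estimated proportion depends on the sample size rather than on how big the full dataset is. The sketch below uses the standard normal-approximation margin of error; the function name and the example conversion rate are illustrative assumptions.

```python
import math


def margin_of_error(p_hat: float, sample_size: int, z: float = 1.96) -> float:
    """Approximate margin of error for a proportion estimated from a simple
    random sample (normal approximation, large population; z = 1.96 gives
    roughly 95% confidence)."""
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    if not 0.0 <= p_hat <= 1.0:
        raise ValueError("p_hat must be in [0, 1]")
    return z * math.sqrt(p_hat * (1.0 - p_hat) / sample_size)


if __name__ == "__main__":
    # Hypothetical 50% rate measured on samples of increasing size.
    for n in (1_000, 10_000, 100_000):
        print(n, round(margin_of_error(0.5, n), 4))
```

The approximation does not cover everything: rare events, small segments, and non-random samples are the usual cases where a sample can mislead and the full dataset may be worth the query cost.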

javiramos over 7 years ago

Hate to point this out but it is funny how the Manifesto author equates accuracy to precision: "Accuracy (how precise the data is). Everything should be accurate."

I deal with the subtleties of precision/accuracy every day.