Whom the Gods Would Destroy, They First Give Real-time Analytics

186 points by chrisdinn over 12 years ago

13 comments

lmkg over 12 years ago

Full-time web analyst here. Total agreement.

Information is as useful as your ability to act on it: no more, no less. Real-time analytics is something that sounds sexy and gets a lot of headlines (and probably sales), but it's not particularly useful, especially compared to the cost to implement. Most organizations aren't capable of executing quick decisions of any significance. In fact, quite a few business models wouldn't have much to gain even if they were capable of it.

My experience is that there are three types of companies, with very little overlap:
1. Companies large enough to receive statistically significant amounts of data in under an hour.
2. Companies small enough to make decisions regarding significant site updates in under an hour.
3. Companies whose name is "Google."

Fact of the matter is, any change to your site more significant than changing a hex value will require time overhead to think up, spec out, test, and apply. Except in the most pathological cases of cowboy coding, it will take at least a day for minor changes. Changing, say, the page flow of your registration process will take a week to a month. You won't be re-allocating your multi-million-dollar media budget more often than once a quarter, and you have to plan it several months in advance anyway because you need to sign purchase orders.

In short, you can usually wait 'til tomorrow to get your data. Really, you can. Sure, you can probably stop an A/B test at the drop of a hat, but if it took you a week to build it, you ought to let it run longer than that.

I have had one client who really did benefit from real-time-ish (same-day) data. It was a large celebrity news site. They could use data on which stories were popular in the morning to decide which drivel to shovel out that afternoon. This exception nonetheless proves the rule: of the 6 "requirements" listed in the article, only 1.5 were needed in this particular case: a hard yes on accessibility, and timeliness relaxed from 5 minutes to 30.

(Note that when I say analytics, I mean tools for making business decisions. Ops teams have use for real-time data collection, but the data they need is altogether different, and they are better served by specialized tools.)

btilly over 12 years ago

Gah, yet another article that links to Evan Miller's article on how not to run an A/B test. I really need to finish writing my article that explains why it is wrong, and how you can do better without such artificial restrictions.

His math is right, but the logic misses a basic fact. In A/B testing nobody cares if you draw a conclusion when there is really no difference, because that is a bad decision that costs no money. What people properly should care about is drawing the wrong conclusion when there is a real difference. But if there is a significant difference, only for small sample sizes is there a realistic chance of drawing the wrong conclusion, and after that the only question is whether the bias has been strong enough to make the statistical conclusion right.

He also is using 95% confidence as a cut-off. Don't do that. You don't need much more data to massively increase the confidence level, so if the cost of collecting it is not prohibitive you absolutely should go ahead and do that. Particularly if you're tracking multiple statistics. If you test regularly, those 5% chances of error add up fast.
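
A quick way to see how fast "those 5% chances of error add up": if each of k independent tests is run at significance level alpha and none of the variants actually differs, the chance of at least one false positive is 1 - (1 - alpha)^k. The sketch below (standard library only; the function names are illustrative, and independence between tests is an assumption) computes that, plus a rough multiplier for how much more data a stricter confidence level needs when the effect size is held fixed and statistical power is ignored.

```python
from statistics import NormalDist


def familywise_error(alpha: float, k: int) -> float:
    """Chance of at least one false positive across k independent tests,
    each run at significance level alpha, when no real difference exists."""
    return 1 - (1 - alpha) ** k


def sample_size_ratio(conf_low: float, conf_high: float) -> float:
    """Rough multiplier on sample size to move from one two-sided confidence
    level to a stricter one. Holding the effect size fixed and ignoring power,
    the required n scales with the square of the critical z value."""
    def z(conf: float) -> float:
        return NormalDist().inv_cdf(1 - (1 - conf) / 2)
    return (z(conf_high) / z(conf_low)) ** 2


if __name__ == "__main__":
    for k in (1, 5, 10, 20):
        print(f"{k:>2} tests at 95%: {familywise_error(0.05, k):.0%} chance of a false positive")
    print(f"95% -> 99% needs about {sample_size_ratio(0.95, 0.99):.2f}x the data")
```

Under these simplifying assumptions, moving from 95% to 99% confidence cuts the per-test false-positive rate fivefold for roughly 1.7 times the data, while twenty tests at 95% carry about a 64% chance of at least one spurious "win".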

sardonicbryan over 12 years ago

So I built, and use, a real-time analytics dashboard that tracks revenue, projected revenue, and revenue by hour for a portfolio of social games. I find it incredibly useful, but I will give a few tips that address some of the issues in the article:
1) You have to provide context for everything. Current real-time revenue is presented right next to the 14-day average revenue up to that point in time, along with how many standard deviations the delta between the two is. I.e.: current revenue is $100 at 10am vs. a 14-day average of $90, which is 0.2 standard deviations of revenue at that time.
2) Hourly revenue is presented the same way, right next to the 14-day average revenue for that hour and the SD delta.
3) Look at it a lot. I've been looking at this sheet regularly for over a year now, and I have a really good feel/instinct for what a normal revenue swing is, and an even better feel for the impact of different features/content/events/promotions on our revenue.
4) This approach also works better when the impact of your releases is high. A big release typically spikes revenue 2-3 SD above baseline and causes an immediate and highly visible effect. So while I'm not strictly testing for statistical significance, it's one of those things where it's pretty obvious.
5) It also works better if you use it in conjunction with other metrics. We validate insights/intuitions gained from looking at real-time data against weekly cohorted metrics for the last several months of cohorts.
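
Tips 1 and 2 boil down to a z-score: (current value - baseline average) / standard deviation of the same time-of-day figure over the previous 14 days. A minimal sketch, assuming the per-day history is already available as a list (the function and variable names, and the sample numbers, are hypothetical, not taken from the poster's dashboard):

```python
from statistics import mean, stdev


def revenue_z_score(current: float, history: list[float]) -> tuple[float, float]:
    """Compare revenue-so-far today with revenue-so-far at the same time of
    day on previous days. Returns (baseline average, delta in standard deviations)."""
    if len(history) < 2:
        raise ValueError("need at least two days of history")
    baseline = mean(history)
    spread = stdev(history)  # sample standard deviation across the prior days
    if spread == 0:
        raise ValueError("history has no variation; z-score is undefined")
    return baseline, (current - baseline) / spread


if __name__ == "__main__":
    # Hypothetical cumulative revenue at 10am on each of the last 14 days.
    history_10am = [70, 95, 110, 60, 88, 120, 75, 92, 101, 66, 130, 84, 79, 90]
    baseline, z = revenue_z_score(100.0, history_10am)
    print(f"now $100 vs 14-day avg ${baseline:.0f} ({z:+.2f} SD)")
```

Taking the standard deviation across days, rather than within a day, matches the comment's "standard deviations of revenue at that time"; whether the original dashboard computed it exactly this way is not stated.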

physcab over 12 years ago

I like this rant. Seldom do I see the need for a real-time system, and sometimes I think engineers and program managers gravitate towards the concept to better answer questions of "why" a problem happens. But most of the time, analytics problems can't be solved in real time. You have to put on your thinking cap, take a step back, do some background research, and be patient. And as an analyst, it is bad for your credibility to jump to conclusions. Unlike in engineering, it's better to be slow and right on your first try than to "move fast and break things".

ChuckMcM over 12 years ago

Nice post. Ops guys, though, like to see the bushes rustling right away so that we can reboot that switch before all hell breaks loose :-)

The central theme is a good one, though: tactics or strategies have an innate timeline associated with them, and deciding on tactics or strategies with data that doesn't have a similar timeline leads to poor decisions. The coin flip example in the article is a great one.

Ideally one could ask, "What is the shortest interval of coin flips I can measure to 'accurately' determine a fair coin?" and realize that accuracy only approaches 100% asymptotically. One of the things that separates experienced people from inexperienced ones is having lived through a number of these 'collect-analyze-decide' cycles and getting a feel for how much data is enough.
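
The "how much data is enough" question for a coin can be made concrete with the normal-approximation confidence interval for the fraction of heads: its half-width shrinks only as 1/sqrt(n), so each tenfold increase in flips buys roughly a threefold gain in precision, and certainty is never reached. A rough sketch (standard library only; the function name is illustrative, and the approximation is poor for very small n):

```python
from math import sqrt
from statistics import NormalDist


def ci_half_width(n: int, confidence: float = 0.95, p: float = 0.5) -> float:
    """Half-width of the normal-approximation confidence interval for the
    fraction of heads after n flips (p = 0.5 is the widest, worst case)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z * sqrt(p * (1 - p) / n)


if __name__ == "__main__":
    for n in (10, 100, 1_000, 10_000):
        print(f"{n:>6} flips: heads fraction known to within +/-{ci_half_width(n):.3f}")
```

At 100 flips a fair coin's heads fraction is pinned down only to about +/-0.098 at 95% confidence; it takes 10,000 flips to get to about +/-0.01.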

creature over 12 years ago

I once interviewed for a lead webdev role at a small startup. They had 10-12 people and a product that was doing OK. (I was thoroughly unconvinced by it, but that's another story.) One of the things they talked about was their upcoming plan to build a real-time analytics system to track user behaviour. A big project! That I would get to spearhead! They'd budgeted 2-3 months and 6-8 people to implement it. We talked about their plans for a bit, before I asked (what I thought was) the obvious question:

"So, what's the real-time system going to help you decide that the current system won't?"

There was a long, uncomfortable pause as the two people looked at each other, each hoping the other would answer.

"Well... it's not so much the real-time element, per se..." one managed. "But we want more granular data about how people are using our app."

"Okay. But you're currently doing analytics via HTTP callbacks, right? Why not just extend that to hit some new endpoints for your more granular data? You've already got infrastructure in place on the front and back end to support that."

No answer. We moved on. I don't know if I actually saved them 1-2 man-years of work or if they plowed ahead anyway.

lostnet over 12 years ago

And we shouldn't have calculators because we may forget the relationships between numbers?

I use analytics to do significant A/A testing on every configuration the site's users are actually using, to determine what will work for my A/B testing later. Should I maintain a separate real-time analytics system, or delay deployments by 24 hours when I would like a little more assurance? This is not a rhetorical question; whether I should keep maintaining separate tracking for the 20% of the time when Google Analytics is unfit is an open problem for me.

Similarly, I would like to know if there is a sudden plummet in some demographic the second I start a test. It usually isn't significant, but the client panic will be. It is better to cancel the test and do a post-mortem before restarting. An A/B test doesn't have to get its day in court.

Giving delayed numbers for routine reports is perfectly valid; dressing up that pig is Luddism.

josh2600 over 12 years ago

This is a really interesting post.

While I agree with the basic premise that real-time analytics are rarely helpful, here are a couple of places where they could be very useful:
* Conferences - Being able to see live user analytics on a conference site, since it is ephemeral, would be great.
* Pop-up sites - Again, the short-lived nature of the site means that seeing a blocking action or a broken link early is tremendously valuable.

Basically, there are a couple of circumstances where real-time analytics might make sense, but they're generally short-duration engagements. Getting analytics info for a site that is no longer being hammered is useless unless it's a long-term project.

car54whereareu over 12 years ago

"You just need to understand cause and effect," said Apollo.

"He's right, mortal. This isn't what you would call rocket science," added Athena.

"Okay, and my business will succeed if I can understand cause and effect?"

"Yes," said Apollo.

"Of course! Why are you wasting time? Go write some software," said Athena.

So yeah, real-time A/B testing seems like a bad idea, but real-time analytics sounds fine. On the other hand, maybe the Gods gave you the idea of cause and effect to destroy you. I bet more than one story on Hacker News today pretends to understand the causes of an effect.

AnthonyMouse over 12 years ago

I agree with this in general, but there are exceptions. For example, it would be nice to know immediately if a new change has caused your conversion rate to drop precipitously for some reason, so that you can turn it back off and take a minute to see if you can figure out why before you lose a full day's worth of revenue.
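
One way to get the quick alarm described above is a one-sided two-proportion z-test comparing conversions before and after the change. A sketch under that assumption (standard library only; the function name and the example counts are hypothetical):

```python
from math import sqrt
from statistics import NormalDist


def conversion_drop_p_value(conv_before: int, n_before: int,
                            conv_after: int, n_after: int) -> float:
    """One-sided two-proportion z-test. Returns the p-value for the
    hypothesis that the conversion rate after a change is lower than before."""
    p_before = conv_before / n_before
    p_after = conv_after / n_after
    # Pooled rate under the null hypothesis that nothing changed.
    pooled = (conv_before + conv_after) / (n_before + n_after)
    se = sqrt(pooled * (1 - pooled) * (1 / n_before + 1 / n_after))
    if se == 0:
        raise ValueError("no variation in conversions; test is undefined")
    z = (p_after - p_before) / se
    return NormalDist().cdf(z)  # small value => "after" is significantly lower


if __name__ == "__main__":
    # Hypothetical: 3% conversion on 10,000 visits before, 1% on 2,000 after.
    p = conversion_drop_p_value(300, 10_000, 20, 2_000)
    print(f"p = {p:.2e}; roll back" if p < 0.001 else f"p = {p:.3f}; keep watching")
```

Checking this every few minutes is itself repeated testing, so a threshold stricter than 0.05 (0.001 is used here purely as an example) keeps false alarms down; the aim is to catch a precipitous drop, not a subtle one.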

cftm over 12 years ago

Interesting post, though I feel the author is somewhat missing the forest for the trees. The issue isn't about "real-time"; the issue is that many people conducting A/B tests don't understand what the statistics are telling them, nor do they understand when an adequate "sample" has been pulled.

Real-time data isn't needed for A/B testing, but this falls into the PEBKAC category.
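
Whether an "adequate sample" has been pulled can be estimated before the test starts. The sketch below uses the standard normal-approximation sample-size formula for comparing two proportions with equal-sized arms; the baseline rate, lift, alpha, and power in the example are assumptions chosen for illustration, and the function name is hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist


def sample_size_per_arm(p_base: float, p_variant: float,
                        alpha: float = 0.05, power: float = 0.8) -> int:
    """Approximate visitors needed in EACH arm of an A/B test to detect a
    change from p_base to p_variant with a two-sided test at level alpha."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p_base + p_variant) / 2  # average rate, assuming equal arms
    numerator = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * sqrt(p_base * (1 - p_base)
                                 + p_variant * (1 - p_variant))) ** 2
    return ceil(numerator / (p_base - p_variant) ** 2)


if __name__ == "__main__":
    # Hypothetical: 3% baseline conversion, hoping to detect a lift to 3.6%.
    print(sample_size_per_arm(0.03, 0.036))
```

For a 3% baseline and a lift to 3.6%, this gives roughly 14,000 visitors per arm at alpha = 0.05 and 80% power; halving the lift to 3.3% roughly quadruples the requirement.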

phyalow over 12 years ago

Splunk? I can't help but think that this piece of software would address most of the concerns this article raises.

frozenport over 12 years ago

Yes, Yes, and a Thousand Times, Yes!