Another key fact is that "big data" is actually not that common, especially by the time it reaches the analysis stage.<p>The median job size at Microsoft and Yahoo is only 15GB, and 90% of Hadoop jobs at Facebook are under 100GB. Clearly you want to be able to crunch large log files, but for day-to-day analysis the files are much smaller than that. (cite: <a href="http://research.microsoft.com/pubs/163083/hotcbp12%20final.pdf" rel="nofollow">http://research.microsoft.com/pubs/163083/hotcbp12%20final.p...</a>).<p>At Sense (<a href="http://www.senseplatform.com" rel="nofollow">http://www.senseplatform.com</a>) most of the clients we work with are struggling not with the size of their data but with tricky modeling problems that don't fit into standard black boxes, and with integrating analytics into actual production systems. Adopting something like Hadoop for those tasks is not very productive.
From a data analyst's perspective, let's go through what he says.<p>First he states something along the lines of "More data does not always help." That is right from a theoretical perspective. But it also never hurts, and that is equally a result from probability theory: additional observations always lead to equal or lower variance in your estimates. There is no data like more data, and there is no downside to more data.<p>I am not sure in what way (2) and (3) relate to big data. I'd even say that (3) is pro big data.<p>Then there is this term "intelligent data". I can't emphasize enough how badly chosen this term is. Intelligence relates to the quality of the actions someone takes. Data does not take actions; it just is. Data cannot be intelligent, just as a stone cannot be intelligent.
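<p>To make the probability-theory point concrete, here is a quick simulation sketch (my own illustration, not from the article; it assumes i.i.d. observations and uses made-up parameters) showing that the variance of a sample-mean estimate only shrinks as you add observations:

    import numpy as np

    rng = np.random.default_rng(seed=0)

    # Estimate the mean of a noisy source from n i.i.d. observations.
    # Repeating the experiment many times shows the estimator's variance
    # falling roughly like sigma^2 / n -- more data never hurts here.
    for n in [10, 100, 1000, 10000]:
        estimates = [rng.normal(loc=5.0, scale=2.0, size=n).mean()
                     for _ in range(2000)]
        print(f"n={n:>5}  variance of sample mean ~ {np.var(estimates):.5f}")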
He also thinks that data measurements should be repeatable. Guess what: in all interesting cases, data measurements are <i>not</i> repeatable, due to randomness in the source itself. One of the main challenges of data analysis is to still get robust results from such data.
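<p>One standard way to get robust results out of measurements you cannot repeat is to resample the one sample you do have, e.g. with the bootstrap. A minimal sketch (my own example on made-up data, not anything from the article), estimating a statistic together with its uncertainty:

    import numpy as np

    rng = np.random.default_rng(seed=1)

    # One noisy, non-repeatable set of measurements (synthetic stand-in).
    measurements = rng.normal(loc=10.0, scale=3.0, size=200)

    # Bootstrap: resample with replacement and recompute the statistic
    # to see how much it would wobble under the source's randomness.
    boot_medians = []
    for _ in range(5000):
        resample = rng.choice(measurements, size=measurements.size, replace=True)
        boot_medians.append(np.median(resample))

    lo, hi = np.percentile(boot_medians, [2.5, 97.5])
    print(f"median = {np.median(measurements):.2f}, "
          f"95% bootstrap interval ~ [{lo:.2f}, {hi:.2f}]")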
He also thinks that data should be concise, i.e. that the data set at hand should be as minimal as possible while still leading to the same actions. This sounds like a chicken-and-egg problem: how would you even be able to assess this without trying it out?
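<p>The only way I can see to assess it is empirically: refit on growing subsamples and check whether the resulting model (and hence the actions) has stabilized. A rough sketch using scikit-learn on synthetic data (all names and numbers here are my own, purely for illustration):

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    # Synthetic stand-in for "the data set at hand".
    X, y = make_classification(n_samples=20000, n_features=20, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=0)

    # Refit on increasingly large subsets: only by trying it out do you
    # learn how small the data can be while still giving the same answer.
    for n in [100, 500, 2000, 10000]:
        model = LogisticRegression(max_iter=1000).fit(X_train[:n], y_train[:n])
        print(f"n={n:>5}  held-out accuracy = {model.score(X_test, y_test):.3f}")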