A few things here:

1) What/where exactly are they using GlusterFS for? Has Gluster fixed its scaling problems yet? Specifically, the issue where new storage space/nodes were only available to new directories and files, but not to existing directories? Granted, the last time I looked at this was 2009 or so, but it was a flaw inherent to their "no master node" topology.

2) FB has an entire team to manage Hadoop/HBase. That shows just how much of a beast that stack is. Anyone who has run Hadoop on "Internet time" knows what I'm talking about. It's great at running time-insensitive, deferred compute jobs in an academic or scientific setting; it's really hard to keep it all 100% running in an on-demand setting. As an aside, I couldn't imagine working on just one product in an operations setting as my full-time job. Boredom/fatigue must be a problem on that team.

3) I'd like to see more information on the networking side. What transport protocol? How large is the average update in frame size? Etc.

We've built something similar to Gorilla in-house, so I'm happy to see that we've come to some of the same conclusions.
I really wish this included a comparison with KDB. A license isn't cheap, and Kx certainly wouldn't grant a testing license so that someone could publish benchmarks against it, but in finance it is the standard for TSDBs. Nothing open source has ever come close.
Why pointers? Why not just do a mirrored mmap if you have constant offsets? And if the time points change while time-based queries still need to be constant-time, maybe a table that holds an offset with the difference? Also, why not atomics instead of spinning?
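To sketch the atomics point: a minimal single-writer scheme where a newly appended block is published through a release store and picked up by readers with an acquire load, so nobody spins. This is my own illustration, not Gorilla's actual layout; Block and Series are hypothetical names, and I'm assuming one writer per series.

```cpp
#include <atomic>
#include <cstdint>
#include <vector>

// Hypothetical append-only block of compressed (timestamp, value) data.
struct Block {
    std::vector<uint8_t> bytes;  // compressed point stream
    Block* prev = nullptr;       // older block in the same series
};

struct Series {
    // The writer publishes a fully built block with a release store;
    // readers observe it with an acquire load -- no spinlock required.
    std::atomic<Block*> head{nullptr};

    // Assumes a single appender thread per series.
    void append(Block* b) {
        b->prev = head.load(std::memory_order_relaxed);
        head.store(b, std::memory_order_release);  // publish
    }

    const Block* snapshot() const {
        return head.load(std::memory_order_acquire);  // consistent view
    }
};

int main() {
    Series s;
    s.append(new Block{});           // writer side
    const Block* b = s.snapshot();   // reader side
    return b != nullptr ? 0 : 1;
}
```

With multiple writers you'd swap the store for a compare_exchange loop, but for a per-shard single appender this is enough.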
> Further, many data sources only store integers into ODS<p>If the underlying data type is 64 bit double, aren't they losing precision for integers greater than 2^53?
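For anyone who wants to see where that edge sits, a quick standalone demonstration: a double carries a 53-bit significand (52 stored bits plus the implicit leading 1), so every integer up to 2^53 is exact, and 2^53 + 1 is the first one that silently rounds.

```cpp
#include <cstdint>
#include <cstdio>

int main() {
    uint64_t exact = 1ULL << 53;  // 9007199254740992, still exact
    uint64_t above = exact + 1;   // 9007199254740993, not representable

    std::printf("%llu -> %.0f\n",
                (unsigned long long)exact, (double)exact);
    std::printf("%llu -> %.0f  (rounded back down)\n",
                (unsigned long long)above, (double)above);
    // (double)(2^53 + 1) == (double)2^53: the odd integer is lost.
    return 0;
}
```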
So this isn't managing news feed data or anything like that; it's helping them aggregate server performance and error data for quick lookup?