This simple idea has helped me a lot in past projects.<p>The performance gain from columnar storage comes from the compression ratio. Ordering similar attribute values together greatly reduces the entropy between adjacent rows, which in turn leads to higher compression and better performance.<p>The trick is to choose wisely which column to sort all the rows on.<p>At my previous company, we used Redshift to store about a billion rows, and the simple change of sorting the table by user_id cut the table size by 50%, roughly half a TB of disk storage. The improvement was nothing short of jaw-dropping. I think Google here just turns this trick into a more systematic method, which is really neat.<p>The takeaway: in a columnar storage system, take ordering into account. Try an ordering you feel could maximize the redundancy between rows; usually it's the primary id that is most representative of the underlying data. You don't need a fancy system like this one to leverage this powerful idea, since it applies to any columnar system.
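<p>You can see the effect with a tiny sketch: sort synthetic rows by a key column, serialize each column separately, and compare compressed sizes. This uses zlib purely as a stand-in for a real columnar codec, and all the names and data here are made up for illustration:

```python
import random
import zlib

random.seed(0)

# Synthetic rows: (user_id, country), where country correlates with user_id.
countries = ["US", "DE", "JP", "BR", "IN"]
rows = [(uid, countries[uid % 5]) for uid in range(100_000)]
random.shuffle(rows)  # simulate arbitrary insertion order

def compressed_column_size(rows):
    # Columnar layout: serialize each column on its own, then compress.
    uid_col = ",".join(str(uid) for uid, _ in rows).encode()
    country_col = ",".join(c for _, c in rows).encode()
    return len(zlib.compress(uid_col)) + len(zlib.compress(country_col))

unsorted_size = compressed_column_size(rows)
sorted_size = compressed_column_size(sorted(rows))  # sorted by user_id

print(f"unsorted: {unsorted_size} bytes, sorted: {sorted_size} bytes")
```

After sorting, the user_id column becomes a monotone run and the correlated country column becomes a short repeating pattern, so both compress far better than the shuffled layout.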