With due respect for a well-written article, my first impression is that the suggested approach misses the point. If a single machine can handle the load of the combined queries, then yes, you should replicate instead of shard. But in that case, why shard at all? And what about the cases that actually require sharding, where your query rate and update rate are too high to run everything on a single machine?

It's possible my view is skewed. I've been looking at this problem from the point of view of distributed full-text search, where I don't see any way to centralize in the manner you suggest. Still, I find the idea of handling this in the database API suspect. If you know that some information will never need to be joined, why not use two separate databases instead of splitting the tables?
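To make the last suggestion concrete, here is a minimal sketch. It assumes two hypothetical data domains (`users` and `documents`) that never need to be joined. Each gets its own database, and the application picks the connection per query. `execute` and `DB_FOR_DOMAIN` are illustrative names, and in-memory SQLite stands in for what would really be separate servers.

```python
import sqlite3

# Hypothetical domains that are never joined with each other, so each one
# lives in its own database instead of sharing (and splitting) one schema.
# ":memory:" databases stand in for what would be separate servers.
accounts_db = sqlite3.connect(":memory:")
search_db = sqlite3.connect(":memory:")

accounts_db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
search_db.execute("CREATE TABLE documents (id INTEGER PRIMARY KEY, body TEXT)")

# Routing happens in the application: each domain maps to exactly one
# connection, so no database API needs to know which tables can be joined.
DB_FOR_DOMAIN = {"users": accounts_db, "documents": search_db}


def execute(domain, sql, params=()):
    """Run a statement against the database that owns `domain`."""
    conn = DB_FOR_DOMAIN[domain]
    with conn:  # commits on success, rolls back on an exception
        return conn.execute(sql, params).fetchall()


execute("users", "INSERT INTO users (name) VALUES (?)", ("alice",))
execute("documents", "INSERT INTO documents (body) VALUES (?)", ("hello world",))

print(execute("users", "SELECT name FROM users"))              # [('alice',)]
print(execute("documents", "SELECT COUNT(*) FROM documents"))  # [(1,)]
```

Because nothing ever joins across the two, each database can later be replicated or sharded on its own terms.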