Great write-up. A few observations:<p>1. The encoding and translation schemes of Postgres and mySQL/InnoDB are well described in the blog post, and I would also agree that InnoDB’s design is, all things considered, better for all the reasons outlined in the post.<p>2. I don’t understand why anyone still uses lseek() followed by read()/write() and not pread()/pwrite() syscalls. It’s trivial to replace the pair of calls with one. Aerospike is another datastore that resorts to pairs of seek/red-write instead of pread/pwrite calls.<p>3. Process/connection model makes no real sense nowadays - although to be fair, there is, today, practically almost no difference in terms of footprint between OS threads and OS processes (other than memory and FDs sharing semantics, they are practically the same). It’s still more appropriate to use threads (although I ‘d argue maintaining a pool of threads for processing requests and one/few threads for multiplexing network I/O is the better choice).<p>4. ALTER TABLE is obviously a pain point with mySQL, although I am not really sure many users with large datasets care; they probably figured out long ago it’s going to be an issue and they designed and expanded accordingly. It’s also a relatively rare operation. That said, other than using mySQL (or any other RDBMS) to build the data plane for an elaborate, distributed KV store, one should consider Salesforce’s approach too. Their tables have some 50 or so columns, and the column names are generic (e.g column_0, column_1, … ). They have a registry where they assign column indices (e.g column_0) to a specific high-level entity type (e.g customer title, or price), and whenever they need to query, they just translate from the high level entity to the actual column names and it works. They also, IIRC, use other tables to index those columns (e.g such an index table can have just 3 columns, table id, column index, value) and they consult that index when needed (FriendFeed did something similar).<p>5. Cassandra should have no problem supporting the operations and semantics of Shemaless ass described in their blog posts. However, given they already operate it in production, they probably considered it and decided against it.