I may be in the minority, but I rather like the rigidity of a fixed schema. Yes, it's a bit more "painful" for initial setup (<i>you have to actually create the tables</i>) and yes, it's a bit more work during migrations (<i>you have to actually add the columns</i>), but I don't see either of those as getting in the way enough to give it up. It's just too useful.<p>Data structures are meant to last much longer than application code. Anyone that has worked on a long running system can attest to that; well defined data structures and table layouts will outlive any application code.<p>When I'm designing a system, I think I spend orders of magnitude more time thinking about data structures then actually implementing the CREATE/ALTER TABLE code for them. Planned properly, you can even do the ALTERs/CREATEs necessary to add columns in advance of any actual app usage (<i>ie. "two stage" app deployment</i>).<p>There is a place for "flex fields" or storing generic "documents" but legit use cases are pretty rare. When they are necessary, a single JSON column is usually enough. The example I generally use is an audit trail: The who (<i>FK to user</i>), what (<i>event enum</i>), and when (<i>timestamp</i>) are all strongly typed but you may want a JSON field for event specific data.<p>Oh and if anybody has every tried to do a data migration with a schema-less database ... well have fun with that. Either you bite the bullet and convert everything or you end up with a lot if/then/else logic littered through your app that will bite you down the road.
Currently using MongoDB on a project and boy do I just hate it. We have highly relational data that is a nightmare to reason with and query. Guess what? We also have migrations! It's hard to explain why, but I just get a bad feeling in my tummy every time I have to interact with the database. Just not enjoyable to work with at all. The only thing I've seen it do that actually impresses me is the aggregation framework/pipeline. That's actually pretty neat, and I feel like it's a good fit for the data we're querying with it (analytics data). Other than that, man, I wish I was using PostgreSQL...
> There’s no reason higher level frameworks or ORMs couldn't handle the migration process.<p>> ...this is a manual painful process today, but theres no reason this can’t be fully handled by PostgreSQL or directly within an ORM .<p>You mean like EF already does [1]?<p>> Add-Migration will scaffold the next migration based on changes you have made to your model since the last migration was created<p>For what it's worth, I don't use that feature. I prefer to use SQL Server Data Tools [2] to maintain a model of the database and use its schema and data diff tools to generate upgrade scripts. This is more due to the database pre-dating EF migrations but as well the schema is fairly complex so having SSDT (with its knowledge of nearly all SQL Server object types) do diffs against the actual database model is better than EF diffing its own abstract model.<p>[1] <a href="http://msdn.microsoft.com/en-us/data/jj591621.aspx" rel="nofollow">http://msdn.microsoft.com/en-us/data/jj591621.aspx</a><p>[2] <a href="http://msdn.microsoft.com/en-us/library/hh272686(v=vs.103).aspx" rel="nofollow">http://msdn.microsoft.com/en-us/library/hh272686(v=vs.103).a...</a>
I find the whole "schema-less" thing to be a side-effect of NoSQL marketed as a feature. Unless you are about prototyping a small feature quickly (which MongoDB is great for) - on the large scale of things, 5,6 months out in development time, you will find that the time saved on "not having to do migrations" may very will be nil. In most cases, if you want to keep your hair you end up having a schema anyways. At the very best MongoDB and friends make it easy add columns, but removing and changing datatypes might as well be migrations. Also, as the article mentions, Postgres now has a JSON data type making the schema less advantage kind of useless.<p>When it comes to limits of relational databases some of the real limits are actually sharding, replication and high availability, which are all relatively more difficult to do on the popular YesSQL databases.<p>In any case, its pretty tiring to see NoSQL only refer to MongoDB and other document stores. Redis, Cassandra, HBase and Hive are all NoSQL engines ranked before the next document store, and IMO have a lot harsher performance penalties for incorrect usage than Postgres & friends. Given that Cassandra is a the second highest ranked NoSQL store which also (sort-of) enforces a schema, implying that the limits on relational databases are schemas is pretty bizarre.<p><a href="http://db-engines.com/en/ranking" rel="nofollow">http://db-engines.com/en/ranking</a>
Craig - great post. I think you missed a few huge benefits that drive the "NoSQL movement." Specifically, sharding and aggregation.<p>Most devs don't need the power of sharding, which is why that benefit can never be felt. But the reality is this is probably the #1 characteristics (huge benefit) of NoSQL databases. Google and Amazon definitely paved the way for this movement, primarily b/c they dealt with tons of data. Its simply cheaper to scale out (distributed) than scale up.<p>You can't aggregate data with a relational database. But you can aggregate with (most) NoSQL databases (exception is graph dbs). Instead of building relationships, with NoSQL you're building composites. The huge benefit here is enabling sharding while, having your data all in one place.<p>Lastly, a relationship db is made up of tuples and sets of tuples. With a NoSQL DB you can have complex data structures. I think this is the point you we're trying to make re: Documents.<p>I still love relational databases. It's cool to have options though. Before, relational was the only way.<p>The question is how do we determine which db to use (or use more than one)? Ah, the beauty of polygot persistance...
You would not want a Framework/ORM that abstracts alter statements from you when working with large tables in a relational db. Alter statements can have real performance interruption and is something that an engineer has to be aware of. It's a case where a framework might help 70% of the use cases but really hurt the other 30% with medium-larger sized tables.
> Having Rails/Django/(Framework of your choice) automatically notice the need for a column to exist and make appropriate modifications you could work with it the same way you would managing a document relation in your code.<p>Hibernate can do this (IIRC, it's been awhile), which is great for development mode, but I would never trust a framework/ORM to "auto-migrate" my production database.
I am not an expert on RDBMSs, but isn't the issue of distributed computing and the CAP theorem a far more important driving force behind the NoSQL movement than any of the reasons in the article?