As usual, the next-big-thing-to-replace-the-old-crappy-thing has become nothing more than a useful addition to our toolbox.<p>Which is not a bad thing, but I wish we could do it without the overhyped nonsense. It really is time for this business to grow the fuck up.<p>The "web generation" has brought us lots of great changes, I would have quit IT back in the 90's if the internet hadn't happened. But this immature attitude towards everything from technology stacks to how to run a company is really starting to grate.
This is an incredibly short-sighted piece, for two reasons:<p>1) The data challenges of yesterday's internet giants are the data problems of just about every 21st-century enterprise tomorrow. We're at the start of / in the midst of an irreversible data explosion.<p>2) Established players not taking new technologies and upstarts seriously is never evidence that the threat isn't real. On the contrary, we know our industry is characterized by constant disruption, where established players for whatever reason do not consistently stay at the front line of innovation and tend to get leapfrogged. (As an analogy, Google's lurch into social with Google+ may be akin to Oracle's lurch into NoSQL. Does that mean social was not real?)<p>Bottom line: it all depends on what you're doing. But more and more of us will be dealing with more and more complex data in the years to come. Of that, I'm certain.
I am wondering if the prolonged latencies I have been experiencing from most Google services over the past few years are somehow related to using Spanner instead of the previously conventional "compute in advance offline" / "serve immediately" approach. The Spanner paper mentioned they sacrificed a bit of latency (from nanoseconds in DHT lookups to milliseconds). I understand it's more convenient for Google developers to have a more predictable framework to think in, instead of making every single project a piece of art while spending most of their time fighting the inconsistencies that arise from eventual consistency. The question is whether it was worth it. I remember a time when Google services were amazingly fast - that time is unfortunately gone :-(
We switched from SQL to Mongo on <a href="http://versus.com" rel="nofollow">http://versus.com</a> one year ago and are quite happy—we couldn't imagine going back.<p>But I think which DB to use depends heavily on the specific use case.
An interesting article, and it does seem like people rushed to embrace NoSQL and are now trying to force it into some kind of consistency after the fact - not a bad thing, incidentally - there's a lot of interesting work here (vector clocks, CRDTs, etc.).<p>One thing that surprised me, though, was the lack of a key player: Amazon. Their Dynamo paper was hugely significant, and as a company they use eventually consistent stores for a whole swathe of products at scale.<p>Why mention Facebook and Google but omit this other major player, especially as their experiences tell a different story?
Relational databases don't scale horizontally. That's still as true today as it was a decade ago. Therefore, engineers who need to cope with Big Data and Web Scale will continue to migrate away from relational database solutions towards persistence technology centered around distributed systems. It's as simple as that.<p>If you build your app around a relational database and you need to scale up big, then at some point you're going to hit a brick wall in terms of scaling out storage and/or writes. You either have to build sharding logic into your relational DB app from the beginning (which is a pain that NoSQL saves you from), or else you have to re-architect your entire app when the time comes that you need to deal with scale. Many shops end up borrowing VC money to build out a team to re-architect their systems to handle web scale, but this can be avoided by thinking about data access patterns from the beginning and choosing a technology that can handle your future needs.<p><a href="https://en.wikipedia.org/wiki/Scalability#Horizontal_and_vertical_scaling" rel="nofollow">https://en.wikipedia.org/wiki/Scalability#Horizontal_and_ver...</a>
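To make "build sharding logic into your app" concrete, here is a minimal sketch of the kind of routing code you end up maintaining by hand (the shard count, DSNs and the connectTo() helper are invented for the example, not taken from any real system):<p><pre><code>// Hypothetical hand-rolled sharding: route each user's rows to one of N MySQL hosts.
// Every data-access call in the application now has to go through pickShard(),
// and adding an (N+1)th shard later means rebalancing the data yourself.

const SHARD_DSNS = [
  "mysql://app@db-shard-0/app",
  "mysql://app@db-shard-1/app",
  "mysql://app@db-shard-2/app",
  "mysql://app@db-shard-3/app",
];

// Simple stable hash of the shard key (here: the user id).
function hashKey(key: string): number {
  let h = 0;
  for (let i = 0; i < key.length; i++) {
    h = (h * 31 + key.charCodeAt(i)) >>> 0;
  }
  return h;
}

// Pick the connection string for a given user.
function pickShard(userId: string): string {
  return SHARD_DSNS[hashKey(userId) % SHARD_DSNS.length];
}

// Usage (connectTo() stands in for whatever MySQL client you use):
//   const conn = await connectTo(pickShard("user-42"));
//   await conn.query("SELECT * FROM orders WHERE user_id = ?", ["user-42"]);
</code></pre>
Cross-shard joins, transactions and rebalancing are where this really starts to hurt, which is exactly the work the distributed stores take off your hands.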
I'm far from an expert on this topic, but I am a developer, and I am currently working on a project that leverages both MySQL and ElasticSearch for storage. MySQL handles all of the "boring" data such as users, profiles, comments, etc. ElasticSearch is basically a giant product database: no real relational data, just a normalized structure that is easy to query. ElasticSearch is serving as the primary data store; there is no backing database because ES is just that good.<p>Whenever I see these "SQL vs NoSQL" arguments, I always have to wonder: why one over the other? A lot of projects can benefit from both, and there's no reason you absolutely HAVE to use one or the other. It's perfectly reasonable (and probably ideal) to use more than one storage system in your projects.<p>If you have a bunch of nails to hammer in and bolts to tighten, you don't choose just a hammer or a wrench to do the job... you grab both and use each for what it does best.
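For what it's worth, the ES half of a setup like this is not much code. A rough sketch against Elasticsearch's REST API (the index and field names are invented here, and the exact document URL differs between ES versions):<p><pre><code>// Index a product document and run a full-text query against it.
// Assumes an Elasticsearch node at localhost:9200; names are illustrative only.

const ES = "http://localhost:9200";

async function indexProduct(id: string, product: object): Promise<void> {
  await fetch(`${ES}/products/_doc/${id}`, {
    method: "PUT",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(product),
  });
}

async function searchProducts(text: string): Promise<unknown[]> {
  const res = await fetch(`${ES}/products/_search`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ query: { match: { name: text } } }),
  });
  const body = await res.json();
  return body.hits.hits.map((h: any) => h._source);
}

// indexProduct("42", { name: "Acme Phone X", specs: { screen: "5in", ram: "2GB" } });
// searchProducts("phone");
</code></pre>
The "boring" relational rows stay in MySQL; only the product documents live in ES.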
NoSQL discussions always seem to conflate three very different things: storage engines, APIs and architecture. Where do we store the data ? How do we access it ? How do we make sure it scales ?<p>The "traditional" approach is to use Oracle/SQL Server/MySQL for storage, SQL and/or ORM as an API, and single-server tables-with-relationships as an architecture. Back in the early 2000s, everybody did this. Sure, there were a few performance-minded exceptions that went with sharding or master-slave architectures instead, but those were exceptions.<p>And single-server architectures tend to behave badly at medium loads. Spend the market rate for a genius DBA, and they still behave badly at high loads. The next step is a 32-core 128GB RAM monstrosity that costs an order of magnitude more than what eight 4-core 16GB servers would cost.<p>Most NoSQL solutions came with a new architecture. You had the MongoDB flavor of distributed storage, or the BigTable flavor of distributed storage, or the CouchDB flavor of distributed storage, and so on. Properly implemented distributed storage eats high loads for breakfast: just add more servers. This is a good thing.<p>My issue with the NoSQL movement is that they threw away the baby with the bath water. They threw away the single-server relational architecture, which was a nice change, and they also gave up the old battle-hardened storage engines and the highly expressive SQL language and replaced them with only-recently-experimental engines and ad hoc lean APIs.<p>It takes time for a storage engine to mature. To have all its performance kinks ironed out and all its bugs smoked out. I still remember the brouhaha around MongoDB persistence guarantees, or the critical data loss bugs in CouchDB.<p>And the lean APIs just forced back all the querying logic into the application, with all the filtering and the manual indexing and the joins and the approximate but ultimately incorrect implementations of whatever subset of ACID was required at the time. This wasn't an entirely bad thing: it certainly made many developers aware of the performance implications of some joins or transactions. But when you need to write a JOIN or GROUP BY or BEGIN TRANSACTION that you know will scale properly, and there's no API support for it ? Feh.<p>I'm a huge fan of the CouchDB architecture. Distributed change streams, with checkpointed views and cached reductions. But I have been burned by the CouchDB storage engine (can you say "data corÊ–NÑ %ñXtion" ?) and I see no point in bending knee to the laconic CouchDB API. So I took the CouchDB architecture and reimplemented it with a PostgreSQL back-end. It's _faster_ (don't underestimate the cost of those HTTP requests), I have trust that after PostgreSQL's decade-long history all threats to my data are long gone, and I can always whip out an SQL query when I do need it.<p>It's nice to see so many NoSQL solutions migrating back to an SQL-like API and gaining enough maturity to keep your data safe. In the near future, I expect them to be nothing more than "Architecture in a box" solutions for when you don't want to implement specific architectures in SQL. And I expect more and more "architecture plugins" to become available: with a library, turn a herd of SQL databases into a distributed architecture of type X.
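For anyone curious what "the CouchDB architecture on a PostgreSQL back-end" can look like, here is a heavily simplified sketch of the change-stream-plus-checkpointed-view half of it, using node-postgres. The table layout and function names are mine, not the actual implementation described above:<p><pre><code>import { Client } from "pg";

// Schema assumed to exist:
//   CREATE TABLE docs      (doc_id TEXT PRIMARY KEY, body JSONB NOT NULL);
//   CREATE TABLE changes   (seq BIGSERIAL PRIMARY KEY, doc_id TEXT NOT NULL);
//   CREATE TABLE view_ckpt (view_name TEXT PRIMARY KEY, last_seq BIGINT NOT NULL);

// Every write appends to the change log in the same transaction as the document.
async function writeDoc(db: Client, docId: string, body: object): Promise<void> {
  await db.query("BEGIN");
  await db.query(
    "INSERT INTO docs (doc_id, body) VALUES ($1, $2) " +
    "ON CONFLICT (doc_id) DO UPDATE SET body = $2",
    [docId, JSON.stringify(body)]
  );
  await db.query("INSERT INTO changes (doc_id) VALUES ($1)", [docId]);
  await db.query("COMMIT");
}

// Placeholder for the view's map/reduce step; the real logic goes here.
function applyToView(viewName: string, docId: string, body: unknown): void {
  console.log(`view ${viewName}: updating for ${docId}`, body);
}

// Consume the change log from the view's checkpoint, then advance the checkpoint
// (the moral equivalent of a checkpointed CouchDB view).
async function pumpView(db: Client, viewName: string): Promise<void> {
  const ckpt = await db.query(
    "SELECT last_seq FROM view_ckpt WHERE view_name = $1", [viewName]);
  const lastSeq = ckpt.rows.length ? ckpt.rows[0].last_seq : 0;

  const batch = await db.query(
    "SELECT c.seq, c.doc_id, d.body FROM changes c JOIN docs d USING (doc_id) " +
    "WHERE c.seq > $1 ORDER BY c.seq LIMIT 100",
    [lastSeq]
  );
  for (const row of batch.rows) {
    applyToView(viewName, row.doc_id, row.body);
  }
  if (batch.rows.length > 0) {
    const newSeq = batch.rows[batch.rows.length - 1].seq;
    await db.query(
      "INSERT INTO view_ckpt (view_name, last_seq) VALUES ($1, $2) " +
      "ON CONFLICT (view_name) DO UPDATE SET last_seq = $2",
      [viewName, newSeq]
    );
  }
}
</code></pre>
Plus whatever SQL you still need on top, which is the whole point of keeping Postgres underneath.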
How do people who abstract away their database via an ORM feel comfortable not dealing directly with the data? Aren't they worried about, say, accidentally dropping a DB column via the ORM? I know that when I am dealing with the database directly I am a lot more surgical in my approach than when I am writing code.
>> NoSQL is nothing more than a storm in a teacup<p>There has been a difference with this "technology cycle", though: there have been some prominent, well-grounded voices from the start of it.<p>That's a good thing.<p>That wasn't the case for other cycles - the thin vs. fat client cycle, the DAS vs. NAS cycle, etc.<p>EDIT: I'm not saying NoSQL has no application. I earn a paycheck working with a huge graph DB (and it's nothing to do with "social", yay!), and have previously been a heavy user of Cassandra.
We are stuck with NoSQL in HTML5, because Mozilla and Microsoft refuse to implement WebSQL (<a href="http://en.wikipedia.org/wiki/WebSQL" rel="nofollow">http://en.wikipedia.org/wiki/WebSQL</a> )<p>IndexedDB is fine for storing JSON objects, etc., but a relational database with SQL query syntax, indexes, etc. is more powerful and means less code to write. With IndexedDB one has to reinvent the wheel just to get basic query features.<p>WebSQL is not deprecated; the W3C Working Group Note actually says:<p><pre><code> 'This specification is no longer in active maintenance
and the Web Applications Working Group does not intend to
maintain it further'.
</code></pre>
WebSQL is only available in WebKit-based browsers (Safari, Chrome), which means most mobile browsers.
As SQLite is in the public domain, no company would "lose face" if they chose to use it. They could fork SQLite and change the SQL query syntax (parser) to whatever the W3C finds suitable. <a href="https://www.sqlite.org" rel="nofollow">https://www.sqlite.org</a><p>Mozilla Firefox and FirefoxOS have both shipped SQLite for years, and it can be accessed via an internal JavaScript API. Several Microsoft products already use it anyway (e.g. the Forza Xbox games). Microsoft of course also has various other SQL database libraries, like MS Access JetRed, MS Outlook JetBlue and SQL Express.<p>We had a discussion about it recently: <a href="https://news.ycombinator.com/item?id=7574754" rel="nofollow">https://news.ycombinator.com/item?id=7574754</a><p>The new hip thing is "NewSQL" (<a href="http://en.wikipedia.org/wiki/NewSQL" rel="nofollow">http://en.wikipedia.org/wiki/NewSQL</a> ). For example, Facebook, Google Ads, etc. are powered by MySQL's InnoDB database engine. I would go as far as to count SQLite in this group.<p>We would need a movement to convince Mozilla to finally add WebSQL to Firefox and FirefoxOS.
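To make the "reinvent the wheel" point concrete, here is roughly the same query - products cheaper than 50, sorted by price - in both APIs. A sketch from memory, so treat the callback shapes as approximate:<p><pre><code>// --- WebSQL: one SQL statement does the filtering and sorting ---
// openDatabase() is non-standard, hence the cast.
const sqlDb = (window as any).openDatabase("shop", "1.0", "shop", 2 * 1024 * 1024);
sqlDb.transaction((tx: any) => {
  tx.executeSql(
    "SELECT name, price FROM products WHERE price < ? ORDER BY price",
    [50],
    (_tx: any, result: any) => {
      for (let i = 0; i < result.rows.length; i++) console.log(result.rows.item(i));
    }
  );
});

// --- IndexedDB: define the index yourself, then walk a cursor ---
const open = indexedDB.open("shop", 1);
open.onupgradeneeded = () => {
  const db = open.result;
  const store = db.createObjectStore("products", { keyPath: "id" });
  store.createIndex("price", "price"); // manual index definition
};
open.onsuccess = () => {
  const db = open.result;
  const index = db.transaction("products").objectStore("products").index("price");
  const req = index.openCursor(IDBKeyRange.upperBound(50, true));
  const results: unknown[] = [];
  req.onsuccess = () => {
    const cursor = req.result;
    if (cursor) { results.push(cursor.value); cursor.continue(); }
    else console.log(results); // already in price order, courtesy of the index
  };
};
</code></pre>
And that's the easy case - anything like a join or an aggregate on top of IndexedDB is entirely your own code.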
If you're doing accounting, sure, you want an ACID-compliant database. There you have a limited number of kinds of data to store, with strict and rarely changing constraints. You can keep it on one big expensive server (plus backup), and it's better to bring the system down than to allow inconsistency.<p>However, for most web development SQL is seriously not good. You end up with hundreds of loosely coupled tables which constantly change their structure for new features. Half a dozen joins on every request. Hundreds of lines of SQL. Constant pain. And it's not like you care so much about consistency - if three comments disappear from your web site once a year, so what?<p>And it's painful to make SQL multi-master.<p>For most web development, document databases are so much better. MongoDB is pretty nice because it makes hundreds of lines of SQL and ten files of code redundant - all replaced by one complex document.
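As an illustration of the "one complex document" point, a sketch with the Node.js MongoDB driver (the collection layout is invented for the example): the kind of page that would take half a dozen joined tables in SQL is stored and fetched as a single nested document.<p><pre><code>import { MongoClient } from "mongodb";

// One denormalized document instead of articles + authors + tags + comments tables.
async function main(): Promise<void> {
  const client = new MongoClient("mongodb://localhost:27017");
  await client.connect();
  const articles = client.db("blog").collection("articles");

  await articles.insertOne({
    slug: "why-we-switched",
    title: "Why we switched",
    author: { name: "Jane Doe", handle: "jdoe" },
    tags: ["databases", "mongodb"],
    comments: [
      { user: "bob", text: "Nice writeup", at: new Date() },
      { user: "eve", text: "What about transactions?", at: new Date() },
    ],
  });

  // The whole page in one round trip, no joins.
  console.log(await articles.findOne({ slug: "why-we-switched" }));

  await client.close();
}

main().catch(console.error);
</code></pre>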