Why I Migrated Away From MongoDB

203 points by svs over 12 years ago

38 comments

gregjor over 12 years ago

You were fortunate to recognize that MongoDB was the wrong tool for your job, and lucky to be able to move to Postgres instead of continuing to throw your time and effort away. I see the ad hominem "you're an ignorant idiot" attacks have already started, along with advice like using regexes to do case-insensitive searches. Watching the NoSQL "movement" encounter the problems RDBMSs fixed 20 years ago and then hand-wave and kludge them away is frustrating. I wrote about some of this in <a href="http://typicalprogrammer.com/?p=14" rel="nofollow">http://typicalprogrammer.com/?p=14</a>.

Look at the bright side: programmers who are writing NoSQL-backed apps are creating the fossil fuel that will keep programmers who know RDBMSs working into our retirement years. I already have more work than I can do fixing web apps that were built around crap data management tools that failed to scale beyond a few thousand users. Your Postgres expertise will still be a money-making skill long after MongoDB is forgotten.

rbranson over 12 years ago

I'm no fan of MongoDB, but this same advice goes for any NoSQL data store. I am an Apache Cassandra contributor and community MVP, but my advice stays the same: it's best just to start with a SQL database and go from there. Read some books and learn it well: the "SQL Cookbook" from O'Reilly is great, and so is "The Art of SQL." Premature optimization continues to be the root of all evil.

bunderbunder over 12 years ago

"Fourthly, and this one completely blew my mind - somewhere along the stack of mongodb, mongoid and mongoid-map-reduce, somewhere there, type information was being lost. I thought we were scaling hard when one of our customers suddenly had 1111 documents overnight. Imagine my disappointment when I realised it was actually four 1s, added together. They’d become strings along the way."

I've been having a similar problem with an SQLite data store, only the other way around. Strings were getting converted to numbers, and leading zeros that were significant and needed to be maintained were being lost along the way.

It sucked all the fun out of dynamic typing for me, at least in combination with automatic type conversions. Having to think about type, and about when to make transitions across type boundaries, is just a little light busywork. Having to worry about type transitions being made contrary to your intentions is a downright PITA and, it turns out, a serious quality control issue.
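
The SQLite side of this can be reproduced in a few lines. What follows is a minimal sketch using Python's standard-library sqlite3 module; the table and column names are made up for illustration, and it only shows one plausible cause (a column declared INTEGER, so SQLite's type affinity converts text that looks like a number), not necessarily the setup described above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical table: the same value goes into an INTEGER column and a TEXT column.
conn.execute("CREATE TABLE codes (as_int INTEGER, as_text TEXT)")
conn.execute("INSERT INTO codes VALUES (?, ?)", ("00123", "00123"))

as_int, as_text = conn.execute("SELECT as_int, as_text FROM codes").fetchone()

# The INTEGER column has numeric affinity, so the string is stored as a number
# and the leading zeros are gone; the TEXT column keeps the string untouched.
print(repr(as_int), repr(as_text))
```

Declaring the column as TEXT is the usual way to keep identifiers such as postal codes intact.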

dccoolgai over 12 years ago

"To be honest, the decision to use MongoDb was an ill-thought out one. Lesson learned - thoroughly research any new technology you introduce into your stack, know well the strengths and weaknesses thereof and evaluate honestly whether it fits your needs or not - no matter how much hype there is surrounding said technology."

I think you are not alone in learning this lesson with this particular technology. Fortunately it's one I learned by proxy from working adjacent to a team that decided to introduce Mongo into their stack... but I still wake up at night and hear the screams of "You have to put the whole dataset in RAM?"... you weren't there, man... we lost a lot of good guys...

You have to draw a clean line between "stuff it is really fun and enlightening to play with" and "stuff you introduce into your stack".

jamesli over 12 years ago

I am both a database guy and a software engineer. Being a software engineer, I kind of understand the hype behind NoSQL. Being a database guy who has spent years studying how database engines work under the hood, many NoSQL implementations make me wonder how powerful marketing can be.

In general, I love the ideas behind NoSQL. I can still feel the excitement of reading the BigTable and MapReduce papers. HBase, Hadoop, Redis, etc. are awesome products. I use some of them in my work. But some other NoSQL products? Being engineers, we must understand the implementation and be fully aware of its limitations, instead of believing the marketing materials. Well, if all you want is to test a toy product or build a prototype, or your product has low concurrency and a small data size and you have no operational concerns, it certainly looks like they make your development easier. But in these scenarios, any good relational database won't add a significant burden either.

jaimebuelta over 12 years ago

Mmm, not sure about some of the complaints...

- You can make case-insensitive searches on the DB using regexes (<a href="http://www.mongodb.org/display/DOCS/Advanced+Queries#AdvancedQueries-RegularExpressions" rel="nofollow">http://www.mongodb.org/display/DOCS/Advanced+Queries#Advance...</a>). A simple case-insensitive regex is not too bad performance-wise, but in general case-insensitive matching should be avoided for search purposes (you can normalize by setting everything to lower case, or use an equivalent trick).
- The proper way of doing an audit (and searching it later) is to make an independent collection with a reference to the document(s) in the other collection. Then you can index by user, date, or any other field and leave the main collection alone. The embedded access collection described doesn't look very scalable.
- Making map-reduce queries is tricky (at least for me). I think the guys at 10gen realize that, and the new aggregation framework is a way of dealing with this. Anyway, the main advantage of SQL is this kind of thing: the rich query capabilities. Even if MongoDB allows some of that compared with other NoSQL DBs, if there is a lot of work in defining new queries, a SQL DB is probably the best fit, as that is where SQL excels.

I don't truly believe in this "you should research everything before starting" (I mean, I believe in research, but too many times the "you should do your homework" argument is overused. Sometimes you make a decision based on some data that changes later, or is incomplete), as there are a lot of situations where you find problems as you go, not in the first steps. But, according to the description, it looks like PostgreSQL is a better match and the transition hasn't been too painful, so we can classify this as a "bump in the road"/"success story". Also, the DB schema is probably much better known right now than it was in the beginning, which simplifies the use of a relational DB.
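
The normalization trick from the first bullet does not depend on any particular store. Here is a minimal sketch, assuming a hypothetical contacts table and using SQLite from Python's standard library as a stand-in database: the original text is kept for display, and a case-folded copy is stored and indexed for lookups.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical schema: "name" is shown to users, "name_lower" is only searched.
conn.execute("CREATE TABLE contacts (name TEXT, name_lower TEXT)")
conn.execute("CREATE INDEX idx_contacts_name_lower ON contacts (name_lower)")


def add_contact(name):
    """Store the original name together with its case-folded copy."""
    conn.execute("INSERT INTO contacts VALUES (?, ?)", (name, name.casefold()))


def find_contact(query):
    """Exact match on the normalized column, so the index can be used."""
    rows = conn.execute(
        "SELECT name FROM contacts WHERE name_lower = ?", (query.casefold(),)
    )
    return [row[0] for row in rows]


for n in ("Alice Smith", "ALICE SMITH", "Bob Jones"):
    add_contact(n)

matches = find_contact("alice smith")
print(matches)
```

The cost is the duplicated field, which has to be kept in sync on every write.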

daveman over 12 years ago

As an analytics professional who was pressured into a MongoDB environment, I feel the OP's pain. If you want to do gymnastics with your data (aggregations of aggregations, joining result sets back onto data), SQL expressions are a thousand times easier than Mongo constructs (e.g. map-reduces). We usually ended up scraping data out of Mongo and dumping records into a SQL database before doing our transformations.

All that said, our developers loved the ease of simple retrieval and insertion, and of course the scalability. So I guess you ultimately need to base your decisions on your priorities.

I don't fault the OP though, since it's hard to know just how limiting NoSQL will be until you try to do all the things you used to assume were database table stakes (no pun intended).
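
To make the "gymnastics" concrete, here is a minimal sketch of both patterns in SQL, run through Python's standard-library sqlite3 module. The orders table and its values are invented for illustration and are not taken from the thread.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("ann", 10.0), ("ann", 30.0), ("bob", 5.0), ("bob", 15.0), ("cat", 100.0)],
)

# Aggregation of an aggregation: the average of the per-customer totals.
avg_total = conn.execute(
    """
    SELECT AVG(total)
    FROM (SELECT customer, SUM(amount) AS total FROM orders GROUP BY customer)
    """
).fetchone()[0]

# Joining a result set back onto the data: each order's share of its customer's total.
rows = conn.execute(
    """
    SELECT o.customer, o.amount, o.amount / t.total AS share
    FROM orders AS o
    JOIN (SELECT customer, SUM(amount) AS total FROM orders GROUP BY customer) AS t
      ON t.customer = o.customer
    ORDER BY o.customer, o.amount
    """
).fetchall()

print(avg_total)
for row in rows:
    print(row)
```

Each query is a single declarative statement; the subquery plays the role that an intermediate collection would play in a map-reduce workflow.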

dkhenry over 12 years ago

Completely aside from the article: the level of vitriolic discourse in this topic is astounding. I am amazed that, as a community, discussions of database engines can draw out such mean-spirited anger. I have never downvoted as many comments on HN in a single thread as I have on this topic. I don't care which side of the debate you come down on. There is no excuse for belittling and insulting others in a technical forum. That's right, I am looking at you<pre><code> gregjor, gaius, and zemo </code></pre> In this case it appears to mostly be those arguing for Postgres, but I wouldn't care if you were arguing for sunshine and unicorns: there is a way to behave civilly and you're not doing it.

jaequery over 12 years ago

I've run into a similar issue to the one you described. Something that can be done so simply and quickly in SQL was bewilderingly difficult to do in mongo.

The schema-less database approach also seems attractive at first, but updating your data whenever your "app schema" changes starts to become a pain real quick.

Now I can't really live w/o having a schema first. It actually saves you a lot more time in the long run (even the short run); being schema-less means you can't really do anything too fancy w/ your data (generate reports, advanced search, etc.).

dkhenry over 12 years ago

Looks like he jumped ship just a bit too early. <a href="http://docs.mongodb.org/manual/applications/aggregation/" rel="nofollow">http://docs.mongodb.org/manual/applications/aggregation/</a>

dkarl over 12 years ago

"You have to load every document in the database and extract the audit trail from it, then filter it in your app for the user you’re looking for. Just the thought of what that would do to my hardware was enough to turn me off the whole idea."

Naive question from somebody who has done a little reading on and dabbling with key/document-with-MapReduce style datastores, but who hasn't tackled a real production problem: I thought running queries over the entire dataset was one of the assumptions of horizontally scalable document stores? In terms of avoiding computation, you can only limit queries by document key, which even if you're clever/lucky doesn't always encode the parameters you're querying on, or doesn't encode them in the right order, so you should be prepared to run queries over your entire dataset. Hopefully the queries you run often are optimized (e.g., using indexes or clever use of key ranges), but in the general case, you have to be prepared to scan the whole shebang, and that's supposed to be okay because of horizontal scalability, right?
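
The difference between the two access paths in the quoted passage can be sketched without any database at all. This is an illustrative model in plain Python, with made-up documents and field names: the embedded layout forces a pass over every document, while a separate structure keyed by user (standing in for an indexed audit collection) answers the same question with one lookup.

```python
from collections import defaultdict

# Hypothetical documents, each embedding its own audit trail.
documents = [
    {"_id": 1, "audit": [{"user": "ann", "action": "create"},
                         {"user": "bob", "action": "view"}]},
    {"_id": 2, "audit": [{"user": "ann", "action": "view"}]},
    {"_id": 3, "audit": [{"user": "cat", "action": "create"}]},
]


def audit_by_scan(user):
    """Embedded layout: read every document, then filter in the application."""
    return [
        (doc["_id"], entry["action"])
        for doc in documents
        for entry in doc["audit"]
        if entry["user"] == user
    ]


# Separate audit structure keyed by user, built once and maintained on write.
audit_index = defaultdict(list)
for doc in documents:
    for entry in doc["audit"]:
        audit_index[entry["user"]].append((doc["_id"], entry["action"]))


def audit_by_index(user):
    """Keyed layout: one lookup, independent of how many documents exist."""
    return audit_index.get(user, [])


print(audit_by_scan("ann"))
print(audit_by_index("ann"))
```

Both functions return the same entries; only the amount of data touched differs, which is the cost the quoted passage is worried about.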

adambard over 12 years ago

As a relative idiot when it comes to this sort of thing, I'd like to insert the following supplementary question: what is the sort of application/dataset for which Mongo is particularly suited?

I've used it on small projects, and have enjoyed it. Perhaps my data has just been simple/loosely-coupled enough to never run into these problems?

I read a lot of posts like this on HN before ever trying Mongo, so I've at least been convinced to always implement schema at the application layer. Others seem to keep learning that lesson in harder ways.
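
One way to read "schema at the application layer" is that every document passes through a typed, validated object before it reaches the store. The following is a minimal sketch under that assumption; the Receipt type, its fields, and the in-memory list standing in for a schemaless collection are all hypothetical.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class Receipt:
    """Hypothetical record type; the field names are illustrative only."""
    vendor: str
    total_cents: int

    def __post_init__(self):
        if not isinstance(self.vendor, str) or not self.vendor:
            raise ValueError("vendor must be a non-empty string")
        # bool is a subclass of int, so it is rejected explicitly.
        if isinstance(self.total_cents, bool) or not isinstance(self.total_cents, int):
            raise TypeError("total_cents must be an int")
        if self.total_cents < 0:
            raise ValueError("total_cents must not be negative")


store = []  # stands in for a schemaless collection


def save(raw):
    """Validate first, then write; bad data never reaches the store."""
    doc = asdict(Receipt(**raw))
    store.append(doc)
    return doc


print(save({"vendor": "Acme", "total_cents": 1250}))
```

A string such as "1250" in place of the integer is rejected at write time instead of surfacing later as a concatenation bug.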

redler over 12 years ago

"digiDoc is all about converting paper documents like receipts and business cards into searchable database, and so a document database seemed like a logical fit(!)."

It looks like this single initial assumption is where things started going wrong: conflating the pieces of paper that happen to be called "documents" in the real world with the concept of a "document" in the context of a system like MongoDB.

tonynero over 12 years ago

The guy is getting such hate in the comments on his site, yet his opening line is that his choice was ill-thought-out. Let him express his issues, right?

I chose MongoDB for my last side project, and while it was awesome working schema-less and the client-facing part of the project was certainly quicker to deliver, I feel pretty lost on the analytics/BI side of it and couldn't say it better than him: "Not having JOINs makes your data an intractable lump of mud"

So coming from a relational/SQL background I found MongoDB awesome upfront, but frustrating later on... and yes, I'm off to learn <a href="http://docs.mongodb.org/manual/applications/aggregation/" rel="nofollow">http://docs.mongodb.org/manual/applications/aggregation/</a>

stevencorona over 12 years ago

The downside, or challenge, with NoSQL (generally speaking) is that you need to handle your aggregations ahead of time - you need to know what queries you'll want to run in the future when you store your data. If you have some new aggregation you want to keep, you'll need to re-process the data (with Hadoop or something else).

It's the trade-off of being able to scale reads and writes horizontally. And unless you need it, an RDBMS makes sense given the flexibility.

Maybe, instead of looking at NoSQL as a full-on replacement for RDBMS, we can look at it as a better solution to sharding.
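
The "aggregate ahead of time" trade-off can be sketched in a few lines of plain Python. Everything here is illustrative: the event shape and the counters are invented, and the backfill function stands in for whatever batch job (Hadoop or otherwise) would re-process the raw data.

```python
from collections import Counter

events = []                 # raw events, kept so new aggregates can be rebuilt
views_per_page = Counter()  # aggregate chosen up front, maintained at write time


def record(event):
    """Write path: store the event and update the pre-planned aggregate."""
    events.append(event)
    views_per_page[event["page"]] += 1


def backfill(key):
    """A new aggregate was not maintained on write, so all raw events are re-read."""
    return Counter(event[key] for event in events)


for e in [
    {"page": "/a", "user": "ann"},
    {"page": "/a", "user": "bob"},
    {"page": "/b", "user": "ann"},
]:
    record(e)

views_per_user = backfill("user")
print(views_per_page)
print(views_per_user)
```

Reading a planned aggregate is a cheap lookup; any question that was not anticipated costs a full pass over the data.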

se85 over 12 years ago

The guy just jumped on the bandwagon without having a clue.

Just reading this blog, it's clear that MongoDB was not a good fit for him. If he had bothered to do some research, he would have found this out on day one.

That's the real lesson he should be taking away from this and blogging about, yet somehow MongoDB are trolls and it's all their fault because of a lack of features and they have bypassed 40 years of computer science and blah blah blah blah, excuses, excuses, excuses.

edit: removed a few pointless sentences :-)

leothekim over 12 years ago

"I can only come to the conclusion that mongodb is a well-funded and elaborate troll."

It's possible the reasoning he used to adopt mongodb is the same as the reasoning he used to abandon it.

chaostheory over 12 years ago

For me, what killed my enthusiasm for mongodb is the write locks. Yes they have been greatly improved in the 2.x release but it's still not good enough (for me).

programminggeek over 12 years ago

Look, there are some places where document DBs solve problems easier/better than SQL, and other places where they kind of suck. For example, plain old object mapping is easier with a document DB. Relational DBs tend to make your code look/feel/act more relational and less object oriented. Your object model tends to look just like your table structure. This can be good or bad depending on your viewpoint.

There are some approaches to solving some of the author's problems that end up making the Mongo system look and feel a lot more like a SQL system, because sometimes data is actually related.

The author could have also taken a different approach to his data schema that would have fit more of a non-relational worldview.

Software development and architecture is about making choices and working with and around the limitations of your tools. It doesn't matter if PostgreSQL or MongoDB is "better". It's about solving a problem using a set of tools you are comfortable with.

mrinterweb over 12 years ago

I find this article to be more a reflection of a NoSQL newbie's failed foray into document databases, by someone who later realized that the grass is not as green as originally perceived. The developer realized that he does not like map-reduce and missed having joins. I don't see how this person's failed experience with MongoDB is a reflection on MongoDB.

I think the recent popularity of MongoDB bashing is maybe a testament to MongoDB's popularity. I'd guess that because MongoDB, with its ad hoc queries, is probably the closest NoSQL database to an RDBMS, it is attracting many newcomers.

armored_mammal over 12 years ago

Can someone confirm that there is no such thing as a case-insensitive index/search in Mongo? If true, it seems likely that the author's comments have some degree of truth, at least when it comes to its usefulness for web and mobile applications. Storing data only in lowercase isn't a good idea for obvious reasons, and storing two copies of the same data for searching only, while not the end of the world, seems a little silly.

Teef over 12 years ago

There are 3 reasons I have gone running and screaming from an RDBMS:

1. Software gets large/complex to get meaningful work done. I am all about data consistency, but at some point it is time to break things up into services and not have a single database.
2. If the software is popular enough, everyone is running to use NoSQL (cache is NoSQL).
3. Clearly it is not a good storage solution either because, for example, in an address book a nested list greatly simplifies everything (right tool for the job).

I spent many years hammering away with RDBMSs and by and large it was great, until it wasn't. I try to look at data storage more holistically now, based on a best guess of the problem. I have tried to convert an application from PostgreSQL to MongoDB and it failed, but that wasn't MongoDB's fault; it was because I didn't change the data model to fit a document storage system. I have also tried to use PostgreSQL for a realtime reporting system and failed horrifically, and that was not PostgreSQL's fault; it was mine. Amazing what happens when you stop pushing a chain and start pulling it!

firemanx over 12 years ago

I work for a company that operates in the energy industry. We utilize both RDBMS and "NoSQL"; both have purposes that they fit well. We store customer account and configuration data in Postgres, and use Cassandra to store time-series statistics and high write volume data.

I have a background in data warehousing in both Oracle and SQL Server, and was part of the decision to use a polyglot persistence model. I've got at least a decade's worth of experience in the DW world, and more as a general developer before that, so I like to think I've got a relatively credible background in a variety of data stores.

I haven't looked at Mongo much - its durability concerns and the write lock stuff pushed me away from it early on (I don't mean to disparage it, but that was where it was at when I evaluated it), but Cassandra's configurable consistency levels and operational story at a cluster level are what sold us for our time-series data (that, and the ability to construct a sparse timeline and multiplex reads/writes). Anything we need flexible querying for, we push into specialized Postgres dbs.

The level of willful ignorance and vitriol in this thread is kind of amazing. Most of the really experienced DW guys I know are all looking at HBase, Cassandra and others because they fit a niche that we've all been looking for in certain data sets at really large scale. It doesn't mean we're ditching our relational data stores, it just means we're augmenting them with other tools because they fit the job at hand. To suggest that one tool is absolutely perfect for every scenario seems a little short-sighted to me, possibly driven by inexperience. I don't mean that as an insult - I know a lot of guys who've been working on the same data sets for 30 years who really do just need the one tool - however, you've got to realize there are other data sets and problems for which your hammer just won't fit.

ilaksh over 12 years ago

On your home page you imply that you can automatically OCR arbitrary handwritten receipts into an analyzable format.

No one can do that. That is your problem, not MongoDB.

As far as aggregation, use the new Aggregation Framework <a href="http://docs.mongodb.org/manual/tutorial/aggregation-examples/" rel="nofollow">http://docs.mongodb.org/manual/tutorial/aggregation-examples...</a>:<pre><code>db.zipcodes.aggregate(
  { $group: { _id: "$state", totalPop: { $sum: "$pop" } } },
  { $match: { totalPop: { $gte: 10 * 1000 * 1000 } } }
)</code></pre>

As far as "losing the independence of your data access paths", no you don't. You are free to use linking instead of embedding wherever you want. <a href="http://www.mongodb.org/display/DOCS/Schema+Design#SchemaDesign-EmbeddingandLinking" rel="nofollow">http://www.mongodb.org/display/DOCS/Schema+Design#SchemaDesi...</a>

MongoDB doesn't have a built-in full text search? So what. Most systems with large amounts of text to search do not rely on the text search capabilities built into relational databases anyway. People use actual full-text search engines like Lucene/Solr, Sphinx, reds, etc. Having said that, if you just wanted to support lowercase keyword queries with MongoDB, would it really be so hard to extract and store lowercase keywords from your text, as suggested here? <a href="http://www.mongodb.org/display/DOCS/Full+Text+Search+in+Mongo" rel="nofollow">http://www.mongodb.org/display/DOCS/Full+Text+Search+in+Mong...</a>

If you are trying to add four 1s and get '1111' instead of 4, that is an error in your application code which has nothing to do with MongoDB. Very common problem with JavaScript. If it is JavaScript, try finding the code where you are attempting numeric addition and change it so that instead of saying, for example, 'total += newNumber' it says 'total += (newNumber * 1)'.
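
The '1111' symptom is what accumulation looks like when the values are strings. Here is a small sketch in Python rather than JavaScript, so it is an analogy and not the original code: Python does not silently mix strings and numbers, so the example starts from values that have already become strings (the variable names are invented).

```python
from functools import reduce
from operator import add

# Hypothetical counts that lost their numeric type somewhere in the stack.
counts_as_strings = ["1", "1", "1", "1"]

# "+" on strings concatenates, so accumulating them builds a longer string.
concatenated = reduce(add, counts_as_strings)

# Converting explicitly at the boundary restores numeric addition.
total = sum(int(c) for c in counts_as_strings)

print(repr(concatenated), total)
```

The explicit int() call plays the same role as the multiplication by 1 suggested above: it forces the value to be a number before it is added.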

voidr over 12 years ago

Relational databases are awesome if you are not dealing with huge amounts of data that your current hardware can't handle the relational way. There are some cases where you have a ridiculous amount of data (rows), you simply can't store that in a relational database, and you are happy to live without the benefits of relational databases.

If you have millions of rows, you are probably better off with something like MongoDB, and if you need to search that, you should probably use something like Sphinx or Lucene anyway. But if you know that you won't have too much data for the foreseeable future, you should use relational databases. OR you could simply use both.

nashequilibrium over 12 years ago

Taking all these comments into account, if startups have to prototype quickly while trying to find market fit, does it make sense to start off using something like mongodb but with the plan to migrate to another database when your business starts growing? The database space is so confusing right now. It seems like Postgres is the safest choice, and I also like this post from Adam D'Angelo - <a href="http://www.quora.com/Quora-Infrastructure/Why-does-Quora-use-MySQL-as-the-data-store-instead-of-NoSQLs-such-as-Cassandra-MongoDB-or-CouchDB" rel="nofollow">http://www.quora.com/Quora-Infrastructure/Why-does-Quora-use...</a>

bitdiffusion over 12 years ago

It's not necessarily all or nothing - I have worked on several projects now, each using multiple database-type options: mongodb for read-intensive, loose-schema type stuff where the growth is generally predictable (e.g. products, suppliers, logs), postgres for relational-type stuff (orders) and solr for searching (I know solr isn't a database, but people seem hung up on whether mongodb supports case-insensitive searching - hint: don't use any database for search).

I doubt that, unless it's extremely simple, any set of requirements is an exact match for only one of these technologies... mix and match is the future :P

blaines over 12 years ago

Very briefly put, I use MongoDB to start off most new projects.

My primary objective is that my application fulfills its use case. Data is malleable, so you should use the right tools for your needs.

That being said, it sounds like the OP was trying to use a chisel as a replacement for a toolbox. Basically fighting the software (mongodb) to fit his requirements, instead of using additional tools.

<a href="http://blog.8thlight.com/uncle-bob/2012/05/15/NODB.html" rel="nofollow">http://blog.8thlight.com/uncle-bob/2012/05/15/NODB.html</a>

<a href="http://blog.heroku.com/archives/2010/7/20/nosql/" rel="nofollow">http://blog.heroku.com/archives/2010/7/20/nosql/</a>

ww520 over 12 years ago

Wow, the first comment on the blog is so vile. He angrily blamed the "victim" (OP) as a talentless developer.

Tools are enablers and are supposed to make ordinary people rock stars. If it takes a rock star to use a tool, the tool fails.

DonnyV over 12 years ago

If he had just done 10 minutes of research he would have realized MongoDB isn't for him. Also, by doing that research he would've realized that MongoDB has no data constraints. That's all done in the model in your application.

harel over 12 years ago

You used a tool without researching it first, you jumped a bandwagon without finding out its destination, you most likely used it wrong because you didn't RTFM. Now you ditch and diss it. Grow up.

bassemali over 12 years ago

I've never used Ruby on the application layer, but I'd be wary of using an ODM with MongoDB. The single most shocking issue seems to be at the application/ODM level. Using the official 10gen-supported drivers gives you more control and a better understanding of what's going on every step of the way.

Also, a thorough understanding of MongoDB indexing, advanced queries and schema design would have squashed all of these issues. Has anybody had a more pleasant experience with a MongoDB ODM?

itaborai83 over 12 years ago

To be fair, some NoSQL solutions were being sold, marketing-wise, as the be-all and end-all of data solutions. Just google "mongodb mysql migration" and look at how eager everyone is/was to jump on the non-relational bandwagon. Some backlash was to be expected; after all, we might have reached the Trough of Disillusionment.

anthony_barker over 12 years ago

Made the same mistake on a banking project 10 years ago (with Domino). The project in question was a project tracking database.

For accounting-type problems use a relational database. For document-driven items - e.g. a resume database - nosql works great. For a hybrid, pick your battles... or use both.

manorasa over 12 years ago

I think the real lesson here is use the right tool for the right job.

effinjames over 12 years ago

Calm down, redditors. He just needs a basic solution, not one at majestic scale, and SQL fitted him very well. Large-scale data aggregation needs to address disk and network latency; that's where NoSQL shines. And if you operate at large scale, no single simple tool will do it justice. Remember the quote from Google, 'at scale everything breaks'?

jeremyjh over 12 years ago

Upvoted for?

aliks over 12 years ago

Reasons to drop mongodb:

1. Try $or/$and with $near.
2. No b-tree index: count() on 10.000 rows = 100% CPU usage ;)

Go to google.com and search for: site:jira.mongodb.org/browse/ planned but not scheduled
