I suspect a lot of companies use complicated tech for no reason. When using a database, even in the simpler case where no table joins are needed at all, people shard databases and add caching on the front-end machines. This complicates their design, a recent example being the blog post about Pinterest.<p>What does that offer that flat files on a shared filesystem don't?
- Concurrency (of access to records)<p>- Performance (e.g. caching, indexing, etc.)<p>- Less first-party code means less development/debugging/design time, with benefits growing very quickly the larger a project becomes<p>- Queries are more flexible than APIs (and add a clear separation between data-source logic and data-usage logic)<p>- Scalability; flat files don't have it.<p>The only benefit of flat files is as a learning exercise. It is useful to understand how to build a database-like system yourself so you can see how things go wrong and where the edge cases are.<p>But in production code I've yet to see any situation where a homegrown flat-file system was better in the long run than an off-the-shelf database solution. I have seen several flat-file systems that made it impossible for companies to scale and have likely cost them millions in lost revenue as a direct result, so that is fun too.
When you first start a project and have only one server, a single file can be fine. Once you start getting too many users, though, you'll start dealing with file locks. One way of solving that is to map a file to each individual user (sharding). If you make each entry in the file immutable, this will work beautifully for a while.<p>If you introduce the idea of groups of users, you'll probably want to create a group text file for each user rather than store groups as entries in their pre-existing text file (kind of like an index, though different)...<p>Main point being: I advocate starting with a text file until the complexity of managing that file begins to outstrip the complexity of managing a third-party data store.<p>Too many people go straight to a third-party data store by default when text files are extremely capable.
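The per-user sharding with immutable entries described above can be sketched in a few lines. This is a minimal illustration, not the commenter's actual design; the JSON-lines format, the `data` directory, and the function names are my assumptions:

```python
import json
import os

DATA_DIR = "data"  # hypothetical directory: one append-only file per user


def append_entry(user_id, entry):
    """Append an immutable record to this user's own file (their 'shard')."""
    os.makedirs(DATA_DIR, exist_ok=True)
    path = os.path.join(DATA_DIR, f"user_{user_id}.jsonl")
    # Append-only: records are never edited in place, so concurrent
    # readers never observe a half-overwritten record, and the lock
    # contention is confined to a single user's file.
    with open(path, "a") as f:
        f.write(json.dumps(entry) + "\n")


def read_entries(user_id):
    """Read back all records for one user, oldest first."""
    path = os.path.join(DATA_DIR, f"user_{user_id}.jsonl")
    if not os.path.exists(path):
        return []
    with open(path) as f:
        return [json.loads(line) for line in f]
```

Because each user gets their own file, two users writing at the same time never touch the same file, which is exactly the lock-contention fix the comment describes.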
I worked at a company which sells document archiving to businesses, and they do not use a database. All data is stored on disk in files. They index the archived data, similar to how Google indexes everything (they don't use Google). They also use indexed files to keep track of some data, but most of that is about users and permissions, not much to do with the documents themselves.<p>I don't feel I can give further details, but the company is making money. Nothing you'd read about, but it's doing OK. The solution scales up well enough that insurance and financial companies are their clients. Terabytes of data - no big deal. It's quite amazing to see how much faster it runs on customers' computers with terabytes of data than in house with test databases on developers' PCs holding only 100 MB of test data. And it doesn't even require an admin (besides someone to make sure there's enough space on disk).<p>I think it really depends on your app. Databases aren't always the right answer, but no one has been fired for using one. (Maybe that's why people flock to them - that, plus if everyone's doing it and you are worried about your resume ... well ... you better do it too.)
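The comment doesn't say how that company's index actually works, but the general idea of indexing archived documents without a database - an inverted index kept as ordinary data structures (or files) - can be sketched like this; all names here are illustrative:

```python
from collections import defaultdict


def build_index(docs):
    """Build an inverted index: each word maps to the set of
    document ids whose text contains it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def search(index, word):
    """Return a sorted list of document ids containing the word."""
    return sorted(index.get(word.lower(), set()))
```

Lookups then touch only the index, not the archived files themselves, which is why such a scheme can stay fast even over terabytes of stored documents.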
Well, shared filesystems are kind of a mess. If they're some sort of clustered block filesystem, there are all kinds of weird performance edge cases and split-brain potential problems.<p>If you're using NFS or some sort of centralized fileserver, you're going to bump into UNIX performance limitations around things like opendir() with lots of files in a single directory. Not to mention that trying to get any kind of atomicity over NFS is a huge pain - look at maildir as an example of a working model for a much simpler situation, email.<p>Sure, of course, in the early days you won't run into these issues, because you won't have enough objects. But once you get into a few hundred thousand files, you'll need to start hashing into subdirs and dealing with all the issues above, depending on your tech.
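The two workarounds mentioned above - hashing into subdirs, and the write-then-rename trick maildir relies on for atomicity - can be sketched together. The `objects` root and the two-level fan-out are my assumptions, and note that the rename-is-atomic guarantee is a POSIX same-filesystem property; over NFS it is weaker:

```python
import hashlib
import os

ROOT = "objects"  # hypothetical storage root


def object_path(key):
    """Spread files across 256 subdirs (first two hex digits of the
    hash) so no single directory grows to hundreds of thousands of
    entries and makes opendir()/readdir() crawl."""
    digest = hashlib.sha1(key.encode()).hexdigest()
    return os.path.join(ROOT, digest[:2], digest)


def store(key, data):
    """Write to a temp file, then rename into place. On a local POSIX
    filesystem the rename is atomic, so readers see either the old
    file or the new one - never a partial write."""
    path = object_path(key)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    tmp = path + ".tmp"
    with open(tmp, "wb") as f:
        f.write(data)
    os.rename(tmp, path)


def load(key):
    with open(object_path(key), "rb") as f:
        return f.read()
```

This is essentially the directory layout git uses for loose objects, and the delivery pattern maildir uses for mail, adapted to a generic key/value store.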
I think the kind and amount of data, and the type of analysis to be performed on it, should probably drive the decision. Scenarios where you have hundreds of GBs of data and need to run text analysis over them would probably make me think in terms of flat files. But for simple web apps, an RDBMS or a NoSQL store is probably the way to go.