They're faster - often by factors - than row based database if your workload is mainly based on aggregates. That's why they're so effective for datawarehousing.<p>On the other hand if your workload is update/delete heavy then they can be much slower than row-based databases.<p>Just imagine that you're updating a dozen rows at a single record. In a row database that's one search, in a column database it's 12 searches.<p>They're complimentary to each other - that's why you often see say banks using row-based databases for the day-to-day workload (lots of updates) and column databases for analytics (lots of data being aggregated).
There is a paper on a same topic:<p>Column-Stores vs. Row-Stores: How Different Are They
Really? [<a href="http://db.csail.mit.edu/projects/cstore/abadi-sigmod08.pdf" rel="nofollow">http://db.csail.mit.edu/projects/cstore/abadi-sigmod08.pdf</a>]<p>This is a very interesting paper where the researchers have compared performance of Column Store databases with plain Row Store databases as well as Row Store databases emulating column stores by using vertical partitioning and other techniques.<p>There is also A Comparison of C-Store and Row-Store in a Common Framework [<a href="http://pages.cs.wisc.edu/~alanh/tr.pdf" rel="nofollow">http://pages.cs.wisc.edu/~alanh/tr.pdf</a>] where researchers demonstrate that the gain that Column Store databases give can be easily achieve in row stores by but by other techniques.
For a project where I needed to quickly partition a data tale of millions of rows at multiple levels, I wrote a column database with bitmap index for data storage. It is 100% c# code.
The queries and data manipulation is lighting fast the data loading is a little bit slower than traditional db because all the index that needs to be created on the fly, but once is done, is a sweet ride after.
Things that I learned to consider, the quality of the compression algorithm, there is a trade off between compressing and some specifics type of queries.
Also, the cardinality of the column ( number of different values in the column ) will impact the indices that need to be created and the query time response, if the cardinality is too high, b-tree might be a better alternative.
I was working towards making a product out of it, but then I saw SQL Server new column index and I drop it. I don't think anybody would be interested in a little dll, when the big boy is offering the same functionality.
One question. Isn't this how B+ Tree index on a column work? A BT index is just the sorted values of a column with pointers to the row id of the actually rows. Note that BT index are often compressed as well since the sorted values have many similarity to its neighbors.<p>Also what's the difference between the bitmap part and the Bitmap Index (<a href="http://en.wikipedia.org/wiki/Bitmap_index" rel="nofollow">http://en.wikipedia.org/wiki/Bitmap_index</a>)?
Thanks, FastBit looks quite interesting, but the site is so... academic. So much great research is lurking behind "unassuming" web presences. 'Someone should do something!'<p><a href="https://sdm.lbl.gov/fastbit/" rel="nofollow">https://sdm.lbl.gov/fastbit/</a>
Column-oriented databases are faster because computers are built for arrays and not rows. It is a lot easier to pass through a simple column in memory or read a simple column from the disk. The pass through the memory benefits from all the caching functionality on the chip, and the pass through the disk benefits from the arm not having to bounce around. Essentially you move at the maximum speed allowed by the equipment. Row-based storage defeats all that.<p>Indexing is also improved with a column store. If you have a row-based storage mechanism, particularly one with variably sized entries, you're going to need a more complicated indexing scheme.
I have no direct experience with column oriented DBs, but I'd expect if you have a wide table and are getting all the columns in a query, then a column oriented DB's performance would suffer relative to a row oriented one's. So it's a trade-off, sometimes it's slower, sometimes it's faster, depends on the use-case.
Column-stores are a big part of the technology described in Google's Dremel paper (<a href="http://research.google.com/pubs/pub36632.html" rel="nofollow">http://research.google.com/pubs/pub36632.html</a>); they used a column-oriented format to build a system to support interactive queries to big datasets.
If you consider column oriented databases as relational database normalisation taken to the extreme, this makes sense.
Apart from the semantic benefits, Codd and Date have argued normalisation gives an efficiency benefit.
If you'd like to play with a SQL-standard column-store, check out the open source LucidDB. <a href="http://www.luciddb.org/html/main.html" rel="nofollow">http://www.luciddb.org/html/main.html</a>