TE
TechEcho
Home24h TopNewestBestAskShowJobs
GitHubTwitter
Home

TechEcho

A tech news platform built with Next.js, providing global tech news and discussions.

GitHubTwitter

Home

HomeNewestBestAskShowJobs

Resources

HackerNews APIOriginal HackerNewsNext.js

© 2025 TechEcho. All rights reserved.

Why are column oriented databases so much faster than row oriented databases?

139 pointsby siganakisover 13 years ago

12 comments

radicalbyteover 13 years ago
They're faster - often by factors - than row based database if your workload is mainly based on aggregates. That's why they're so effective for datawarehousing.<p>On the other hand if your workload is update/delete heavy then they can be much slower than row-based databases.<p>Just imagine that you're updating a dozen rows at a single record. In a row database that's one search, in a column database it's 12 searches.<p>They're complimentary to each other - that's why you often see say banks using row-based databases for the day-to-day workload (lots of updates) and column databases for analytics (lots of data being aggregated).
mayanksinghalover 13 years ago
There is a paper on a same topic:<p>Column-Stores vs. Row-Stores: How Different Are They Really? [<a href="http://db.csail.mit.edu/projects/cstore/abadi-sigmod08.pdf" rel="nofollow">http://db.csail.mit.edu/projects/cstore/abadi-sigmod08.pdf</a>]<p>This is a very interesting paper where the researchers have compared performance of Column Store databases with plain Row Store databases as well as Row Store databases emulating column stores by using vertical partitioning and other techniques.<p>There is also A Comparison of C-Store and Row-Store in a Common Framework [<a href="http://pages.cs.wisc.edu/~alanh/tr.pdf" rel="nofollow">http://pages.cs.wisc.edu/~alanh/tr.pdf</a>] where researchers demonstrate that the gain that Column Store databases give can be easily achieve in row stores by but by other techniques.
评论 #3524619 未加载
评论 #3524508 未加载
评论 #3525374 未加载
jorgeleoover 13 years ago
For a project where I needed to quickly partition a data tale of millions of rows at multiple levels, I wrote a column database with bitmap index for data storage. It is 100% c# code. The queries and data manipulation is lighting fast the data loading is a little bit slower than traditional db because all the index that needs to be created on the fly, but once is done, is a sweet ride after. Things that I learned to consider, the quality of the compression algorithm, there is a trade off between compressing and some specifics type of queries. Also, the cardinality of the column ( number of different values in the column ) will impact the indices that need to be created and the query time response, if the cardinality is too high, b-tree might be a better alternative. I was working towards making a product out of it, but then I saw SQL Server new column index and I drop it. I don't think anybody would be interested in a little dll, when the big boy is offering the same functionality.
评论 #3526310 未加载
评论 #3527743 未加载
ww520over 13 years ago
One question. Isn't this how B+ Tree index on a column work? A BT index is just the sorted values of a column with pointers to the row id of the actually rows. Note that BT index are often compressed as well since the sorted values have many similarity to its neighbors.<p>Also what's the difference between the bitmap part and the Bitmap Index (<a href="http://en.wikipedia.org/wiki/Bitmap_index" rel="nofollow">http://en.wikipedia.org/wiki/Bitmap_index</a>)?
评论 #3525100 未加载
jmahover 13 years ago
Thanks, FastBit looks quite interesting, but the site is so... academic. So much great research is lurking behind "unassuming" web presences. 'Someone should do something!'<p><a href="https://sdm.lbl.gov/fastbit/" rel="nofollow">https://sdm.lbl.gov/fastbit/</a>
评论 #3524584 未加载
kclover 13 years ago
Column-oriented databases are faster because computers are built for arrays and not rows. It is a lot easier to pass through a simple column in memory or read a simple column from the disk. The pass through the memory benefits from all the caching functionality on the chip, and the pass through the disk benefits from the arm not having to bounce around. Essentially you move at the maximum speed allowed by the equipment. Row-based storage defeats all that.<p>Indexing is also improved with a column store. If you have a row-based storage mechanism, particularly one with variably sized entries, you're going to need a more complicated indexing scheme.
评论 #3524726 未加载
评论 #3525111 未加载
Maroover 13 years ago
I have no direct experience with column oriented DBs, but I'd expect if you have a wide table and are getting all the columns in a query, then a column oriented DB's performance would suffer relative to a row oriented one's. So it's a trade-off, sometimes it's slower, sometimes it's faster, depends on the use-case.
评论 #3525104 未加载
ruggeriover 13 years ago
Column-stores are a big part of the technology described in Google's Dremel paper (<a href="http://research.google.com/pubs/pub36632.html" rel="nofollow">http://research.google.com/pubs/pub36632.html</a>); they used a column-oriented format to build a system to support interactive queries to big datasets.
emmelaichover 13 years ago
If you consider column oriented databases as relational database normalisation taken to the extreme, this makes sense. Apart from the semantic benefits, Codd and Date have argued normalisation gives an efficiency benefit.
评论 #3526734 未加载
评论 #3525114 未加载
Jachover 13 years ago
If you'd like to play with a SQL-standard column-store, check out the open source LucidDB. <a href="http://www.luciddb.org/html/main.html" rel="nofollow">http://www.luciddb.org/html/main.html</a>
tantalorover 13 years ago
See also, <a href="http://en.wikipedia.org/wiki/Postings_file" rel="nofollow">http://en.wikipedia.org/wiki/Postings_file</a>
ThaddeusQuay2over 13 years ago
<a href="http://en.wikipedia.org/wiki/Column-oriented_DBMS" rel="nofollow">http://en.wikipedia.org/wiki/Column-oriented_DBMS</a>