I have doubts about the "We’re actually spending some time on every row" claim. Storing data in a columnstore format often comes with metadata saved at the page level, such as the min, max, and count of the values in that page. These values are used to filter the data a query has to process, i.e. to skip entire pages.
If you run a 'count' query 10 times, it's very unlikely the DB will count rows 10 times. It will rely on the pages' existing metadata when available (i.e., already computed). The tests described in the post are misleading IMHO.

EDIT: This comes on top of the fact that DBs can cache query results too. Moreover, the post does not say whether they implemented clustered or filtered indexes on the columns in question, nor does it explain how the data was partitioned. All of this has a big impact on execution time.
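To make the page-skipping point concrete, here is a rough sketch of how an engine can answer count queries from page-level metadata alone. The struct and function names (PageMeta, count_in_range, scan_page) are made up for illustration, not any particular database's internals:

    /* Hypothetical page-level metadata in a column store. */
    #include <stdint.h>
    #include <stddef.h>

    typedef struct {
        int64_t  min;     /* smallest value stored in the page */
        int64_t  max;     /* largest value stored in the page  */
        uint32_t n_rows;  /* number of rows encoded in the page */
    } PageMeta;

    /* COUNT with a range predicate: pages whose [min, max] lie entirely
     * inside or entirely outside the predicate are answered from metadata;
     * only the remaining "partial" pages need a real scan. */
    uint64_t count_in_range(const PageMeta *pages, size_t n_pages,
                            int64_t lo, int64_t hi,
                            uint64_t (*scan_page)(size_t idx, int64_t lo, int64_t hi))
    {
        uint64_t total = 0;
        for (size_t i = 0; i < n_pages; i++) {
            if (pages[i].min > hi || pages[i].max < lo)
                continue;                       /* page skipped entirely */
            else if (pages[i].min >= lo && pages[i].max <= hi)
                total += pages[i].n_rows;       /* counted from metadata */
            else
                total += scan_page(i, lo, hi);  /* only here are rows touched */
        }
        return total;
    }

Only the pages whose [min, max] range straddles the predicate boundary ever need a real scan, which is why a repeated count can be much cheaper than "spending time on every row".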
So, some questions:

1. Aren't (3) vectorization and (4) SIMD the same thing?

2. I don't see the data size before and after compression?

3. How much RAM does each server have?

4. How do all the cores work on a query? Is the data sharded per core on each machine, or can each core work on whatever data it needs?

5. What's a comparable open-source tool? The only one I can think of is SnappyData.
Meh. They used 448 cores to count the frequency of bit patterns of some small length in a probably more or less contiguous block of memory. They had 57,756,221,440 total rows, which comes to about 128,920,138 rows per core. If the data set contained 256 or fewer distinct stock symbols, then the task boils down to computing the byte histogram of a 123 MiB block of memory. My several-years-old laptop does this with the most straightforward C# implementation in 170 ms. That is less than a factor of 4 away from their 45.1 ms, and given that AVX-512 can probably process 64 bytes at a time, we should have quite a bit of room to spare for all the other steps involved in processing the query.

Don't get me wrong, in some sense it is really impressive that we have reached this level of processing power and that this database engine can optimize the query down to counting bytes and generate highly performant code to do so, but as an indicator that this database can process trillions of rows per second it is just a publicity stunt. Sure, it can do it with this setup and this query, but don't be too surprised if you don't get anywhere near that with other queries.
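For reference, the "most straightforward implementation" being described is essentially the following loop (shown here in C rather than C#, purely as an illustration, not the commenter's actual code):

    /* Count how often each of the 256 possible one-byte symbol codes occurs
     * in a contiguous buffer. A plain scalar loop; a vectorized (e.g. AVX-512)
     * version could consume up to 64 bytes per instruction. */
    #include <stdint.h>
    #include <stddef.h>

    void byte_histogram(const uint8_t *data, size_t len, uint64_t counts[256])
    {
        for (int i = 0; i < 256; i++)
            counts[i] = 0;
        for (size_t i = 0; i < len; i++)
            counts[data[i]]++;
    }

At roughly 128,920,138 symbols per core, the per-core work really is a single pass over about 123 MiB of memory.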
And of course, how does it compare to kdb? It seems less expensive, but it also lacks the advanced query language.

The last benchmark I saw for kdb was the 1.1 billion taxi rides one:

http://tech.marksblogg.com/billion-nyc-taxi-kdb.html

There it basically outperformed every other CPU-based system on slightly more complex queries.

Any comparisons planned?
"When you deliver response time that drops down to about a quarter of a second, results seem to be instantaneous to users."<p>I don't think everybody agrees with this statement.
How fast is data import? And loading into RAM? (For example, booting up a cluster for an existing imported database on AWS.)

I'm working with datasets of hundreds of billions of short rows and am curious to give it a try.
The speed of light is a hard limit. I don't believe there is any free lunch[1], only trade-offs to manage. I'm skeptical of any claim that implies free or easy speed without potentially significant trade-offs.

If you can live with somewhat out-of-date and/or out-of-sync data, you can throw massive parallelism at big read-only queries to get speed (a rough sketch of that pattern follows below). The trade-offs are often best tuned from a domain perspective, such that it's not really a technology problem, although technology may make certain tunings/trade-offs easier to manage.

[1] Faster hardware may give us incremental improvements, but the speed of light probably prevents any trade-off-free breakthroughs.
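Here is a minimal sketch of that "parallelism over read-only data" idea, assuming a fixed, immutable snapshot that every worker can scan without coordination; the names (Slice, parallel_count, N_THREADS) are made up for illustration:

    /* Each thread counts matches in its own slice of an immutable snapshot;
     * because nothing mutates the data, no locking is needed and the partial
     * counts are simply summed at the end. */
    #include <pthread.h>
    #include <stdint.h>
    #include <stddef.h>

    #define N_THREADS 8

    typedef struct {
        const int32_t *data;   /* read-only snapshot, safe to share */
        size_t begin, end;     /* this thread's slice of the data   */
        int32_t needle;        /* value being counted               */
        uint64_t count;        /* partial result                    */
    } Slice;

    static void *count_slice(void *arg)
    {
        Slice *s = arg;
        uint64_t c = 0;
        for (size_t i = s->begin; i < s->end; i++)
            if (s->data[i] == s->needle)
                c++;
        s->count = c;
        return NULL;
    }

    uint64_t parallel_count(const int32_t *data, size_t n, int32_t needle)
    {
        pthread_t threads[N_THREADS];
        Slice slices[N_THREADS];
        size_t chunk = n / N_THREADS;

        for (int t = 0; t < N_THREADS; t++) {
            slices[t] = (Slice){ data, (size_t)t * chunk,
                                 (t == N_THREADS - 1) ? n : (size_t)(t + 1) * chunk,
                                 needle, 0 };
            pthread_create(&threads[t], NULL, count_slice, &slices[t]);
        }

        uint64_t total = 0;
        for (int t = 0; t < N_THREADS; t++) {
            pthread_join(threads[t], NULL);
            total += slices[t].count;
        }
        return total;
    }

Tolerating staleness is exactly what makes this safe: the snapshot never changes under the readers, so the only cost is the coordination-free scan itself.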