Launch HN: QuestDB (YC S20) – Fast open source time series database

357 points by bluestreak almost 5 years ago

Hey everyone, I’m Vlad and I co-founded QuestDB (https://questdb.io) with Nic and Tanc. QuestDB is an open source database for time series, events, and analytical workloads with a primary focus on performance (https://github.com/questdb/questdb).

It started in 2012 when an energy trading company hired me to rebuild their real-time vessel tracking system. Management wanted me to use a well-known XML database that they had just bought a license for. This option would have required taking production down for about a week just to ingest the data, and a week of downtime was not an option. With no more money to spend on software, I turned to alternatives such as OpenTSDB, but they were not a fit for our data model. There was no solution in sight to deliver the project.

Then I stumbled upon Peter Lawrey’s Java Chronicle library [1]. It loaded the same data in 2 minutes instead of a week using memory-mapped files. Besides the performance aspect, I found it fascinating that such a simple method was solving multiple issues simultaneously: writes are fast, reads can happen even before data is committed to disk, code interacts with memory rather than IO functions, and there are no buffers to copy. Incidentally, this was my first exposure to zero-GC Java.

But there were several issues. First, at the time it didn’t look like the library was going to be maintained. Second, it used Java NIO instead of using the OS API directly. This adds overhead since it creates individual objects whose sole purpose is to hold a memory address for each memory page. Third, although the NIO allocation API was well documented, the release API was not. It was really easy to run out of memory and hard to manage memory page release. I decided to ditch the XML DB and then started to write a custom storage engine in Java, similar to what Java Chronicle did.
This engine used memory-mapped files, off-heap memory and a custom query system for geospatial time series. Implementing this was a refreshing experience. I learned more in a few weeks than in years on the job.

Throughout my career, I mostly worked at large companies where developers are “managed” via itemized tasks sent as tickets. There was no room for creativity or initiative. In fact, it was in one’s best interest to follow the ticket's exact instructions, even if it was complete nonsense. I had just been promoted to a managerial role and regretted it after a week. After so much time hoping for a promotion, I immediately wanted to go back to the technical side. I became obsessed with learning new stuff again, particularly in the high performance space.

With some money aside, I left my job and started to work on QuestDB solo. I used Java and a small C layer to interact directly with the OS API without passing through a selector API. Although existing OS API wrappers would have been easier to get started with, the overhead increases complexity and hurts performance. I also wanted the system to be completely GC-free. To do this, I had to build off-heap memory management myself and I could not use off-the-shelf libraries. I had to rewrite many of the standard ones over the years to avoid producing any garbage.

As I had my first kid, I had to take contracting gigs to make ends meet over the following 6 years. All the stuff I had been learning boosted my confidence and I started performing well at interviews. This allowed me to get better-paying contracts, so I could take fewer jobs and free up more time to work on QuestDB while looking after my family. I would do research during the day and implement it in QuestDB at night. I was constantly looking for the next thing that would take performance closer to the limits of the hardware.

A year in, I realised that my initial design was actually flawed and that it had to be thrown away.
It had no concept of separation between readers and writers and would thus allow dirty reads. Storage was not guaranteed to be contiguous, and pages could be of various non-64-bit-divisible sizes. It was also very much cache-unfriendly, forcing the use of slow row-based reads instead of fast columnar and vectorized ones. Commits were slow, and as individual column files could be committed independently, they left the data open to corruption.

Although this was a setback, I got back to work. I wrote the new engine to allow atomic and durable multi-column commits, provide repeatable-read isolation, and make commits instantaneous. To do this, I separated transaction files from the data files. This made it possible to commit multiple columns simultaneously as a simple update of the last committed row id. I also made storage dense by removing overlapping memory pages and writing data byte by byte over page edges.

This new approach improved query performance. It made it easy to split data across worker threads and to optimise the CPU pipeline with prefetch. It unlocked column-based execution and additional virtual parallelism with SIMD instruction sets [2] thanks to Agner Fog’s Vector Class Library [3]. It made it possible to implement more recent innovations like our own version of Google SwissTable [4]. I published more details when we released a demo server a few weeks ago on ShowHN [5]. This demo is still available to try online with a pre-loaded dataset of 1.6 billion rows [6]. Although it was hard and discouraging at first, this rewrite turned out to be the second best thing that happened to QuestDB.

The best thing was that people started to contribute to the project. I am really humbled that Tanc and Nic left our previous employer to build QuestDB. A few months later, former colleagues of mine left their stable low-latency jobs at banks to join us. I take this as a huge responsibility and I don’t want to let these guys down.
The amount of work ahead gives me headaches and goosebumps at the same time.

QuestDB is deployed in production, including at a large fintech company. We’ve been focusing on building a community to get our first users and gather as much feedback as possible.

Thank you for reading this story - I hope it was interesting. I would love to read your feedback on QuestDB and to answer questions.

[1] https://github.com/peter-lawrey/Java-Chronicle
[2] https://news.ycombinator.com/item?id=22803504
[3] https://www.agner.org/optimize/vectorclass.pdf
[4] https://github.com/questdb/questdb/blob/master/core/src/main/c/share/rosti.h
[5] https://news.ycombinator.com/item?id=23616878
[6] http://try.questdb.io:9000/
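
For readers who want to see the commit idea from the post in code, here is a minimal sketch, assuming a toy table with two fixed-width columns. It is not QuestDB's actual code: the class name, file names and layout are made up for illustration, and it uses Java NIO's `MappedByteBuffer` for brevity, which is exactly the layer the post says QuestDB bypasses. The point is only that column data goes into memory-mapped column files, while a separate transaction file holds the committed row count, so publishing all columns is a single 8-byte update and readers never look past that count.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Toy two-column table; file names and layout are hypothetical.
class TxnSketch {
    static final int CAPACITY = 1024;      // max rows; no bounds handling beyond this
    private final MappedByteBuffer ts;     // column file: one long per row
    private final MappedByteBuffer price;  // column file: one double per row
    private final MappedByteBuffer txn;    // transaction file: committed row count
    private int appended;                  // rows written so far, committed or not

    TxnSketch(Path dir) throws IOException {
        ts = map(dir.resolve("ts.col"), CAPACITY * 8L);
        price = map(dir.resolve("price.col"), CAPACITY * 8L);
        txn = map(dir.resolve("table.txn"), 8L);
        appended = (int) txn.getLong(0);   // resume after the last committed row
    }

    private static MappedByteBuffer map(Path file, long size) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.CREATE,
                StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            // The mapping stays valid after the channel is closed.
            return ch.map(FileChannel.MapMode.READ_WRITE, 0, size);
        }
    }

    // Writer side: data lands in the column files but is not yet visible.
    void append(long timestamp, double value) {
        ts.putLong(appended * 8, timestamp);
        price.putDouble(appended * 8, value);
        appended++;
    }

    // Flush column data first, then publish every column with one 8-byte write.
    void commit() {
        ts.force();
        price.force();
        txn.putLong(0, appended);
        txn.force();
    }

    // Reader side: only rows below the committed count may be read.
    long committedRows() {
        return txn.getLong(0);
    }

    double priceAt(int row) {
        if (row < 0 || row >= committedRows()) {
            throw new IndexOutOfBoundsException("row " + row + " is not committed");
        }
        return price.getDouble(row * 8);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("txn-sketch");
        TxnSketch writer = new TxnSketch(dir);
        TxnSketch reader = new TxnSketch(dir);  // second mapping of the same files
        writer.append(1L, 10.0);
        writer.append(2L, 11.0);
        System.out.println("before commit: " + reader.committedRows());
        writer.commit();
        System.out.println("after commit: " + reader.committedRows());
    }
}
```

Because the reader bounds every access by the committed count, rows that have been appended but not committed are never returned, which is the dirty-read problem the post attributes to the first design. A real engine would also need growable files, concurrent-access ordering guarantees and crash recovery, none of which this sketch attempts.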

45 comments

vii almost 5 years ago

mmap'd databases are really quick to implement. I implemented both row and column orientated databases. The traders and quants loved it - and adoption took off after we built a web interface that let you see a whole day and also zoom into exact trades with 100ms load times for even the most heavily traded symbols.

The benefit of mmapping, and of POSIX filesystem atomic properties in general, is quick implementation, where you don't have to worry about buffer management. The filesystem and disk block remapping layer (in SSDs or even HDDs now) are radically more efficient when data are given to them in contiguous large chunks. This is difficult to control with mmap, where the OS may write out pages at its whim. However, even using advanced Linux system calls like mremap and fallocate, which try to improve the complexity of changing mappings and layout in the filesystem, eventually this lack of control over buffers will bite you.

And then when you look at it, the kernel (with help from the processor TLB) has to maintain complex data structures to represent the mappings and their dirty/clean states. Accessing memory is not O(1) even when it is in RAM. Making something better tuned to a database than the kernel page management is a significant hurdle, but that's where there are opportunities.

shay_ker almost 5 years ago

Absolutely love the story. TimescaleDB & InfluxDB have had a lot of posts on HN, so I'm sure others are wondering - how do we compare QuestDB to them? It sounds like performance is a big one, but I'm curious to hear your take on it.

pachico almost 5 years ago

I see this as a very interesting project. I use ClickHouse as OLAP and I'm very happy with it. I can tell you the features that make me stick to it. If some day QuestDB offers them, I might explore the possibility of switching, but not before.

- very fast (I guess we're aligned here)
- real time materialized views for aggregation functions (this is absolutely a killer feature that makes it quite pointless to be fast if you don't have it)
- data warehouse features: I can join different data sources in one query. This allows me to join, for instance, my MySQL/MariaDB domain DB with it and produce very complete reports.
- Grafana plugin
- very easy to share/scale at table level
- huge set of functions, from geo to URL, from ML to string manipulation
- dictionaries: I can load maxdb geo DB and do real time localisation in queries

I might add some more once they come to my mind. Having said this, good job!!!

thegreatpeter almost 5 years ago

Am I the only one that's like "wtf is a time-series database compared to a normal one?"

hintymad almost 5 years ago

I'm curious how QuestDB handles dimensions. OLAP support with a reasonably large number of dimensions and cardinality in the range of at least thousands is a must for a modern-day time series database. Otherwise, what we get is only an incremental improvement on Graphite -- a darling among startups, I understand, but a non-scalable, extremely hard-to-use time series database nonetheless.

A common flaw I see in many time-series DBs is that they store one time series per combination of dimensions. As a result, any aggregation will result in scanning potentially millions of time series. If any time-series DB claims that it is backed by a key-value store, say, Cassandra, then the DB will have the aforementioned issue. For instance, Uber's M3 used to be backed by Cassandra, and therefore would give this mysterious warning that an aggregation function exceeded the quota of 10,000 time series, even though from the user's point of view the function dealt with a single time series with a number of dimensions.

numlock86 almost 5 years ago

> https://news.ycombinator.com/item?id=22803504

As I already said, or rather asked, there: Assume I already use Clickhouse for example. What are the benefits of QuestDB? Why should I use it instead?

Surely it's a good tech and competition is key. But what are the key points that should make me look into it? There is a lot of story about the making and such, but I don't see the "selling point".

maz1b almost 5 years ago

Hi Vlad, this looks really interesting!

I really enjoyed reading the backstory and the founding dynamics from which QuestDB was born, and I think a lot of others in the YC community will as well.

Can you give some use cases or specific examples of why QuestDB is unique?

didip almost 5 years ago

I find your story very interesting, thank you for sharing that.

It also gives an interesting background as to why questdb is different than all the other competitors in the space.

judofyr almost 5 years ago

Congratulations on launching! It looks like a great product. Some technical questions which I didn’t see answered on my first glance:

(1) Is it single-server only, or is it possible to store data replicated as well?

(2) I’m guessing that all the benchmarks were done with all the hot data paged into memory (correct?); what’s the performance once you hit the disk? How much memory do you recommend running with?

(3) How’s the durability? How often do you write to disk? How do you take backups? Do you support streaming backups? How fast/slow/big are snapshot backups?

zumachase almost 5 years ago

Hi Vlad - your anecdote about ship tracking is interesting (my other startup is an AIS based dry freight trader). You must know the Vortexa guys given your BP background.

How does QuestDB differ from other timeseries/OLAP offerings? I'm not entirely clear.

jrexilius almost 5 years ago

This looks great, but more importantly good luck! There seems to be market need for this and it looks like a solid implementation at first glance. You're off to a good start. I hope you and your team are successful!

sylvain_kerkour almost 5 years ago

Congrats!

Also thank you for your awesome blog [0]! It's really the kind of technical gem I enjoy reading late at night :)

[0] https://questdb.io/blog

aloukissas almost 5 years ago

This is great! Quick question: would you mind sharing why you went with Java vs something perhaps more performant like all C/C++ or Rust? I'd suspect language familiarity (which is 100% ok).

neurostimulant almost 5 years ago

Congrats! I've been looking for a time series database but most of them seem to be in-memory nosql databases. QuestDB might be exactly what I need. I'll definitely give it a try soon!

pknerd almost 5 years ago

Stories like these help a product to get traction. Every founder/creator must come up with a story related to the product.

Congrats!

jedberg almost 5 years ago

How does your performance compare to Atlas? [0]

[0] https://github.com/Netflix/atlas

anurag almost 5 years ago

Amazing story and congrats on all the progress!

Shameless plug: if you'd like to try it out in a production setting, we just created a one-click install for it:

https://github.com/render-examples/questdb

airstrike almost 5 years ago

There's an opportunity for a tool that combines this sort of technology in the backend with a spreadsheet-like GUI powered by formulas and all the user friendliness that comes with a non-programmer interface. Wall Street would forever be changed.

Source: I'm one of the poor souls fighting my CPU and RAM to do the same thing with Excel and non-native add-ins by {FactSet, Capital IQ, Bloomberg}

This stuff

    SELECT * FROM balances
    LATEST BY balance_ccy, cust_id
    WHERE timestamp <= '2020-04-22T16:15:00.000Z'
    AND NOT inactive;

makes me literally want to cry, knowing what is possible yet not being able to do this at my day job :(

rattray almost 5 years ago

The SQL explorer at http://try.questdb.io:9000/ is pretty slick – was that built in-house, or is it based on something that's open-source?

rattray almost 5 years ago

The database aside entirely, that story was a really fun read. Thanks for writing it up and sharing. Rooting for you!

monstrado almost 5 years ago

I noticed there is "Clustering" mentioned under enterprise features, but I can't seem to find any references to it in the documentation. Is this something that will be strictly closed source?

gregwebs almost 5 years ago

I am still hoping to see comparisons to Victoria Metrics, which also shows much better performance than many other TSDBs. Victoria Metrics is Prometheus compatible whereas Quest now supports Postgres compatibility. Both have compatibility with InfluxDB.

The Victoria Metrics story is somewhat similar: someone tried using Clickhouse for large time series data at work and was astonished at how much faster it was. He then made a reimplementation customized for time series data and the Prometheus ecosystem.

mooneater almost 5 years ago

Awesome! Could you share a bit about business model?

Random_ernest almost 5 years ago

Testing out the demo:

    SELECT * FROM trips WHERE tip_amount > 500 ORDER BY tip_amount DESC

Very interesting :-)

posedge almost 5 years ago

Your story is very inspiring. I wish you all the best with this project.

TheRealNGenius almost 5 years ago

Maybe I'm out of the loop, but I noticed lately that a majority of show/launch hn posts I click on have text that is muted. I know this happens on down voted comments, but is this saying that people are down voting the post itself?

einpoklum almost 5 years ago

1. Does QuestDB support SQL sufficiently to run, say, the TPC-H analytics benchmark? (not a time series)

2. If so, can you give some performance figures for a single machine and reasonable scale factors (10 GB, 100 GB, 1000 GB)? Execution times for single queries are even more interesting than the overall random-queries-per-hour figure.

3. Can you link to a widely-used benchmark for analytic workloads on time series, which one can use to compare performance of various DBMSes? With SQL-based queries preferably.

patrickaljord almost 5 years ago

Congrats on the launch!

One question: there are many open source database startups that make it easy to scale on the cloud. However, when you look into the offering, the scaling part is never actually open source and you end up paying for non open source stuff just like any other proprietary database. So I guess my question is, are you planning to go open core too or will you remain open source with some SaaS offering? Good luck to you!

js4ever almost 5 years ago

https://try.questdb.io:9000/ is down

bravura almost 5 years ago

Can you talk about some of the ideal use cases for a time series db? Versus Postgres or a graph database.

samsk almost 5 years ago

Does it support some kind of compression? That's very important when storing billions of events.

lpasselin almost 5 years ago

Does postgres wire support mean QuestDB can be a drop-in replacement for a postgres database?

Is this common?

fredliu almost 5 years ago

How do you get the best performance out of QuestDB? Does it have to be on bare metal machines? Is there any performance benchmark of QuestDB running on bare metal vs. cloud instances (e.g. EC2 with EBS volumes) etc.?

myth_drannon almost 5 years ago

https://questdb.io/docs/crudOperations has JS errors and is not loading/page not found

jankotek almost 5 years ago

Good luck. I have worked on a similar OS database engine for about a decade now. It is not bad, but I think consulting is a better way to get funds. Also avoid "zero gc", the JVM can be surprisingly good.

Will be in touch :)

nlitened almost 5 years ago

Do you measure performance vs k/shakti?

dominotw almost 5 years ago

Something is off with your website. I just see images: https://questdb.io/blog/2020/07/24/use-questdb-for-swag/

wappa almost 5 years ago

How do I join the Slack group? It says to request an invite from the workspace administrator.

nmnm almost 5 years ago

Loved the story and the product!

jeromerousselot almost 5 years ago

Great story! Thanks for sharing

massimosgrelli almost 5 years ago

Impressive. Can we talk?

rbruggem almost 5 years ago

great story! well done.

monstrado almost 5 years ago

Any plans on integration with Apache Arrow?

tosh almost 5 years ago

kudos @ launching, impressive

Maro almost 5 years ago

Can you add a tldr?
