Building a scalable time-series database on PostgreSQL

305 pointsby RobAtticusabout 8 years ago

32 comments

endymi0nabout 8 years ago

While I appreciate PostgreSQL every day, am I the only one who thinks this is a rather bad idea? The row based engine of PG is the antithesis of efficient storage and retrieval of timeseries of similar patterns, yielding almost no compression. Columnar storage should naturally be much better at that (BigQuery, Redshift, Citus), culminating in purpose built stores like Influx, Prometheus or KDB. Prometheus for example manages to compress an average 64-bit FP sample including metadata to just 1.3-3.3 bytes depending on engine. As most DB stuff is I/O bound, that usually translates into at least an order of magnitude faster lookups.

评论 #14039030 未加载

评论 #14036605 未加载

评论 #14037514 未加载

评论 #14036579 未加载

评论 #14037041 未加载

LogicXabout 8 years ago

Congrats on the release!I'm in the process of migrating from influxDB to a combination of Kafka and TimeScaleDB.Has anyone beaten me to the lunch with testing if it works with [PipelineDB](<a href="https://www.pipelinedb.com/" rel="nofollow">https://www.pipelinedb.com/</a>)? I can see that being a powerful combination, perhaps also tied to their forthcoming [Stride](<a href="https://www.stride.io/" rel="nofollow">https://www.stride.io/</a>) analytics API.

评论 #14036533 未加载

评论 #14038223 未加载

RobAtticusabout 8 years ago

Didn't want to make the title too long, but I should note this is an open source database. Here if anyone has any questions.Also, github if you just want to see the code: <a href="https://github.com/timescale/timescaledb/" rel="nofollow">https://github.com/timescale/timescaledb/</a>

jnordwickabout 8 years ago

What do you plan on doing for benchmarking? I don't expect you to get something like STAC done, but will you try to find general benchmarks that others use?I've been working with TS databases for a long time now, and it never fails that every database vendor always has benchmarks showing they are the best (no put that more bluntly, when you come out with your own benchmark suite and you are the fastest/smallest/bestest I won't be surprised or believe you).I don't expect you to be the fastest when having a row-oriented architecture, and it would be an unfair comparison against the non-free databases, but I would like realistic numbers.Actually, if you came out 2nd to 3rd against competitors' workloads, I would be far more impresses.(But really? A row-oriented TS database?)

评论 #14038011 未加载

jjirsaabout 8 years ago

At my last company we did tens of billions of sensor events per day into Cassandra and I thank god our engineering team was smart enough to spend manpower on product instead of writing yet another database

评论 #14039988 未加载

评论 #14043639 未加载

评论 #14039846 未加载

artellectualabout 8 years ago

Wow, lots of very critical comments. I for one think this is a very good idea. I have a use case right now that pretty much fits the manifesto of timescale. Having the power of SQL is very atteactive. I'm looking to move my current setup to timescale. Will let you guys know how it goes.

评论 #14051363 未加载

ckluisabout 8 years ago

from:<a href="http://www.timescale.com/index.html" rel="nofollow">http://www.timescale.com/index.html</a>Full SQL interfaceScale-out distributed architectureAuto partitions across space and timeSingle-node and cluster optimizationsComplex query predicatesDistributed aggregationsNative JOINs with relational dataLabels and metrics supportGeo-spatial query supportFine-grained access controlReliable (active replication, backups)Automated data retention policies

sciurusabout 8 years ago

If your main use case for timeseries is a graphite/statsd replacement, another PostgesSQL-based project worth looking at is tgres.<a href="https://grisha.org/blog/2017/03/22/tgres-0-dot-10-dot-0b-time-series-with-go-and-postgresql/" rel="nofollow">https://grisha.org/blog/2017/03/22/tgres-0-dot-10-dot-0b-tim...</a><a href="https://github.com/tgres/tgres" rel="nofollow">https://github.com/tgres/tgres</a>

评论 #14037411 未加载

Dangerangerabout 8 years ago

This looks nice. I've had to roll my own PostgreSQL schema and custom functions a few times for timeseries data and if this prevents the need for that I'm impressed.The author mentioned write performance, but didn't touch much on read performance. Are there any benchmarks you can share in that direction?Also the author didn't touch much on the space requirements or the node splitting expectations.Those two areas would be useful to explain in greater detail.

评论 #14036365 未加载

manigandhamabout 8 years ago

Most distributed SQL databases fall into 2 uses: large data-warehouse and scalable primary data store.For DW needs (including time series), MemSQL is still the best option for distributed SQL unless you absolutely need postgres features. Proprietary w/mysql flavor but integrated columnstore + memory rowstore built for performance and fantastic ops management software makes it a great system. Also has interesting kafka and s3 ingest pipelines.For an operational db, Citus is a great scale-out option for postgres. You can combine it with cstore to get columnar compression for DW use but at that point it's better to just use a more focused product. There's also CockroachDB which looks good (if it ever gets out of beta). I'd also recommend ScyllaDB + JanusGraph once both mature to get a scalable multi-master graph database if HA/reliability/replication are a big concern.Timescale DB is interesting and it's always nice to see more options, but right now there are better data warehouse options already that work just fine for time series + SQL. A built-in columnstore would go a long way to making it competitive though.

Tommekabout 8 years ago

Whats the main difference compared to pg_partman? That provides parent and child tables, automated partitioning and the user only queries the parent tables.<a href="https://github.com/keithf4/pg_partman" rel="nofollow">https://github.com/keithf4/pg_partman</a>

评论 #14036136 未加载

denfromufaabout 8 years ago

Has anyone used gnocchi outside of OpenStack?It can use postgresql as an index store.<a href="https://docs.openstack.org/developer/gnocchi/install.html" rel="nofollow">https://docs.openstack.org/developer/gnocchi/install.html</a>

wsxiaoysabout 8 years ago

Congrats on the launch!I was investigating the same topic (PG based timeseries database) for a stock tick data project, would definitely give timescaledb a try.Since financial data is mentioned in the blog, would be curious on how it performed / scaled in practical.

评论 #14036174 未加载

koffiezetabout 8 years ago

So, having played/worked with a few time-series databases, this just doesn't seem to fit the picture we're used to. For applications having to store a set of well-defined time-series in a more optimal way, it looks great. As a generic time-series database on the other hand, this sounds like a maintenance nightmare.I quite like Postgresql (and deploy it all the time), and I'm no fan of nosql stuff, which just means you don't have to properly analyze your database structure before-hand, but with time-series it's different matter. The data you tend to send to generic time-series databases tends to be very unpredictable. I currently don't care what data is sent to Prometheus or Influx. This includes, but is not limited to ZFS stats of our storage, system load, network traffic, VMWare, nginx/haproxy, application usage and errors, ... I know that when I'll need it, I'll have it available and can try to correlate the data at any point in the future. In TimescaleDB it looks like I would have to pre-create tables fitting all those stats, which would make it an absolute pain in the ass.As counter-intuitive as it sounds, to do it properly/flexible/usable in a real world, you should impose a fixed/limited database structure. The tgres project mentioned elsewhere in this thread seems a lot more useful for 'generic' cases. Maybe they can be combined in some way.Also, some other important things in the time-series world are a REST api. Yes you might want advanced SQL to query data, but no you don't want that to insert it. Pretty much any application can do http(s) stuff without any additional libraries (except when you're talking about C/C++ things). Postgres on the other hand? Oh no I need a JDBC driver for this Java app, I need an extra Python library for X, ... Not to mention you suddenly have to manage database pools in the applications and have something that is a potential bottleneck.Then, no Grafana support is a no-go. It is the de-facto standard visualization tool these days.What about retention policies, you don't need ms-accurate data-points from 2 years ago, when talking about many data-points, you might want to aggregate this. Though since the data set in TimescaleDB will be limited due to specific data-sets, this could be less of a problem.

评论 #14043493 未加载

jsulinskiabout 8 years ago

How does this compare to say, Aerospike, or Honeycomb.io?

评论 #14036272 未加载

评论 #14036041 未加载

stalllerabout 8 years ago

Hi from Timescale, here are 6 insights we took away from this awesome discussion. Thank you!<a href="https://blog.timescale.com/thank-you-hn-20-000-views-1000-stars-and-6-insights-from-the-timescaledb-launch-d7e9c5cdb9d0" rel="nofollow">https://blog.timescale.com/thank-you-hn-20-000-views-1000-st...</a>

joshpadnickabout 8 years ago

This looks promising! Somewhat unrelated question, but has anyone had success using AWS DynamoDB as a time-series database? I'm generally curious on the case for a "managed" TSDB versus setting up your own open source one.

评论 #14045185 未加载

评论 #14036192 未加载

dangoldinabout 8 years ago

Very cool. The partitioning reminds me of the way VoltDB does it. By splitting it across a set of keys it's able to ensure that queries can be run concurrently on their own node.Are you guys familiar with it and I am understanding this properly?

评论 #14039546 未加载

ciconiaabout 8 years ago

How does that differ from PostgreSQL's stock table inheritance (partition data on arbitrary rules, while letting you efficiently query the parent table using constraints)? The TimescaleDB approach apparently looks very similar.

评论 #14038496 未加载

pnathanabout 8 years ago

Very interesting.Can you speak as to the stability of your system for production usage?

评论 #14036467 未加载

didipabout 8 years ago

I couldn't find the architectural diagram on how do you scale this across N nodes, can you elaborate on that? Is it similar to Postgres XL or Citus?

评论 #14038851 未加载

ddlathamabout 8 years ago

A single-node version is currently available for download. A clustered version is in the works.Adding clustering on sounds like the hardest part.

评论 #14036646 未加载

nickpetersonabout 8 years ago

Are the hypertables effectively views with insert triggers on them?Does Timescale have any baked in support bitemporal constructs?Are temporal foreign keys supported?

评论 #14039347 未加载

mrkurtabout 8 years ago

Joins against time series data sound really nice.

评论 #14036004 未加载

jbverschoorabout 8 years ago

How does this compare to citusdb?

评论 #14037385 未加载

andrestcabout 8 years ago

would be interesting to get that integrated with Prometheus remote storage.

评论 #14039223 未加载

skorguabout 8 years ago

Congrats on the launch!

otterleyabout 8 years ago

How many dimensions does each metric support?

评论 #14037397 未加载

评论 #14036725 未加载

tomc1985about 8 years ago

Wow, cool!How does this compare to Citus?

评论 #14037424 未加载

marknadalabout 8 years ago

Loved this article, as a competing database company, they did a fantastic job relating to developers and being authentic! Great job, please keep this up, it will definitely make you a winner.Key quote of interest: "135K writes per second per node" this is very good! Quite impressed.

williamle8300about 8 years ago

I'm a novice regarding DBs. What's the significance of this?

评论 #14039404 未加载

sigi45about 8 years ago

I really hate this stile of writing. Why does it have to sound like every other hipster it text?- I'm not your padawan- Postgresql is cool, why do you say it is boring?!- yes awesome some PhD People are vouching with there titles to a peace of software i might wanna use in a production environment? Nope. Tx for the efford and i will watch it but no.I do like that you do that work and tx for this but why not wrting a post which goes into details instead of this hipster pseudo tl;dr text?

评论 #14037339 未加载