I don't know Riak, other than that it's a distributed NoSQL key-value data store.<p>Time series data has been prevalent in fintech, quantitative finance, and other disciplines for decades. I read a book in the early 1990s on music as time series data, financial tickers, and so on.<p>How is Riak different, or better suited to this use, than Kdb + q[1], J with JDB (free), Jd (a commercial J database like Kdb/q)[2], or the new Kerf lang/db being developed by Kevin Lawler[3]?<p>Kevin also wrote kona, an open-source version of the "K programming language"[4].<p>Kdb is very fast at time series analysis on large datasets and has many years of proven value in the financial industry.<p>[1] <a href="https://kx.com/" rel="nofollow">https://kx.com/</a>
[2] <a href="http://www.jsoftware.com/jdhelp/overview.html" rel="nofollow">http://www.jsoftware.com/jdhelp/overview.html</a>
[3] <a href="https://github.com/kevinlawler/kerf" rel="nofollow">https://github.com/kevinlawler/kerf</a>
[4] <a href="https://github.com/kevinlawler/kona" rel="nofollow">https://github.com/kevinlawler/kona</a>
> Riak uses the SHA hash as its distribution mechanism and divides the output range of the SHA hash evenly amongst participating nodes in the cluster.<p>Wait, Riak uses SHA as its distribution hash?
Why use a cryptographic hash for distribution rather than something like Murmur3, if you're after high performance[0]?<p>[0] <a href="http://blog.reverberate.org/2012/01/state-of-hash-functions-2012.html" rel="nofollow">http://blog.reverberate.org/2012/01/state-of-hash-functions-...</a>
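For context, here is a minimal sketch (in Erlang, which Riak is written in) of the general idea the quoted text describes - hash the key, then slice the fixed-size hash output range into N partitions. This is illustrative only, not Riak's actual code; crypto:hash/2 is the standard OTP function:<p><pre><code>  %% Map a key onto one of NumPartitions ring positions using SHA-1.
  %% Assumes NumPartitions is a power of two (as Riak's ring size is),
  %% so the 160-bit output range divides evenly.
  partition_for(Key, NumPartitions) ->
      <<HashInt:160/unsigned-integer>> = crypto:hash(sha, Key),
      RingTop = 1 bsl 160,
      HashInt div (RingTop div NumPartitions).
</code></pre>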
I've always found it quite curious that human-computer interfaces have always focused on the noun/verb proposition of describing data, and not the time/place. Time is the only true constant in the universe, and yet computers are set up to track and control it, seemingly, as an afterthought.<p>Imagine if instead of having files/folders to (teach,confuse) Grandma, we simply had a time-based system of references. If Time were a principal unit of information that a user was required to understand as an abstract concept, I feel that it would result in far better user interfaces.<p>We can see this in the music-making world, where Time is the most significant domain over which a musician exerts control. A DAW-like interface for managing events seems to me so intuitive - for so many other non-musical applications - that it's almost extraordinary that someone hasn't built an email system, or accounting system, or graphical-design system oriented around this aspect. (Of course, they are out there - but it seems that Time management makes the dividing line between "professional" and "dilettante" users rather thick...)
Are there performance numbers available?<p>We're on the lookout for suitable remote storage for prometheus.io, and would want to know the hardware that'd be required to handle 1M samples/s and how many bytes a sample takes up.<p>It doesn't support full float64, which we need, but we could work around that by packing the bits into a 64-bit unsigned integer (see the sketch below).
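For what it's worth, that workaround is just a bit-reinterpretation; here is a sketch in Erlang (purely illustrative - any language with a 64-bit reinterpret cast works the same way):<p><pre><code>  %% Round-trip a float64 through a 64-bit unsigned integer, losslessly.
  F = 123.456,
  <<U:64/unsigned>> = <<F:64/float>>,     %% encode: reinterpret the IEEE-754 bits
  <<F2:64/float>> = <<U:64/unsigned>>,    %% decode: reinterpret back
  true = (F =:= F2).
  %% Caveat: raw-bit integer order differs from numeric order for negative
  %% floats, so range scans over U need extra care.
</code></pre>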
Might be worth looking into dalmatiner.io (DalmatinerDB) as an alternative to this. It's also built on riak_core, which manages cluster membership and provides the top-level framework for routing and rebalancing.<p>I waited a long time for Riak TS to come out. I tried KairosDB & Cyanite, but the operational overhead of Cassandra wasn't something I wanted to buy into for such a narrow use case (an infrastructure metrics store), and then suddenly, out of nowhere, DalmatinerDB was released. The code is clean, the architecture is solid, and the ops story is simple.<p>I don't have any affiliation of any kind with the Dataloop folks. I am, however, a happy end-user. We do currently use Riak KV due to its CRDT support though.
I see SQL support; that is interesting. Isn't Riak the premier NoSQL database? I guess it is a NoNoSQL db now ;-)<p>The implementation of the SQL part is very neat. Great work, whoever did that. It uses the yecc and leex parser tools that come with Erlang, and rebar even knows how to compile those. Very cool!<p><a href="https://github.com/basho/riak_ql" rel="nofollow">https://github.com/basho/riak_ql</a>
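For anyone curious what that looks like, here is a toy yecc grammar (hypothetical and far simpler than riak_ql's actual rules) that parses a bare SELECT of one identifier:<p><pre><code>  %% toy_select.yrl -- a hypothetical minimal grammar, not riak_ql's.
  Nonterminals query.
  Terminals select identifier.
  Rootsymbol query.

  query -> select identifier : {select, element(3, '$2')}.
  %% '$2' is the {identifier, Line, Name} token produced by the leex
  %% tokenizer; element(3, ...) extracts the name.
</code></pre>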
Poses the question<p><pre><code> So what’s the big deal? People have been recording
temporally oriented data since we could chisel on tablets.
</code></pre>
Never answers it, but instead explains how Riak handles large time series. Certainly interesting, but I would like an answer to this question, as I don't understand the big deal.
As someone who deals with sensor data, the tricky part is really not the write rate, but rather dealing with messy data. There's a lot of parallelism in sensor network streams, and for many domains you never look at the sensors of one device against the sensors of another, so you can put them in entirely different databases and it doesn't matter. (It's not true in every case, of course, but if you're doing time series/streaming, ask yourself if it's true for you before picking a system.)<p>The real pain is handling data that arrives out of order or otherwise very late, or handling data that never arrives at all, or handling data that's clearly wrong. Worse, you may have streams that are defined/calculated from other streams via some algebra on series, e.g. series C is series A plus series B - so handling new data on A means you need to recalculate/update the view for C.<p>Oh, and you'd like this all to be mostly declarative so you have some way to migrate between systems if you need to switch for whatever reason.<p>Apache Beam/Google Dataflow gets a lot of this stuff right: it's not quite as declarative as I'd like, but it gets the windowing flexibility right and handles restatements at a data-model level.
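The series-C-from-A-and-B case is easy to state and annoying to implement. A minimal sketch in Erlang, with each series as a timestamp=>value map (the representation and names are mine, not any particular product's API):<p><pre><code>  %% Recompute derived series C = A + B, e.g. after a late point lands on A.
  %% Timestamps with no matching B point yet are simply left out of C.
  recompute_c(A, B) ->
      maps:fold(
          fun(Ts, Va, Acc) ->
              case maps:find(Ts, B) of
                  {ok, Vb} -> Acc#{Ts => Va + Vb};
                  error -> Acc
              end
          end, #{}, A).
</code></pre>Note that every late arrival on A forces a restatement of C; systems that model restatements natively (as Beam/Dataflow does, per the parent) save you from wiring this up yourself.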
For the TS experts out there, any real-world experience with Influx? (<a href="https://influxdata.com/" rel="nofollow">https://influxdata.com/</a>)
Time series does not necessarily have to be about 'huge' data either; it can simply be about a much greater level of historical precision. Example:<p>An ISP sells a circuit with 95th-percentile billing to a customer.<p>If you poll SNMP data from a router interface on 60-second intervals and store it in an RRA file, you will lose a great deal of precision over time (because RRAs are highly compressed over time). You'll have no ability to go back and pull a query like "We want to see traffic stats for the DDoS this customer took at 9am on February 26th of last year".<p>An implementation such as OpenTSDB that grabs the traffic stats for a particular SNMP OID and stores them will allow you to keep all traffic data forever and retrieve it as needed later on. The amount of data written per 60-second interval is minuscule; a server with a few hundred GB of SSD storage will be sufficient to store all traffic stats for relevant interfaces on core/agg routers for a fairly large ISP for several years.<p>With time series statistics you can then feed the data into tools such as Grafana for visualization.
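To make the precision point concrete, 95th-percentile billing (nearest-rank flavor here; real billing systems vary in the details) needs the full set of raw per-interval samples for the month - exactly what a compacted RRA can no longer give you:<p><pre><code>  %% 95th percentile of raw per-interval throughput samples (nearest rank).
  %% Averaged/compacted data would understate the result.
  p95(Samples) ->
      Sorted = lists:sort(Samples),
      Idx = max(1, round(length(Sorted) * 0.95)),
      lists:nth(Idx, Sorted).
</code></pre>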
Looks really nice, although I am a bit sad to see that it requires a structured schema. I have been on the lookout for a metric collection system (like InfluxDB), and this would fit very well - except for the schema part.