Ask HN: Is there any demand for time-series data mining?

3 points by chaotic-good over 10 years ago
Hi HN!

I'm building a time-series database. There seems to be some demand for solutions that can store metrics and draw graphs (something like graphite or influxdb).

My db is different. I'm trying to create a database that can be used to mine time-series data efficiently (at low cost and with low effort). It will be able to perform similarity search (given one time series, find the most similar one), motif search, clustering, and so on. The functionality is very similar to jMotif, only implemented as a distributed database. No graphing and no interaction with graphite so far! :)

Is there any demand for such a thing in the software industry?
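To make "similarity search" concrete, here is a minimal sketch of the idea (given one time series, find the most similar one among stored series). It assumes equal-length series and uses z-normalized Euclidean distance, which is one common choice and not necessarily what the database described above or jMotif does. The function names `z_normalize`, `distance`, and `most_similar` are made up for illustration.

```python
import math


def z_normalize(series):
    """Rescale a series to zero mean and unit variance (constant series become all zeros)."""
    n = len(series)
    mean = sum(series) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in series) / n)
    if std == 0:
        return [0.0] * n
    return [(x - mean) / std for x in series]


def distance(a, b):
    """Euclidean distance between two equal-length series after z-normalization."""
    if len(a) != len(b):
        raise ValueError("series must have the same length")
    za, zb = z_normalize(a), z_normalize(b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(za, zb)))


def most_similar(query, candidates):
    """Return the name of the stored series closest to the query (brute-force scan)."""
    return min(candidates, key=lambda name: distance(query, candidates[name]))


# Hypothetical stored series, keyed by name.
stored = {
    "rising": [1, 2, 3, 4],
    "falling": [4, 3, 2, 1],
    "flat": [5, 5, 5, 5],
}

# The query has a different scale but the same shape as "rising".
print(most_similar([10, 20, 30, 40], stored))
```

With these inputs the script prints `rising`, because z-normalization removes the difference in scale and only the shape is compared. A brute-force scan like this is linear in the number of stored series; a real distributed database would presumably need some form of indexing on top.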

1 comment

liori over 10 years ago
Lots. If you could make one, it would sell quite well. This leads to the question… why has nobody done so earlier?

The problem is that everyone's problem is different. "Time Series" is a very broad topic; you apply widely different algorithms to time series related to social activities (e.g. energy usage over the year; lots of seasonality, anomalies), financial activities (e.g. stock prices; modeling various market limitations), machine-generated events (e.g. server logs; text parsing, usually simple statistics), and scientific experiments (e.g. intensity of an observation in a short time span; very specialized algorithms for basically any experiment). Moreover, even within a single class of data, most actually good ML algorithms require heavy tuning: setting parameters, evaluating performance, etc. There are algorithms that don't have many knobs to tune, but their performance is often subpar too.

In the end, you'd end up with either a product so generic that it doesn't do anything well, a product that basically only stores data and does simple statistics (influxdb, druid…), or a specialized product for a specific market (there are already lots of them, e.g. for server logs there's Splunk, which is basically `grep` on steroids).

I'm a pessimist here, but only because I actually work with time series data in two of those settings (server logs, scientific experiments) and had to evaluate what's on the market. I use a commercial package for the first one, and I ended up writing my own scripts in `R` for the second because, seriously, there's no software package powerful enough to have all the knobs I need, yet simple enough to use. I'd love to see actually featureful time series software, but I fear that starting from a generic "Time Series" database will bring nothing new to the market.