We have IoT sensor data currently flowing Device → Telegraf → InfluxDB → Grafana (TIG stack). This is fine for simple queries and visualization. We want to keep using Grafana, but I'm looking to move to a more proper data infrastructure for 1) SQL-based analytics and ML applications, 2) pub-sub e.g. MQTT to decouple sources from sinks, 3) long-term storage.<p>We're talking on the order of dozens/hundreds of devices posting data every min or so, not petabytes.<p>Should I be following a data lake architecture and doing MQTT → S3 / Redshift → ??? → PostgreSQL → Grafana? That seems like a lot of work / maintenance cost / overkill (?).<p>Or just write a MQTT → PostgreSQL bridge? Why isn't there an off-the-shelf solution, or am I missing something? I feel like this must be a common scenario?<p>Or should I be looking more closely at enterprise solutions e.g. Snowflake?<p>Known constraints:
• Grafana supports PostgreSQL as a data source (but not Amazon Redshift).
• Kinesis Firehose supports S3 and Redshift as destinations (but not RDS / PostgreSQL).
• Telegraf doesn't (officially) support PostgreSQL as an output.<p>Bottom line: I am not a data engineer and want to avoid being one :) Maybe someday we'll be able to afford one, but in the meantime, I want to set things up so it's at least pointing in the right direction, without significant time/money cost.
I'm writing docs for <a href="https://questdb.io/" rel="nofollow">https://questdb.io/</a> which offers SQL-based analytics for time series. We have quite a few users already who use us for ML applications that rely on performance.<p>Instead of writing an MQTT -> PostgreSQL bridge, you could use Telegraf to listen to MQTT topics and write data to QuestDB over InfluxDB line protocol when it meets certain criteria. One of our users shared the tooling they use in industrial IoT and a tutorial here for exactly this use case: <a href="https://questdb.io/tutorial/2020/08/25/questitto/#stack" rel="nofollow">https://questdb.io/tutorial/2020/08/25/questitto/#stack</a>
Load directly into Elasticsearch: it supports SQL as well as visualizations and simple querying via Kibana, and is far more performant than PostgreSQL for this kind of thing.<p>Ingest speed is tunable via number of shards and translog sync frequency, so no need for message queue if things get busy.