Show HN: Pg_incremental – Incremental Data Processing in Postgres

6 points by craigkerstiens 5 months ago

1 comment

mslot 5 months ago

I created pg_incremental because I keep running into the same challenge in PostgreSQL: you have a table of raw event data that you insert into, either as individual rows when each event happens or as batches of events from other systems.

You then maybe want to aggregate the data, but the table is too big to keep reprocessing it, so you create a rollup table, aggregate only the new data, and insert into or update the rollup table.

However, how do you actually select the "new" data? That's more challenging than it seems, and you also need to orchestrate everything.

pg_incremental is a tool to help you create automated, reliable, incremental processing pipelines. It is built on top of pg_cron and around the idea of parameterized SQL commands.

You can define several types of pipelines:

- Sequence pipelines process a range of sequence values, to automatically aggregate or transform new data.
- Time interval pipelines process a range of time intervals after a time interval has passed, to automatically aggregate or export new data.
- File list pipelines process new files showing up in a directory, to automatically import data.

After defining a pipeline, new inserts will automatically get processed by the periodic background job. The SQL command is executed for a range of new sequence values, a new time interval, or a new file name, or skipped if there's no new work. BRIN indexes are very useful for fast range scans.

The extension also ensures correct behaviour in the presence of concurrent inserts by waiting for ongoing writes to finish.

Overall, it simplifies the process of setting up an automated incremental processing pipeline to a single SQL command. There's not a lot of magic to it, but it's simple, reliable, and very versatile.