
Ask HN: Keep Last X entries per ID in larger DB

2 points by matttah over 6 years ago
I'm trying to figure out the best solution for the following:

* Daily import of ~500+ million rows of data, with ~250 million unique IDs.
* I need to keep only the latest X entries per unique ID. Older entries are discarded once that ID has reached X entries.
* Monthly, the entire dataset is read out for processing.

X can be anywhere from 1000 to 3000; it is static over the entire DB and just depends on what we determine to be the best setting. Since I don't access the data more than once a day, or at the end of the month, I would prefer not to pay for storage. There are over a billion unique IDs, which I can partition by prefix or by ranges. Each individual entry per ID is fairly small, with only an integer and two decimals stored.

What would you recommend as a data store for this?

Thanks!
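The keep-last-X-per-ID retention rule described above can be sketched with a bounded deque per ID. This is in-memory only, so it illustrates the semantics rather than a store for 500M rows/day; the `X` value and the field layout (one int, two decimals) follow the post, while the function and variable names are made up for the sketch:

```python
from collections import defaultdict, deque

X = 1000  # retention depth per ID; the post says anywhere from 1000 to 3000

# Each ID maps to a bounded deque: appending past maxlen silently
# discards the oldest entry, which is exactly keep-last-X semantics.
store = defaultdict(lambda: deque(maxlen=X))

def ingest(entry_id, value, d1, d2):
    # each entry is small: one integer and two decimals
    store[entry_id].append((value, d1, d2))
```

Any real store just needs to reproduce this discard-oldest-beyond-X behavior per ID, whether eagerly (as here) or lazily via a pruning job.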

3 comments

prostoalex over 6 years ago
Have a job queue running lower-priority DB queries. The queue workers select all records for a given ID, then prune off records older than the newest X.

Insert a fresh record immediately, since you know it's recent. Upon successful insert, fire off a queue request to go check on that ID.
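A minimal sketch of that insert-fast, prune-later pattern, using an in-memory SQLite database for illustration (the table, column names, and a `seq` column standing in for recency are all assumptions, not anything from the thread):

```python
import sqlite3

X = 1000  # the keep-last-X setting from the post

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE entries (
    id TEXT, seq INTEGER, v INTEGER, d1 REAL, d2 REAL)""")
# composite index so both the insert path and the prune scan stay cheap
conn.execute("CREATE INDEX idx_id_seq ON entries(id, seq)")

def insert(entry_id, seq, v, d1, d2):
    # fast path: always insert; pruning is deferred to a queue worker
    conn.execute("INSERT INTO entries VALUES (?,?,?,?,?)",
                 (entry_id, seq, v, d1, d2))

def prune(entry_id):
    # lower-priority worker: delete everything older than the newest X rows
    conn.execute("""
        DELETE FROM entries
        WHERE id = ? AND seq < (
            SELECT MIN(seq) FROM (
                SELECT seq FROM entries
                WHERE id = ? ORDER BY seq DESC LIMIT ?))""",
        (entry_id, entry_id, X))
```

In production the `prune(entry_id)` call would be the payload of the queued job, batched and throttled so it never competes with the daily import.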
verdverm over 6 years ago
Storage is cheap, analysis is expensive; it sounds like you might be optimizing for the wrong variable. The data store of choice will likely depend on your access patterns during analysis.

Flat files are my best guess for your data? HDFS.
评论 #18956955 未加载
ryanworl over 6 years ago
BigQuery is probably the best price/effort ratio for this.
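In BigQuery the keep-last-X rule would typically be a window function, on the order of `ROW_NUMBER() OVER (PARTITION BY id ORDER BY seq DESC) <= X` (the `seq` recency column is an assumption). The equivalent filtering logic, simulated locally in Python for illustration:

```python
from itertools import groupby

def keep_last_x(rows, x):
    # rows: iterable of (id, seq, payload) tuples.
    # Equivalent to filtering on
    #   ROW_NUMBER() OVER (PARTITION BY id ORDER BY seq DESC) <= x
    out = []
    keyed = sorted(rows, key=lambda r: (r[0], -r[1]))  # id asc, seq desc
    for _, grp in groupby(keyed, key=lambda r: r[0]):
        for rank, row in enumerate(grp, start=1):
            if rank <= x:   # keep only the newest x per id
                out.append(row)
    return out
```

Run monthly over the raw append-only table, this produces the pruned dataset in one scan, which fits the read-once-a-month access pattern in the question.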