Ask HN: What is the best data storage solution for BIG data and fast queries?

7 points by harrisreynolds, over 3 years ago
I'm working with a client now that has a very large Postgres database. It is currently in the terabyte range but needs to support petabytes.

What is the best solution for storing this data that is fast and supports very large datasets?

For context, the product competes in a geo-spatial market and loads GPS data from a large number of vehicles that are updating every 5-10 seconds.

We are considering Apache Pinot, but I am curious what the HN community would recommend here.

Thank you for any input!!

9 comments

stocktech, over 3 years ago
We'd need a lot more info to make a meaningful suggestion, but I'd at least investigate TimescaleDB to see if it fits. The fact that it sits on Postgres should be attractive to your client.
zX41ZdbW, over 3 years ago
I would consider ClickHouse. It is perfect for interactive analytical queries on large datasets.

> the product competes in a geo-spatial market and loads GPS data from a large number of vehicles that are updating every 5-10 seconds

There are multiple companies from this field that are using ClickHouse: https://clickhouse.com/docs/en/introduction/adopters/
ammar_x, over 3 years ago
I recommend Google BigQuery. Its storage is cheap ($0.02/GB) and can become even cheaper. You can process huge amounts of data quickly and pay $5 for each terabyte your query processes.

It's easy to use too, and its version of SQL is quite powerful.

On AWS, there is Athena, which works on data stored in S3 and has the same processing price as BigQuery ($5/TB). However, from my experience, I recommend BigQuery.
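
To put the quoted prices in perspective, here is a rough back-of-the-envelope sketch. It uses only the two figures from the comment above and assumes the storage price is a per-month rate and that units are decimal (1 TB = 1,000 GB); the function names are made up for illustration, and real pricing may differ.

```python
# Prices as quoted in the comment; treat them as illustrative, not current.
STORAGE_USD_PER_GB = 0.02  # assumption: this is a per-month rate
QUERY_USD_PER_TB = 5.0     # price per terabyte a query processes

GB_PER_PB = 1_000_000      # assumption: decimal units
TB_PER_PB = 1_000


def monthly_storage_cost(gigabytes: float) -> float:
    """Estimated storage cost in USD for one month."""
    return gigabytes * STORAGE_USD_PER_GB


def query_cost(terabytes_processed: float) -> float:
    """Estimated cost in USD of a query processing the given volume."""
    return terabytes_processed * QUERY_USD_PER_TB


if __name__ == "__main__":
    # One petabyte stored for a month, and one query that processes all of it.
    print(monthly_storage_cost(GB_PER_PB))
    print(query_cost(TB_PER_PB))
```

Under these assumptions, the per-query charge is the number to watch at petabyte scale: a query that processes the whole dataset costs a quarter of a month's storage, so limiting how much data each query touches matters a great deal.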
samspenc, over 3 years ago
If you want an open-source solution, I would recommend HBase or Cassandra -- those have been battle-tested and used in a variety of small and large companies.

They allow you to store huge amounts of data and, as long as you design the primary key properly, they also let you run really fast queries (milliseconds) to find the needle in the haystack.

There are some tradeoffs, of course: most engineers I've worked with who come from RDBMS to these tools find the lack of first-class support for secondary indices and SQL or SQL-like queries to be a bummer.
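
As an illustration of what "design the primary key properly" can mean for GPS data, here is a minimal sketch of a composite key made of a vehicle ID and a zero-padded timestamp. `SortedStore` is a toy in-memory stand-in for a key-ordered store, not the HBase or Cassandra API, and the key layout and all names are assumptions chosen for this example.

```python
import bisect


def make_row_key(vehicle_id: str, epoch_seconds: int) -> str:
    """Build a key whose lexicographic order matches time order per vehicle.

    Assumes non-negative timestamps and vehicle IDs that do not contain '#'.
    """
    return f"{vehicle_id}#{epoch_seconds:010d}"


class SortedStore:
    """Toy key-ordered store used only to demonstrate range scans."""

    def __init__(self) -> None:
        self._keys: list[str] = []
        self._values: dict[str, object] = {}

    def put(self, key: str, value: object) -> None:
        if key not in self._values:
            bisect.insort(self._keys, key)  # keep keys sorted on insert
        self._values[key] = value

    def scan(self, start_key: str, stop_key: str) -> list[tuple[str, object]]:
        """Return pairs with start_key <= key < stop_key via binary search."""
        lo = bisect.bisect_left(self._keys, start_key)
        hi = bisect.bisect_left(self._keys, stop_key)
        return [(k, self._values[k]) for k in self._keys[lo:hi]]


def scan_vehicle(store: SortedStore, vehicle_id: str, start_ts: int, end_ts: int):
    """Fetch one vehicle's points in [start_ts, end_ts) with a single range scan."""
    return store.scan(make_row_key(vehicle_id, start_ts),
                      make_row_key(vehicle_id, end_ts))
```

Because all points for one vehicle sit next to each other in key order, a time-window lookup touches only a contiguous slice of keys. The flip side is the tradeoff mentioned above: a question that does not start from the vehicle ID (for example, "all vehicles in this area") is not served by this key and would need a second table or index.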
karmakaze, over 3 years ago
The large amount of data and number of vehicles seem to be naturally partitioned. In that case you could use anything you want with sharding. Or is it the case that any vehicle can read/write data for any location or perform global analytics?
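
A minimal sketch of the sharding idea, assuming each vehicle's data can live entirely on one shard: route every write and read by a stable hash of the vehicle ID. The function name and the choice of SHA-256 with a modulo are illustrative assumptions, not something the comment prescribes.

```python
import hashlib


def shard_for_vehicle(vehicle_id: str, num_shards: int) -> int:
    """Map a vehicle ID to a shard index in [0, num_shards).

    Uses a cryptographic digest instead of Python's built-in hash(), which is
    randomized per process for strings and so is not stable across restarts.
    """
    if num_shards <= 0:
        raise ValueError("num_shards must be positive")
    digest = hashlib.sha256(vehicle_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


if __name__ == "__main__":
    for vid in ("vehicle-1", "vehicle-2", "vehicle-3"):
        print(vid, "->", shard_for_vehicle(vid, 16))
```

This only works well if the answer to the question above is "no": queries that span many vehicles, such as global analytics, would have to fan out to every shard. Plain modulo routing also means most keys move when the shard count changes.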
evv555, over 3 years ago
Not enough information provided, but if the data can be organized into meaningful partitions, then consider S3 with a Hive-style partition scheme. Pinot should be able to consume from there as well.
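
For readers unfamiliar with Hive-style partitioning, it encodes partition columns as `key=value` path segments in the object key. The sketch below builds such a key from a timestamp; partitioning by date and hour, the `gps` prefix, and the file name are all assumptions made for illustration.

```python
from datetime import datetime, timezone


def hive_partition_key(prefix: str, epoch_seconds: int, filename: str) -> str:
    """Build an object key with Hive-style dt=/hour= partition segments (UTC)."""
    ts = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
    return f"{prefix}/dt={ts:%Y-%m-%d}/hour={ts:%H}/{filename}"


if __name__ == "__main__":
    # Hypothetical prefix and file name.
    print(hive_partition_key("gps", 1640995200, "part-0000.parquet"))
```

With a layout like this, an engine that understands the partition columns can skip whole prefixes when a query filters on date or hour, which is what makes the partitions "meaningful" for query cost and speed.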
prirun, over 3 years ago
Recent HN topic: https://news.ycombinator.com/item?id=29825490
ubadair, over 3 years ago
Have a look at https://www.ocient.com

Not affiliated, but I know people who work there.
nojito, over 3 years ago
ClickHouse, depending on the true size and retention requirements you have.