Amazon RedShift vs. local PostgreSQL

73 points by rarestblog, over 12 years ago

10 comments

iblaine, over 12 years ago
What this test is essentially doing is comparing Postgres against a single node of Redshift. It is not surprising that Postgres is faster. But Redshift is not meant to be used on a single node.

Postgres & Redshift are two different products for two very different problems. Postgres is good for small sets of transactional data like orders in a shopping cart system (less than 1TB). Redshift is good for big sets of data involving user behavior and clickstream analysis (greater than 1TB). I would not want to manage clickstream data on a single instance of Postgres, nor would I want to manage an order system in Redshift.

A better test of Redshift would be to see how it compares to Asterdata, particularly with both in AWS. That should be telling.
monstrado, over 12 years ago
I don't think comparing RedShift to Postgres is accurate. RedShift was not designed for transactions; it was designed to store/query billions of rows using a columnar storage format. It's more like an analytic database (Greenplum, Teradata). Also, these databases are designed to scale out, so you usually don't see compelling performance gains until you start adding a few nodes to parallelize the work.

With that being said, I'd be interested to see how RedShift compares to Impala.
lcampbell, over 12 years ago
I really don't understand what's going on here.

* You're measuring request latency. What part of that (for RedShift) is due to the network? (EDIT: I re-read and saw you're using `SELECT 1` as a gauge for round-trip latency and subtracting it from the results. Are you only doing this for RedShift, or also for local PostgreSQL? To me, it seems like that heuristic is overbroad: it encapsulates not only network latency, but also syscall overhead, query parsing, etc.)

* In your tests, PostgreSQL without indices performs on par with RedShift. Does RedShift not support indexing? Is there some metric you're trying to show by not using indices? As designed, this benchmark does not map to any use case I've ever seen.
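For readers following the `SELECT 1` point, here is a minimal sketch of what baseline subtraction looks like. The function names are hypothetical and this is not the original benchmark's code; `query` and `baseline` stand for any callables that run the real query and a trivial one such as `SELECT 1`.

```python
import statistics
import time
from typing import Callable


def median_seconds(run: Callable[[], object], repeats: int = 5) -> float:
    """Return the median wall-clock time of calling run() `repeats` times."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        run()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def baseline_corrected(query: Callable[[], object],
                       baseline: Callable[[], object],
                       repeats: int = 5) -> float:
    """Median query time minus median baseline time.

    The baseline covers parsing, syscalls, and driver overhead as well as
    the network, so the difference is only a rough estimate of execution time.
    """
    return median_seconds(query, repeats) - median_seconds(baseline, repeats)
```

Applying the same correction to both the remote and the local database keeps the comparison symmetric, which is the question raised in the comment.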
rubyrescue, over 12 years ago
Very interesting. One of the reasons we picked MySQL over Postgres for a very high-volume app is that we have RDS and didn't want to do backups/snapshots/etc. Could we now use RedShift as a postgres-API RDS?
amalag, over 12 years ago
You need to run this with a column store database like Infobright. Postgres is more of a transactional database; Infobright is suited to the same kind of large-dataset analytics that this is aimed at.
eduardordm, over 12 years ago
I run 3 large Oracle RDS instances. I wonder if Redshift could be effectively used the same way; we have been thinking about migrating to PostgreSQL.
Whitespace, over 12 years ago
Wouldn't it have been better to do an EXPLAIN ANALYZE for the timing measurements instead of having the results returned locally?
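As a rough illustration of this suggestion: PostgreSQL's `EXPLAIN ANALYZE` prints a server-side timing line (`Total runtime: ... ms` in older releases, `Execution Time: ... ms` in newer ones), which can be read instead of timing the returned result set on the client. The helper below is a hypothetical sketch, the sample plan text is made up rather than a real measurement, and whether Redshift accepts `EXPLAIN ANALYZE` is not established in this thread.

```python
import re
from typing import Optional

# Matches the timing line printed at the end of EXPLAIN ANALYZE output.
_RUNTIME = re.compile(
    r"(?:Total runtime|Execution time):\s*([\d.]+)\s*ms", re.IGNORECASE
)


def server_side_ms(explain_output: str) -> Optional[float]:
    """Extract the server-reported runtime in milliseconds, if present."""
    match = _RUNTIME.search(explain_output)
    return float(match.group(1)) if match else None


# Made-up plan text, for illustration only.
sample = (
    "Seq Scan on events  (cost=0.00..18.50 rows=850 width=4) "
    "(actual time=0.010..0.120 rows=850 loops=1)\n"
    "Total runtime: 0.200 ms\n"
)
print(server_side_ms(sample))
```

This figure excludes the cost of shipping rows back to the client, so it measures something different from the end-to-end timings in the benchmark.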
ozgune, over 12 years ago
This is pretty interesting. I wonder how query performance differs between Redshift and local PostgreSQL for other types of benchmarks as well, say TPC-H queries. (And I guess how Redshift scales out as the dataset size increases in TPC-H.)
csummers, over 12 years ago
I'd like to see some more information about the local setup, including hardware and the postgresql.conf. Otherwise, this tells me very little in terms of comparison.
crazydoggers, over 12 years ago
Data warehousing often involves star schemas, which means lots of joins in your queries. I'd love to see how a real-world OLAP tool performs on this.
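For anyone unfamiliar with the term, a star schema keeps measurements in a central fact table and joins it to small dimension tables at query time. The sketch below uses Python's built-in `sqlite3` purely to show the query shape; the table and column names are hypothetical, the rows are toy data, and it says nothing about how Redshift or PostgreSQL would perform.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, year INTEGER);
CREATE TABLE fact_sales (product_id INTEGER, date_id INTEGER, amount REAL);
INSERT INTO dim_product VALUES (1, 'books'), (2, 'games');
INSERT INTO dim_date VALUES (10, 2012), (11, 2013);
INSERT INTO fact_sales VALUES (1, 10, 5.0), (1, 11, 7.0), (2, 11, 20.0);
""")

# One join per dimension, then an aggregate over the fact table.
rows = conn.execute("""
SELECT p.category, d.year, SUM(f.amount)
FROM fact_sales AS f
JOIN dim_product AS p ON p.product_id = f.product_id
JOIN dim_date AS d ON d.date_id = f.date_id
GROUP BY p.category, d.year
ORDER BY p.category, d.year
""").fetchall()

for row in rows:
    print(row)
```

A benchmark along these lines would add more dimensions and far more fact rows, since the join cost is what the comment is asking about.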