How to Scale PostgreSQL on AWS: Learnings from Citus Cloud

177 points by twakefield about 8 years ago

14 comments

simonw about 8 years ago
Citus are doing a fantastic job on content marketing. Every single piece they publish on https://www.citusdata.com/blog/ is a case study in how to write content (and headlines) that appeals to the kinds of developers their product targets.

"How to Scale PostgreSQL on AWS: Learnings from Citus Cloud" - seriously, how am I, as a PostgreSQL-liking developer who cares about scalability, NOT going to click through to that article?

kornish about 8 years ago
Citus Cloud is perhaps most exciting to me because it has tremendous momentum: as the combined product of deep technical expertise meeting top-flight open source software meeting tons of end user experience, it's quickly outpacing platforms which are locked-in anachronisms. Take Redshift: Postgres 8.4? After you've used some of the features in 9.6, it's hard to go back. It'd be interesting to see some numbers around Citus Cloud's battle-tested deployments.

As a side note, these blog posts on high-level techniques and open source tools (e.g. PgBouncer, wal-e) are useful for anyone considering deploying an on-prem version of Citus as part of a product - thanks, Ozgun!

Usual disclaimers apply: not an employee, but a big fan of the team and technology, and it's great to see them gaining well-deserved mindshare.

pjungwir about 8 years ago
I saw the section on EBS, but it didn't offer much advice. Getting good performance on networked storage is the biggest challenge for me. The last time I asked about that here [1], I got this answer:

    nasalgoat 161 days ago [-]
    The secret to EBS is to use General SSD, not Provisioned, but use a RAID stripe. The reason this works is because IOPS are provisioned per EBS drive and by the size of the drive. So a RAID0 stripe of, say, ten General SSD drives will outperform the more expensive PIOPS single drive.

That sounds like a great approach, although I haven't had time to try it out yet. I'm curious whether anyone else has done anything like that.

[1] https://news.ycombinator.com/item?id=12609172

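The arithmetic behind the quoted RAID0 advice can be sketched roughly as follows. This assumes the gp2 ("General SSD") baseline of 3 IOPS per GiB with a floor of 100 IOPS and a per-volume cap; the cap has changed over time, so it is a parameter here (10,000 is an assumed value for the period of this thread). The function names and volume sizes are illustrative only, and burst credits, instance-level EBS limits, and pricing are not modeled.

```python
def gp2_baseline_iops(size_gib: int, per_volume_cap: int = 10_000) -> int:
    """Baseline IOPS of one gp2 volume: 3 IOPS/GiB, floor of 100, capped."""
    return min(max(3 * size_gib, 100), per_volume_cap)


def raid0_baseline_iops(volume_count: int, size_gib_each: int,
                        per_volume_cap: int = 10_000) -> int:
    """Idealized aggregate for a RAID0 stripe: per-volume baselines add up.

    Ignores instance-level EBS limits and any striping overhead.
    """
    return volume_count * gp2_baseline_iops(size_gib_each, per_volume_cap)


# Same total capacity (5,000 GiB), provisioned two ways.
single = gp2_baseline_iops(5_000)        # one large volume hits the cap
striped = raid0_baseline_iops(10, 500)   # ten smaller volumes, each under the cap
print(single, striped)
```

Under these assumptions the stripe only pulls ahead on baseline IOPS once a single volume of the same total size would hit the per-volume cap; below that point the two layouts have the same baseline.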
manigandham about 8 years ago
We use MemSQL, and it has the best replication setup process of any relational database, with one line:

    REPLICATE DATABASE db_name FROM master_user[:master_password]@master_host[:master_port][/master_db_name]

Why is it that in 2017 we still don't have any other database that comes close to this? Basic replication is very well understood and used everywhere, but it seems like database creators just don't understand what should be prioritized.

cromulent about 8 years ago
I was looking for the "I want my database to be performant under high random load" question. PIOPS can hurt.

Does anyone have any experience running PostgreSQL on the new I3 instances?

agentgt about 8 years ago
I have mentioned this on some previously posted articles, but we are really happy users of both Citus and PipelineDB.

Check out PipelineDB if you are a Postgres fan (obviously it is for a different use case than Citus).

The only thing I don't like about PipelineDB is that it is currently a fork and not an extension, but that is supposed to change.

Consequently we syndicate to Citus and PipelineDB through RabbitMQ and Kafka.

We use Google Cloud as well. I'm contemplating writing a post on what we have learned (and not :)), but I don't think I could ever match the quality of this article.

And yes, invariably someone will mention that MemSQL does both, but it is proprietary and not Postgres. I probably should have spent more time investigating it, though (and eventually will).

jacobscott about 8 years ago
Does Citus (Cloud?) have features that offer better high availability and failover functionality than what RDS provides? Managed Patroni and packaged workflows for zero-downtime failover would be quite interesting, but I don't see anything like that mentioned on https://www.citusdata.com/product/cloud.

forgotpwtomain about 8 years ago
Why is it seemingly impossible to read a technical blog post on a company blog without some seven-year-old-humor meme mixed in?

hayd about 8 years ago
I wonder how Postgres Aurora will fare against Citus... that's what we're considering migrating to in the next year or so.

jordanthoms about 8 years ago
Any plans to take Citus in more of a data-warehousey, complex-queries direction over time? We are starting to hit the limits of Postgres 9.6 and would like to move to a columnar store, but Redshift is hosted-only, Teradata looks expensive, and Greenplum looks old.

jaequery about 8 years ago
I feel DB hosting is such an underrated field right now. In terms of scaling, everything is pretty easy to scale except databases. I would love to see more services like this.

BIackSwan about 8 years ago
Aren't most of these use cases already offered/handled by Amazon RDS? Maybe not transparent sharding, but otherwise everything else?

LogicX about 8 years ago
Why does the community link on your pricing page lead to a 404?
marknadal about 8 years ago
1. I’d like my PostgreSQL database to be Highly Available

Highlight: "The first is the complexity associated with it: it takes twelve steps to setup streaming replication ... open source solutions such as Governor and Patroni aim to do just that. That said, this integration again comes with a complexity cost."

I cannot believe it is 2017 and streaming replication is still considered complex. I have spent the last half decade+ of my life trying to make this simple; here is a demo: https://youtu.be/-i-11T5ZI9o

2. I’d like my application to not worry about failovers

Highlight: "most PostgreSQL clients don’t have a mechanism to automatically retry different endpoints in case of a failure."

Master-Slave systems are not conducive to failover (determining a new Master involves its own locking/election mechanisms). If we have streaming Master-Master replication by default, you can have some easy automatic failover - https://youtu.be/-FN_J3etdvY .

4. I’d like my database to scale horizontally

Highlight: "Deploying a distributed RDBMS into production requires a good understanding of both relational databases and distributed systems."

We can do a lot of work to improve understanding out there; Kyle Kingsbury (Aphyr of Jepsen Tests) has done a lot to spread awareness. A couple of years ago I did a tech talk that explains the ideas with stick figures so that even laypersons could understand what is going on: http://gun.js.org/distributed/matters.html .

5. I’d like automatic backups for disaster recovery

Highlight: "Distributed database backups are even harder."

See the demo in (1): this doesn't have to be hard. It can be easy enough for frontend web developers IF the system is a streaming Master-Master database to begin with. On top of that, check out our "backup to S3" prototype, where we scaled to doing 100M+ messages for $10/day (all costs: CPU, disk, S3), here: https://www.youtube.com/watch?v=x_WqBuEA7s8

My goal and argument here is that database vendors keep propagating the message of "this is hard, so trust us and pay for systems" that Aphyr has repeatedly proven to be broken (although, actually, Postgres did really well; Kyle was recommending it as the best general purpose database) - as Craig notes himself: "In fact, I’ve been on calls where we quoted $300K for the services work, and never heard from that user again."

We need to break these cycles, and I do believe Craig is trying to do that with these blog posts, which is great. But we have a long way to go (all of us).

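The client-side gap quoted under point 2 (clients not retrying different endpoints) can be illustrated with a small generic sketch. This is not any driver's API: `connect_with_failover`, the `connect` callable, the use of `ConnectionError`, and the host names are all hypothetical stand-ins. (libpq itself gained multi-host connection strings in PostgreSQL 10, which was released after this thread.)

```python
import time
from typing import Callable, Optional, Sequence, TypeVar

Conn = TypeVar("Conn")


def connect_with_failover(
    endpoints: Sequence[str],
    connect: Callable[[str], Conn],
    rounds: int = 3,
    delay_s: float = 0.5,
) -> Conn:
    """Try each endpoint in order; repeat the whole list up to `rounds` times."""
    last_error: Optional[Exception] = None
    for attempt in range(rounds):
        for endpoint in endpoints:
            try:
                return connect(endpoint)
            except ConnectionError as exc:  # assumption: the driver raises this
                last_error = exc
        if attempt < rounds - 1:
            time.sleep(delay_s)  # brief pause before the next pass
    raise ConnectionError(f"no endpoint reachable: {last_error}")


def fake_connect(endpoint: str) -> str:
    # Stand-in for a real driver call; only the standby answers here.
    if endpoint != "db-standby.example.internal:5432":
        raise ConnectionError(f"{endpoint} is down")
    return f"connected to {endpoint}"


result = connect_with_failover(
    ["db-primary.example.internal:5432", "db-standby.example.internal:5432"],
    fake_connect,
    delay_s=0.0,
)
print(result)
```

The sketch only finds a reachable endpoint; it does not check whether that node is currently the writable primary, which a real failover setup would also need to establish.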