Heroku Postmortem of June 29th Incident

28 points by timf almost 13 years ago

6 comments

aaronsw almost 13 years ago
The "What we are doing" section seems pretty weak. The only substantive thing they say is "we have produced new tools which enable us to more expediently relocate database services from a failed availability zone."

How exactly are they planning to deal with the larger Cedar difficulties? Are they going to eliminate their dependence on ELBs? Go multi-region? Developers need to know this to decide whether to continue with Heroku or build their own platform.
paulsutter almost 13 years ago
One subtle but important reason to use cross-region failover is that the network latency between the regions can prevent many casual or accidental dependencies between instances (if you configure instances in two regions to use the same database server, latency can cause the distant region to perform poorly).

This is why it's really hard to get cross-region failover to work: you really need to make the regions independent.
jscottmiller almost 13 years ago
> Approximately 30% of our EC2 instances, which were responsible for running applications, databases and supporting infrastructure (including some components specific to the Bamboo stack), went offline

Combined with the incident report from Amazon, does this mean that 30% of Heroku instances were in a single availability zone? That would be troubling.
adrianpike almost 13 years ago
One of their suggestions is to have a follower of your DB and fall back to it. When they put the API in read-only mode, would I have been able to promote any followers?
goronbjorn almost 13 years ago
My questions are:

- Why aren't they committing to using geographically dispersed AWS instances?
- Why aren't they leveraging Salesforce's infrastructure at all?
latch almost 13 years ago
Relying on AWS' API to mitigate an AWS failure seems dangerous. A fundamental catch-22 with on-demand provisioning/configuration.