科技回声

9 comments

agwa, almost 11 years ago

This incident makes me think that services like Redis should support running with two sets of credentials at once in order to facilitate credential rolling. As it currently stands, rolling credentials is a rather big deal with a chance of things going wrong in the process.

Aside: the text on that page is extremely difficult to read because of poor contrast (#8584B2 on #282936) and might be impossible for people with vision impairment. If anyone from Heroku is reading, you should change the color scheme to be compliant with the W3C Web Content Accessibility Guidelines. See: http://www.snook.ca/technical/colour_contrast/colour.html
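Editor's note: the contrast complaint above can be checked numerically. The following is a minimal sketch of the contrast-ratio calculation defined in WCAG 2.x (relative luminance of each color, then (L1 + 0.05) / (L2 + 0.05)). The function names are illustrative and not taken from any library. For the two colors quoted above it should give a ratio of roughly 4:1, short of the 4.5:1 minimum that WCAG level AA sets for normal-size text.

```python
def channel_to_linear(value_8bit: int) -> float:
    """Convert one 8-bit sRGB channel to linear light, per the WCAG 2.x definition."""
    c = value_8bit / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_color: str) -> float:
    """Relative luminance of a color given as '#RRGGBB'."""
    h = hex_color.lstrip("#")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return (0.2126 * channel_to_linear(r)
            + 0.7152 * channel_to_linear(g)
            + 0.0722 * channel_to_linear(b))


def contrast_ratio(foreground: str, background: str) -> float:
    """WCAG contrast ratio; the lighter color always goes in the numerator."""
    lighter, darker = sorted(
        (relative_luminance(foreground), relative_luminance(background)),
        reverse=True,
    )
    return (lighter + 0.05) / (darker + 0.05)


if __name__ == "__main__":
    ratio = contrast_ratio("#8584B2", "#282936")
    # 4.5:1 is the WCAG AA minimum for normal-size text.
    print(f"{ratio:.2f}:1, meets AA for body text: {ratio >= 4.5}")
```

The ratio is symmetric in its arguments, so it does not matter which color is passed as foreground.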

thaumaturgy, almost 11 years ago

Because I so rarely feel compelled to say this: this is a really great post-mortem. It's technical, it's not loaded down with sales-speak, and it's straightforward. I really hope post-mortems like this become more of a trend.

theGimp, almost 11 years ago

This paragraph reads like a response to the criticism they received a few days ago for scheduling maintenance at 2pm PST:

> On June 23rd we performed a credential roll on these Redis servers in our US cloud during a two hour scheduled maintenance window. Because we operate a service used globally, there is a less-than 10% difference in usage between so-called "peak hours" and "non-peak" hours. We scheduled maintenance for this time because it was not a peak time, but moreso because this period has high coverage from relevant engineering teams, should issues arise. By performing maintenance during this period, we were able to react more quickly and muster those teams within seconds.

gdeglin, almost 11 years ago

Seems like first trying this maintenance procedure in a staging environment would have caught the problem.

hunvreus, almost 11 years ago

> We are reviewing our internal processes to ensure that communication between groups is more effective, so that we can better inform our customers when situations occur.

I see this as the only contentious point raised by some of their users. They are doing an outstanding job already at dealing with a large infrastructure running a wide range of heterogeneous applications. They likely run updates on their infrastructure on a regular basis, without anybody noticing.

However, if you're selling me on the promise of taking care of infrastructure for me, you can't under-deliver on communicating as soon as s**t hits the fan.

ironlady, almost 11 years ago

I've been developing a Node site that is currently running on Heroku. This happened the first day after launch, and to say the least my blood pressure was through the roof all day. I was terrified that if something went wrong, we would be dead in the water. I don't think I would deal with them again (if I had the chance).

bithive123, almost 11 years ago

I am curious as to why they were relying on rolling Redis credentials at all since they would have needed to pre-arrange a secure channel for Redis traffic anyway.

saasdude, almost 11 years ago

who in the hell is stupid enough to use heroku?

sneak, almost 11 years ago

Ugh, the verb form of "impact" is so gross.

Postmortem of Heroku's June 23 Downtime
