Failure Friday: How We Ensure PagerDuty is Always Reliable

82 points by DougBarth over 11 years ago

5 comments

jedberg over 11 years ago
I posted this on the blog but I thought I'd repeat it here:

The simian army isn't AWS only. :) Some of it runs on other stacks.

And the best part is, it is open source! So if you wanted to leverage the simian army, it wouldn't be that hard to modify it to run on whatever stack you want and then submit the changes back. :)
teh_klev over 11 years ago
We just started using PagerDuty to deliver our Nagios alerts to landlines and mobile phones after losing confidence in Vodafone's pager network.

The other thing we like is the integration with HipChat to deliver alerts into our NOC chat room.

Overall we've been quite impressed... will be more impressed if you folks run into actual trouble but we still get our alerts :)
mjallday over 11 years ago
Anecdotal, I know, however: PagerDuty is the only service we rely on that has yet to go down on us. These guys are solid!

I like that tip on how to simulate a slow network too.
kapitalx over 11 years ago
My first impression from the title was that this is a post-mortem for an actual failure on Friday. But after reading your post the title made more sense ;)

Great post!
iLoch over 11 years ago
It's Wednesday!