TechEcho

Runbooks for better incident management

78 points by aceflux almost 4 years ago

9 comments

ellen364 almost 4 years ago
“Prevents an issue like this: ‘I recently ran into a situation where I spent 6 hours understanding how something works that would have taken 20 minutes if the relevant information was stored somewhere.’”

Recently I spent 10 hours on a usually routine task. At the end of the slog, my first thought wasn’t “I should spend more time writing that up!” The article was a good reminder that it’s worth scribbling something down. Even the basics of the “gotcha!” and snippets of code for debugging could save someone else (future me?) another 10 hours.

One thing I didn’t get from the article was how runbooks are created. It mentions the “sticky note on someone’s desk” approach and the “workflows for everything” approach. There’s a lot of ground in between. I guess people write lots of how-tos and eventually they’re turned into a runbook?
icyther almost 4 years ago
GitLab runbooks are a great place to learn: https://docs.gitlab.com/ee/user/project/clusters/runbooks/
unixhero almost 4 years ago
And here are some actual runbooks which Societe Generale have donated to the community: https://github.com/certsocietegenerale/IRM/tree/master/EN
ogjunkyard almost 4 years ago
I keep personal "runbooks" for a lot of the common work I deal with. Eventually this stuff gets automated where possible, but taking the time to work through all of the problems, write them down, and do it in a way that I can show someone else has helped me make sure that, when I sit down to automate something, I truly understand the "domain".

It also helps tremendously when someone reaches out to you during off-hours: you can just look through the documentation you have on hand and blaze through the task in far less time than if you had to figure things out from scratch.
mywacaday almost 4 years ago
We use MS Word, one runbook per app, for third-party applications. These have to be reviewed/updated/APPROVED at least once per year; that's not optional. It's this that adds the value, not how the info is stored.
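The "reviewed at least once per year" rule is the kind of policy that is easy to check mechanically, whatever the runbooks are stored in. A minimal sketch, assuming you can get a last-approved date for each runbook from somewhere (document metadata, a spreadsheet, a front-matter field); the inventory names and dates below are hypothetical, and "once per year" is approximated as 365 days.

```python
from datetime import date, timedelta

# Assumed policy: every runbook must be re-approved within 365 days.
REVIEW_INTERVAL = timedelta(days=365)


def overdue_runbooks(last_reviewed: dict[str, date], today: date) -> list[str]:
    """Return the names of runbooks whose last approval is older than REVIEW_INTERVAL, sorted."""
    return sorted(
        name
        for name, reviewed in last_reviewed.items()
        if today - reviewed > REVIEW_INTERVAL
    )


if __name__ == "__main__":
    # Hypothetical inventory: runbook name -> date it was last approved.
    inventory = {
        "payroll-app": date(2020, 3, 1),
        "crm-app": date(2021, 5, 20),
    }
    print(overdue_runbooks(inventory, today=date(2021, 7, 15)))
```

Run on a schedule, a check like this turns the annual review from something people have to remember into a list of runbooks that need attention.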
hughrr almost 4 years ago
Make sure whatever you keep them in has an offline option. I was handling an outage and Confluence blew up.

We're now using Markdown in GitHub, with a clone per SRE.
rmetzler almost 4 years ago
You can write markdown in Google Alerts. This is what we do.
nyellin almost 4 years ago
I'm the founder of a startup in this area. Our product is NOT just the usual automated-runbook approach.

If anyone has explored more sophisticated solutions than wiki pages, I would love to talk and learn from your experience.
linker3000 almost 4 years ago
"We have a major incident with connectivity to the building; log in to the knowledge server and see what the runbook says... oh!"

For mission-critical stuff, print out your Incident Management processes and keep a physical 'Master Runbook' in a prominent place in your department/cubicle/office.

Also, printed procedures with numbered steps and checkboxes allow for a visual record of progress, plus there's room for notes when things deviate from the expected.

Each annotated runbook then becomes a reference for the Incident/Problem management/RCA write-up. Unless, of course, you fancy following the runbook on one screen (if you can) while also updating the service management ticket on another (if you can), getting comms out to senior stakeholders (if you can), and dealing with the sudden influx of tickets, phone calls and emails (if all the systems are still accessible).

Plus, if you are called into a meeting or have to go and check something, you can take the paperwork with you.
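If the runbooks already live as text (for example, Markdown in a git repo, as mentioned earlier in the thread), the printed copy with numbered steps, checkboxes and room for notes can be generated rather than maintained by hand. A minimal sketch, assuming the steps are available as a list of strings; the function name, layout and example steps are hypothetical.

```python
def render_checklist(title: str, steps: list[str]) -> str:
    """Render runbook steps as a plain-text checklist suitable for printing.

    Each step gets a number, an empty checkbox, and a blank line for
    handwritten notes when things deviate from the expected.
    """
    lines = [title, "=" * len(title), ""]
    for number, step in enumerate(steps, start=1):
        lines.append(f"{number}. [ ] {step}")
        lines.append("   Notes: ____________________________")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical steps for a building-connectivity incident.
    print(render_checklist(
        "Building connectivity outage",
        [
            "Confirm scope with the network team",
            "Notify senior stakeholders",
            "Open a major incident ticket",
        ],
    ))
```

Regenerating the printout whenever the source runbook changes keeps the paper 'Master Runbook' from drifting out of date.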