Principles of Chaos Engineering (2018)

133 points by archielc almost 6 years ago

7 comments

KenanSulayman almost 6 years ago
We built a system called "friendly fire" that nukes a server every 10 minutes. It has changed the mindset of all engineers and made our infrastructure missile-proof.

Funnily enough, it also improved our latencies a lot (which I guess is mostly due to memory leaks et al.)
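For illustration only, the idea described above (terminate one server at a fixed interval) could be sketched roughly as follows. This is not the commenter's actual system: `list_servers` and `terminate` are hypothetical hooks into whatever inventory and provisioning tooling is in use, and choosing the victim uniformly at random is an assumption.

```python
import random
from typing import Callable, List, Optional

INTERVAL_SECONDS = 600  # "every 10 minutes", as stated in the comment


def pick_victim(servers: List[str], rng: random.Random) -> Optional[str]:
    """Return one server chosen uniformly at random, or None if the pool is empty."""
    if not servers:
        return None
    return rng.choice(servers)


def run_rounds(
    list_servers: Callable[[], List[str]],
    terminate: Callable[[str], None],
    rounds: int,
    sleep: Callable[[float], None],
    rng: Optional[random.Random] = None,
) -> List[str]:
    """Terminate one random server per round, pausing between rounds.

    list_servers and terminate are hypothetical callbacks (assumptions,
    not part of any real library).
    """
    rng = rng or random.Random()
    killed: List[str] = []
    for i in range(rounds):
        victim = pick_victim(list_servers(), rng)
        if victim is not None:
            terminate(victim)
            killed.append(victim)
        if i < rounds - 1:
            sleep(INTERVAL_SECONDS)
    return killed
```

Passing `time.sleep` as `sleep` would give the real ten-minute cadence; injecting a stub instead makes the loop easy to exercise in a test.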
jinqueeny almost 6 years ago
The following link shows how we do Chaos Engineering in TiDB, an open source distributed database:

https://www.pingcap.com/blog/chaos-practice-in-tidb/

Regarding the Fault Injection tools we are using:

- Kernel Fault Injection: the Fault Injection Framework included in the Linux kernel, which you can use to implement simple fault injections to test device drivers.
- SystemTap: a scripting language and tool for diagnosing performance or functional problems.
- Fail: gofail for Go and fail-rs for Rust.
- Namazu: a programmable fuzzy scheduler to test a distributed system.

We also built our own Automatic Chaos platform, Schrodinger, to automate all these tests to improve both efficiency and coverage.
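As a rough illustration of the failpoint style of fault injection mentioned in the list above, here is a minimal Python sketch. It does not reproduce the gofail or fail-rs APIs; the registry and the names `failpoint`, `enabled`, and `write_record` are all hypothetical and exist only to show the general shape: a named marker in the code path that does nothing unless a test switches it on.

```python
from contextlib import contextmanager
from typing import Dict, Iterator

# Hypothetical in-process registry: failpoint name -> exception to raise.
_active: Dict[str, Exception] = {}


def failpoint(name: str) -> None:
    """Marker placed in a code path; raises only if the failpoint is enabled."""
    exc = _active.get(name)
    if exc is not None:
        raise exc


@contextmanager
def enabled(name: str, exc: Exception) -> Iterator[None]:
    """Enable a failpoint inside a with-block, then restore the previous state."""
    previous = _active.get(name)
    _active[name] = exc
    try:
        yield
    finally:
        if previous is None:
            _active.pop(name, None)
        else:
            _active[name] = previous


def write_record(store: dict, key: str, value: str) -> None:
    """Example code path with a failpoint just before the write."""
    failpoint("before-write")
    store[key] = value
```

A test would wrap the call in `with enabled("before-write", IOError("injected")):` and then check that the caller handles the error and that no partial write occurred.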
jtms almost 6 years ago
I have not used it, but I have heard this is a very useful tool: https://github.com/Netflix/chaosmonkey
azhenley almost 6 years ago
Other useful materials:

- Chaos Monkey Guide for Engineers: https://www.gremlin.com/chaos-monkey/
- Recent HN discussion on "Resilience Engineering: Where do I start?": https://news.ycombinator.com/item?id=19898645
jorblumesea almost 6 years ago
If you've never run a chaos experiment, how do you square up blast radius with running in prod?

It seems like this setup works great if built from the get-go but is incredibly painful and possibly dangerous if starting with existing applications.
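For readers unfamiliar with the term, one common reading of "limiting blast radius" is to cap how much of the fleet an experiment may touch and to stop when a health metric degrades. The sketch below is an assumption-laden illustration of that reading, not advice taken from this thread; `select_targets`, `should_abort`, and their parameters are hypothetical names.

```python
import math
import random
from typing import List, Optional


def select_targets(
    hosts: List[str],
    max_fraction: float,
    rng: Optional[random.Random] = None,
) -> List[str]:
    """Pick at most max_fraction of the hosts (rounded down) as experiment targets."""
    if not 0.0 <= max_fraction <= 1.0:
        raise ValueError("max_fraction must be between 0 and 1")
    rng = rng or random.Random()
    limit = math.floor(len(hosts) * max_fraction)
    return rng.sample(hosts, limit)


def should_abort(error_rate: float, baseline: float, tolerance: float) -> bool:
    """Signal an abort when the observed error rate exceeds baseline plus tolerance."""
    return error_rate > baseline + tolerance
```

Rounding down means a very small pool yields no targets at all, which errs on the side of not running the experiment.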
dang almost 6 years ago
A thread from 2018: https://news.ycombinator.com/item?id=16244586
agumonkey almost 6 years ago
I see no mention of AFL, which seems like a fitting tool for the topic.

Also the term 'antifragile' (lightly controversial) comes to mind.