We built a system called "friendly fire" that nukes a server every 10 minutes. It has changed the mindset of all engineers and made our infrastructure missile-proof.<p>Funnily enough, it also improved our latencies a lot (which I guess is mostly due to memory leaks and the like).
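For the curious, the core of such a killer can be tiny. A minimal sketch, assuming targets come from a flat list and are reachable over SSH; the host names, SSH user, and reboot command are all hypothetical placeholders, not our production code:<p><pre><code>// Every 10 minutes, pick a random host from a hypothetical pool and
// force an ungraceful reboot so crash-recovery paths get exercised.
package main

import (
    "log"
    "math/rand"
    "os/exec"
    "time"
)

func main() {
    // Hypothetical candidate pool; a real setup would pull this from
    // service discovery and exclude anything not opted in.
    targets := []string{"app-01.internal", "app-02.internal", "app-03.internal"}

    for range time.Tick(10 * time.Minute) {
        victim := targets[rand.Intn(len(targets))]
        log.Printf("friendly fire: rebooting %s", victim)

        cmd := exec.Command("ssh", "chaos@"+victim, "sudo systemctl reboot --force")
        if err := cmd.Run(); err != nil {
            log.Printf("failed to reboot %s: %v", victim, err)
        }
    }
}
</code></pre>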
The following link shows how we do Chaos Engineering in TiDB, an open-source distributed database:<p><a href="https://www.pingcap.com/blog/chaos-practice-in-tidb/" rel="nofollow">https://www.pingcap.com/blog/chaos-practice-in-tidb/</a><p>The fault injection tools we are using:<p>- Kernel Fault Injection: the fault injection framework included in the Linux kernel, which you can use to implement simple fault injections to test device drivers.<p>- SystemTap: a scripting language and tool for diagnosing performance or functional problems.<p>- Fail: gofail for Go and fail-rs for Rust, for injecting failpoints into our own code (see the sketch below).<p>- Namazu: a programmable fuzzy scheduler for testing distributed systems.<p>We also built our own automatic chaos platform, Schrödinger, to automate all these tests and improve both efficiency and coverage.
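To make the failpoint idea behind gofail/fail-rs concrete, here is a minimal Go sketch. It illustrates only the concept (named hooks that are no-ops in production and can be armed in tests to force rare error paths); it is not the actual gofail API, which wires this into builds and tests differently, and the failpoint and function names are hypothetical:<p><pre><code>package main

import (
    "errors"
    "fmt"
    "sync"
)

// Registry of armed failpoints, keyed by name.
var (
    mu     sync.RWMutex
    points = map[string]error{}
)

// Enable arms a failpoint with the error it should inject.
func Enable(name string, err error) {
    mu.Lock()
    defer mu.Unlock()
    points[name] = err
}

// Inject returns the armed error, or nil when the failpoint is disabled.
func Inject(name string) error {
    mu.RLock()
    defer mu.RUnlock()
    return points[name]
}

// saveEntry is hypothetical application code with a failpoint on the
// path we want to test.
func saveEntry(data string) error {
    if err := Inject("raft-before-save"); err != nil {
        return err // simulated disk failure
    }
    fmt.Println("saved:", data)
    return nil
}

func main() {
    // In a test, arm the failpoint and verify the caller handles the error.
    Enable("raft-before-save", errors.New("injected: disk unavailable"))
    if err := saveEntry("entry-42"); err != nil {
        fmt.Println("got expected failure:", err)
    }
}
</code></pre>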
I have not used it, but I have heard it is a very useful tool: <a href="https://github.com/Netflix/chaosmonkey" rel="nofollow">https://github.com/Netflix/chaosmonkey</a>
Other useful materials:<p>- Chaos Monkey Guide for Engineers <a href="https://www.gremlin.com/chaos-monkey/" rel="nofollow">https://www.gremlin.com/chaos-monkey/</a><p>- Recent HN discussion on "Resilience Engineering: Where do I start?" <a href="https://news.ycombinator.com/item?id=19898645" rel="nofollow">https://news.ycombinator.com/item?id=19898645</a>
If you've never run a chaos experiment, how do you square blast radius with running in prod?<p>It seems like this setup works great if built in from the get-go, but incredibly painful and possibly dangerous if you're starting with existing applications.
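The guardrails I'd imagine you need for existing apps are explicit opt-in plus a hard cap on how much of any group you ever touch at once. A rough sketch of that selection logic in Go; the Instance type, tag, and numbers are all hypothetical, just to show the shape of it:<p><pre><code>package main

import (
    "fmt"
    "math/rand"
)

type Instance struct {
    ID      string
    Group   string
    ChaosOK bool // explicit opt-in, set per service as teams gain confidence
}

// pickVictims selects at most maxFraction of the opted-in instances in one
// group, so most of the pool always stays up.
func pickVictims(all []Instance, group string, maxFraction float64) []Instance {
    var eligible []Instance
    for _, in := range all {
        if in.Group == group && in.ChaosOK {
            eligible = append(eligible, in)
        }
    }
    n := int(float64(len(eligible)) * maxFraction)
    if n < 1 && len(eligible) > 1 {
        n = 1 // touch at least one instance, but never a lone survivor
    }
    rand.Shuffle(len(eligible), func(i, j int) {
        eligible[i], eligible[j] = eligible[j], eligible[i]
    })
    return eligible[:n]
}

func main() {
    fleet := []Instance{
        {"api-1", "api", true}, {"api-2", "api", true},
        {"api-3", "api", true}, {"api-4", "api", false}, // not opted in yet
    }
    for _, v := range pickVictims(fleet, "api", 0.25) {
        fmt.Println("would terminate:", v.ID)
    }
}
</code></pre>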
A thread from 2018: <a href="https://news.ycombinator.com/item?id=16244586" rel="nofollow">https://news.ycombinator.com/item?id=16244586</a>