Major incidents aside, I always think cache-related bugs are some of the most likely to go undetected: if you don't test for them end-to-end, they're really not that easy to spot and diagnose.

An article sticking around too long on the home page. Semi-stale data creeping into your pipeline. Someone's security token being accepted post-revocation. All really hard to spot unless (1) you're explicitly looking, or (2) the manure hits the fan.
Required reading for all of the "I could code up Twitter in a weekend" types.

The long-listen-queue -> multiple-queued-up-retries feedback loop is a classic: see RFC 896 on TCP/IP "congestion collapse" (https://datatracker.ietf.org/doc/html/rfc896) and the 1986 Internet meltdown [various sources].
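To make the loop concrete, here's a toy back-of-the-envelope model (my own numbers and simplification, not from the article or the RFC): once requests start timing out, every retry adds to the offered load, which makes even more requests time out.

    # Toy model of the retry feedback loop: retries on timeouts inflate
    # offered load, which causes more timeouts, which causes more retries.
    def offered_load(base_rps, capacity_rps, max_retries, rounds=10):
        load = base_rps
        for _ in range(rounds):
            # Rough fraction of requests that time out once we're over capacity.
            timeout_fraction = max(0.0, min(1.0, (load - capacity_rps) / load))
            # Each timed-out request gets re-sent, adding to the next round's load.
            load = base_rps + base_rps * timeout_fraction * max_retries
        return load

    # A server that handles 1000 rps, clients offering 1100 rps with up to 3
    # retries: offered load spirals to well over 3x capacity instead of 1.1x.
    print(offered_load(1100, 1000, max_retries=3))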
What I find most interesting in this is the pseudo-detective story of hunting down disappearing post-mortem and "lessons learned" documentation. Optimistically, we'd hope the older systems no longer reflect the existing systems in any meaningful way (possibly as the org structures and/or software stacks shift and change) and the documentation is simply no longer relevant.

I'd imagine most lost knowledge is not an explicit decision, however, which means such historical scenarios / documentation / ... are just lost in the course of business. Lost knowledge is the default for companies.

Twitter is likely better than most, given their documentation is all digital and there exist explicit processes to catalogue such incidents. I'd also be curious to see how much of this knowledge has been implicitly exported to their open source codebases.
I remember reading that Facebook's caches had a dedicated standby set of "gutter" servers (otherwise inactive and unused) that would take over quickly after a failure. That was an interesting mitigation for some failure scenarios.
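From what I recall of the memcache paper, the client-side logic is roughly this shape. The sketch below is my own reconstruction, not Facebook's code; the GutteredCache class and the primary/gutter client objects are hypothetical, duck-typed cache clients.

    # Rough shape of the "gutter pool" fallback. If the primary cache server
    # is unreachable, fall back to a small standby pool with short TTLs so
    # the database doesn't eat the full miss storm.
    class GutteredCache:
        def __init__(self, primary, gutter, gutter_ttl=10):
            self.primary = primary        # normal cache client for this key
            self.gutter = gutter          # small standby pool, idle otherwise
            self.gutter_ttl = gutter_ttl  # short TTL bounds staleness

        def get(self, key, load_from_db):
            try:
                value = self.primary.get(key)
                if value is not None:
                    return value
            except ConnectionError:
                # Primary is down: serve from (or fill) the gutter pool instead.
                value = self.gutter.get(key)
                if value is None:
                    value = load_from_db(key)
                    self.gutter.set(key, value, expire=self.gutter_ttl)
                return value
            # Ordinary miss on a healthy primary: fill it as usual.
            value = load_from_db(key)
            self.primary.set(key, value)
            return value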
These big incidents involving 'big cache' are fun to read about. Years ago I had to deal with a bunch of cache issues over a short time, but they were all minor incidents with minor uses of cache (simple memoization, storing stuff in maps on attributes of java singletons, browser local storage). Still, I made a checklist of questions to ask thenceforth on any proposal or implementation of a cache in a doc or code review. A bunch of them are just focused on actually paying attention to what your keys are made of and how invalidation works (or if you even can invalidate, or if it's even needed). I think for 'big cache' questions I should just refer to this blog post and ask "what's the risk of these issues?"
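For the key-composition question in particular, the failure mode is easiest to see in a tiny made-up example (the names and fields here are invented): every input that affects the cached result has to be part of the key, and invalidation should be an explicit operation that derives the key the same way the read path does.

    # Tiny made-up example of the "what is your key made of?" question.
    # Leaving locale or feed_version out of the key would silently make
    # different results share one cache entry.
    cache = {}

    def feed_key(user_id, locale, feed_version):
        return ("feed", user_id, locale, feed_version)

    def get_feed(user_id, locale, feed_version, build_feed):
        key = feed_key(user_id, locale, feed_version)
        if key not in cache:
            cache[key] = build_feed(user_id, locale, feed_version)
        return cache[key]

    def invalidate_feed(user_id, locale, feed_version):
        # Explicit invalidation: same key derivation as the read path.
        cache.pop(feed_key(user_id, locale, feed_version), None)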
Yeah, see also Marc Brooker's good article on why the bimodal behavior of caches can cause a lot of headaches: https://brooker.co.za/blog/2021/08/27/caches.html
"There are only two hard things in Computer Science: cache invalidation and naming things." -- Phil Karlton<p><a href="https://martinfowler.com/bliki/TwoHardThings.html" rel="nofollow">https://martinfowler.com/bliki/TwoHardThings.html</a>
“On Nov 8, a user changed their name from tigertwo to Woflstar_Bachi.”

Horrifically inappropriate inclusion of PII in this post. Didn’t someone at legal go through this?