Stack Overflow Outage Postmortem

862 points by gbrayut almost 9 years ago

71 comments

dkopi almost 9 years ago

Perfect. Awesome bug. Awesome postmortem. This was just fun to read. While this might have been caused by mistake, these types of bugs can be (and are) abused by hackers: <a href="https://www.owasp.org/index.php/Regular_expression_Denial_of_Service_-_ReDoS" rel="nofollow">https://www.owasp.org/index.php/Regular_expression_Denial_of...</a> <a href="https://en.wikipedia.org/wiki/ReDoS" rel="nofollow">https://en.wikipedia.org/wiki/ReDoS</a> The post also links to this video: <a href="https://vimeo.com/112065252" rel="nofollow">https://vimeo.com/112065252</a>

chubot almost 9 years ago

Ha! The same bug happened internally at my company. In that case it was a regex matching a URL taking so much CPU that it caused a DoS of a proxy server. I won't be surprised if it's happened to someone here too.
This is very timely, because minutes ago I added a link to Russ Cox's articles in my Kernighan awk repo: <a href="https://github.com/andychu/bwk" rel="nofollow">https://github.com/andychu/bwk</a> <a href="https://swtch.com/~rsc/regexp/regexp1.html" rel="nofollow">https://swtch.com/~rsc/regexp/regexp1.html</a>
If you are not familiar with this issue: basically, Perl popularized bad computer science... "regexes" are not regular languages.
They say that this particular case triggered quadratic behavior, not exponential, but the point is that there is a linear-time algorithm to do this. The file b.c in the awk repo implements the linear-time algorithm: <a href="https://github.com/andychu/bwk/blob/master/b.c" rel="nofollow">https://github.com/andychu/bwk/blob/master/b.c</a>
(and rsc's site has some nice sample code too, as well as caveats with regard to capturing and so forth)
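
The linear-time idea chubot refers to (simulate every NFA state in lockstep, as in Russ Cox's article, instead of backtracking) fits in a few dozen lines of Python. This is a minimal sketch, not awk's b.c or RE2: to stay short it assumes a hypothetical pattern format, a flat list of `(predicate, quantifier)` tokens rather than real regex syntax, and `search_end_anchored` and `is_trailing_junk` are made-up names. It answers the question the postmortem's `[\s\u200c]+$` branch asks while examining each input character once per live state.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical token format: (predicate, quantifier), quantifier in "1?*+".
Pred = Callable[[str], bool]
Token = Tuple[Pred, str]


def _expand(tokens: List[Token]) -> List[Token]:
    """Rewrite x+ as x followed by x*, leaving only "1", "?" and "*"."""
    out: List[Token] = []
    for pred, q in tokens:
        if q == "+":
            out += [(pred, "1"), (pred, "*")]
        else:
            out.append((pred, q))
    return out


def _add(states: Dict[int, int], i: int, start: int, toks: List[Token]) -> None:
    """Add state i (plus states reachable by skipping "?"/"*" tokens),
    keeping the leftmost match start recorded for each state."""
    while True:
        if i in states and states[i] <= start:
            return
        states[i] = start
        if i < len(toks) and toks[i][1] in "?*":
            i += 1
        else:
            return


def search_end_anchored(tokens: List[Token], text: str) -> Optional[int]:
    """Leftmost start of a match of `tokens` that ends at the end of `text`
    (like an unanchored search for `pattern$`), or None.

    Cost is O(len(text) * len(tokens)); nothing is ever backtracked over.
    """
    toks = _expand(tokens)
    accept = len(toks)
    states: Dict[int, int] = {}
    for pos, ch in enumerate(text):
        _add(states, 0, pos, toks)  # a match may begin at any position
        nxt: Dict[int, int] = {}
        for i, start in states.items():
            if i == accept:
                continue  # accepting before the end doesn't satisfy "$"
            pred, q = toks[i]
            if pred(ch):
                # "*" stays on its token; "1" and "?" move to the next one.
                _add(nxt, i if q == "*" else i + 1, start, toks)
        states = nxt
    _add(states, 0, len(text), toks)  # lets an empty pattern match at the end
    return states.get(accept)


def is_trailing_junk(c: str) -> bool:
    # Approximates [\s\u200c] with str.isspace() plus U+200C.
    return c.isspace() or c == "\u200c"


TRAILING = [(is_trailing_junk, "+")]

print(search_end_anchored(TRAILING, "abc   "))           # 3
print(search_end_anchored(TRAILING, " " * 20000 + "x"))  # None
```

The price of never backtracking is that features needing memory of what was matched (backreferences, for example) can't be expressed this way, which is the capture caveat rsc's site discusses.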

smrtinsert almost 9 years ago

"This regular expression has been replaced with a substring function." This should be the title of a book on software engineering.

StevePerkins almost 9 years ago

I'm surprised that a developer was able to fix StackOverflow without being able to look up the error message on StackOverflow.

redbeard0x0a almost 9 years ago

In the past, I have done load balancer status checks against a special /status endpoint. I queried all the connected services (e.g. DB, Redis) with a super fast query (e.g. `SELECT version();`). Monitoring CPU/MEM usage for scaling was separate.
Compared to checking the home page, what is the best way to set up a health check for your load balancers?
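
One common answer, sketched in Python under stated assumptions: a dedicated status handler that runs only cheap dependency probes and never renders user content, so a pathological post can't fail the health check. `health_status` and the probe functions are hypothetical names; a real version would also put a timeout on each probe, which this sketch omits.

```python
from typing import Callable, Dict, Tuple

Probe = Callable[[], bool]


def health_status(probes: Dict[str, Probe]) -> Tuple[int, Dict[str, str]]:
    """Run each cheap dependency probe and return (HTTP status, report).

    Each probe is a hypothetical callable, e.g. one that runs
    `SELECT version();` against the DB and returns True on success.
    No page rendering and no user content are involved.
    """
    report: Dict[str, str] = {}
    all_ok = True
    for name, probe in probes.items():
        try:
            ok = bool(probe())
        except Exception as exc:  # a probe that raises counts as a failure
            ok = False
            report[name] = f"error: {type(exc).__name__}"
        else:
            report[name] = "ok" if ok else "failed"
        all_ok = all_ok and ok
    return (200 if all_ok else 503), report


def db_probe() -> bool:
    return True  # stand-in for a real `SELECT version();`


def redis_probe() -> bool:
    raise ConnectionError("redis unreachable")  # simulated outage


print(health_status({"db": db_probe, "redis": redis_probe}))
# (503, {'db': 'ok', 'redis': 'error: ConnectionError'})
```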

alexbecker almost 9 years ago

I remember the day I learned that Python's "re" module uses backtracking for non-extended regexes. My tests covered lots of corner cases in the regex logic, but were too short for me to notice the performance penalty. Luckily I only caused a partial outage in production.
I actually got to talk to Raymond Hettinger (Python core team) about why re uses a potentially exponential-time algorithm for regexes when there is a famous linear-time algorithm, and (I suspect) most programmers would assume linear complexity. As it turns out, there was an attempt to rewrite re to fix this, but the rewrite never managed to present exactly the same (extremely large) API as the existing module. He advised me that "the standard library is where code goes to die."

mwpmaybe almost 9 years ago

This is why I always do:<pre><code>  s/^\s+//;
  s/\s+$//;
</code></pre>
Instead of:<pre><code>  s/^\s+|\s+$//;
</code></pre>
Weirdly, I've "known" this since I started writing Perl in the mid-'90s. Not sure where I originally read it (or was told it). Funny how that works. I try to write my regexes so that they anchor at the front of the string or the back, or they describe the whole string; never an either-or anchoring situation like this example.
Spaces at beginning of string (100,000 iterations):<pre><code>            Rate onestep twostep
onestep  62500/s      --     -2%
twostep  63694/s      2%      --

real    0m3.093s
user    0m3.066s
sys     0m0.018s
</code></pre>
Spaces at end of string (100,000 iterations):<pre><code>            Rate twostep onestep
twostep  55249/s      --     -9%
onestep  60976/s     10%      --

real    0m3.453s
user    0m3.421s
sys     0m0.022s
</code></pre>
Spaces in middle of string (only 500 iterations because I don't want to sit here for four hours):<pre><code>            Rate onestep twostep
onestep   7.11/s      --   -100%
twostep  16667/s 234333%      --

real    1m10.741s
user    1m10.207s
sys     0m0.228s
</code></pre>

StavrosK almost 9 years ago

I don't understand something: the regex expected a space character, followed by the end of the string. If the last character wasn't a space, this could never match. Why did the engine keep backtracking, even though it's easy to figure out that it could never match the regex?
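
A naive backtracking engine doesn't reason about the pattern as a whole: an unanchored search just retries from every starting offset, and each retry re-scans the rest of the run of spaces before `$` fails. The hypothetical model below (`count_backtracking_steps` is not any real engine's code) counts the characters the greedy `\s+` consumes for the pattern `\s+$`, with and without a start anchor.

```python
from typing import Tuple


def count_backtracking_steps(text: str, anchored: bool = False) -> Tuple[bool, int]:
    r"""Model a naive backtracking search for the pattern \s+$.

    Returns (matched, consumed): `consumed` is the total number of characters
    the greedy \s+ accepts across every attempt. Illustrative model only.
    """
    n = len(text)
    consumed = 0
    starts = [0] if anchored else range(n + 1)  # unanchored: try every offset
    for start in starts:
        end = start
        while end < n and text[end].isspace():  # greedy \s+
            end += 1
        consumed += end - start
        if end == start:
            continue  # \s+ needs at least one character here
        if end == n:
            return True, consumed  # $ holds right after the run
        # Otherwise the engine gives characters back one by one, re-testing $
        # each time (all fail, since a space follows), then moves to start + 1.
    return False, consumed


print(count_backtracking_steps(" " * 100 + "x"))                 # (False, 5050)
print(count_backtracking_steps(" " * 100 + "x", anchored=True))  # (False, 100)
```

With the postmortem's 20,000 spaces the unanchored count is 20,000 × 20,001 / 2 = 200,010,000, while the anchored attempt consumes only 20,000 characters: the quadratic cost comes from the retry loop over start offsets.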

selckin almost 9 years ago

Is this the sort of thing that <a href="https://github.com/google/re2" rel="nofollow">https://github.com/google/re2</a> was made to solve?

mplewis almost 9 years ago

I think this might have been the post they quoted: <a href="http://stackoverflow.com/questions/38484433/in-corona-sdk-how-to-join-tiles-into-one-word-using-matrix-game-breakout" rel="nofollow">http://stackoverflow.com/questions/38484433/in-corona-sdk-ho...</a>

johncoltrane almost 9 years ago

A few months ago, a Stack Overflow representative asked me if their presence at a dev conference was justified. My positive answer more or less revolved around the importance SO has taken on in the daily life of programmers everywhere.
If only she had been there to witness the effect of a 34-minute downtime on an open space full of mobile/back/front developers.

junke almost 9 years ago

Nice bug. I tried to replicate this and indeed, the time to notice that no match is found grows very fast with the length of the input. Using a substring check is a good fix, but I tried to change the regex to fix this: if instead of an end anchor you add an optional non-whitespace character at the end of the pattern, then you only have to check whether the optional part is empty. Testing with very long strings which respectively match and don't match shows that the result is immediate in both cases.<pre><code>(defparameter *scanner*
  (ppcre:create-scanner
   '(:sequence
     (:register (:greedy-repetition 1 nil :whitespace-char-class))
     (:register (:greedy-repetition 0 1 :non-whitespace-char-class)))))

(let ((length 40000))
  (defparameter *no-match*
    (let ((string (make-string length :initial-element #\space)))
      (setf (char string (1- (length string))) #\+)
      string))
  (defparameter *match*
    (make-string length :initial-element #\space)))

(defun end-white-match (string)
  (ppcre:do-scans (ms me rs re *scanner* string)
    (when (and ms (= (aref re 1) (aref rs 1)))
      (return (values ms me)))))

(time (end-white-match *match*))
0, 40000
;; Evaluation took:
;;   0.000 seconds of real time
;;   0.000000 seconds of total run time (0.000000 user, 0.000000 system)
;;   100.00% CPU
;;   25,139,832 processor cycles
;;   0 bytes consed

(time (end-white-match *no-match*))
NIL
;; Evaluation took:
;;   0.000 seconds of real time
;;   0.000000 seconds of total run time (0.000000 user, 0.000000 system)
;;   100.00% CPU
;;   11,105,364 processor cycles
;;   0 bytes consed
</code></pre>
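
junke's trick carries over to Python's `re` as well. A minimal translation, assuming Python 3 (where `\s` and `\S` are Unicode-aware for `str` patterns) and leaving out the postmortem's `\u200c`; `trailing_ws_span` is a hypothetical name. Because `\S?` can always match, possibly empty, every attempt that starts on whitespace succeeds without backtracking, and the scan resumes after the consumed run.

```python
import re
from typing import Optional, Tuple

# Same shape as junke's ppcre scanner: group 1 is a whitespace run, group 2 an
# optional single non-whitespace character right after it. No "$" anchor.
RUN_THEN_OPTIONAL = re.compile(r"(\s+)(\S?)")


def trailing_ws_span(text: str) -> Optional[Tuple[int, int]]:
    """Return (start, end) of the trailing whitespace run, or None."""
    for m in RUN_THEN_OPTIONAL.finditer(text):
        if m.group(2) == "":
            # Greedy \s+ stopped without a following non-space: end of text.
            return m.start(1), m.end(1)
    return None


print(trailing_ws_span("ab  "))             # (2, 4)
print(trailing_ws_span(" " * 40000 + "+"))  # None
```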

lambda almost 9 years ago

Hmm. I wonder why one of the follow-up mitigations is not to move to a non-backtracking regex engine by default.
Most of what you want to do with a regex can be done with an NFA- or DFA-based engine. That which can't be done with an NFA- or DFA-based engine is generally better handled with a parser than a regex.
There are plenty of good DFA-based regex matchers out there: RE2, the Rust regex crate, GNU grep, etc. At a glance, it even looks like glibc uses a DFA, though it supports POSIX REs which support backreferences, so it must use backtracking at least for REs that contain backreferences.
Predictable hash collisions were a big source of DoS attacks in web scripting languages which use hash tables a lot, until they started rolling out randomized hashing algorithms to prevent easily predictable hash collisions. It seems like it would be best for languages and libraries to move to DFA-based regexps, at least for anything that doesn't contain backreferences, to keep these kinds of issues from being easy to exploit.

kilroy123 almost 9 years ago

> It took 10 minutes to identify the cause.
I'm impressed they were able to do this so quickly.

brongondwana almost 9 years ago

Time to pop this old chestnut out: <a href="https://blog.fastmail.com/2014/12/14/on-duty/" rel="nofollow">https://blog.fastmail.com/2014/12/14/on-duty/</a>
"At one stage, we decided to try to avoid having to be woken for some types of failure by using Heartbeat, a high availability solution for Linux, on our frontend servers. The thing is, our servers are actually really reliable, and we found that heartbeat failed more often than our systems - so the end result was reduced reliability! It's counter-intuitive, but automated high-availability often isn't."
One of these days we'll finish our new system and I'll blog about it. The idea is that the automated systems are allowed to take ONE corrective action without paging, at which point they flag that the system is in a compromised state. Any further test failures trigger an immediate wake of the on-call.
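
The one-free-action policy is essentially a two-state machine. A minimal Python sketch; `OnCallPolicy`, the action strings, and the `acknowledge` reset are hypothetical. The comment doesn't say how the compromised flag gets cleared, so this assumes a human acknowledgement re-arms it.

```python
from dataclasses import dataclass


@dataclass
class OnCallPolicy:
    """Automation may take ONE corrective action without paging; after that
    the system is flagged as compromised and further failures page a human."""
    compromised: bool = False

    def on_check_failed(self) -> str:
        if not self.compromised:
            self.compromised = True
            return "auto-fix"  # e.g. restart a service, no page
        return "page"          # free action already used: wake the on-call

    def acknowledge(self) -> None:
        # Assumption: a human clearing the flag re-arms the free action.
        self.compromised = False


policy = OnCallPolicy()
print(policy.on_check_failed())  # auto-fix
print(policy.on_check_failed())  # page
```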

tibiapejagala almost 9 years ago

I wondered about this for some time.
Simple regexes (as in formal language theory) are matched in O(n) time by a finite automaton.
Extended regexes like PCRE are more powerful, but most of the time they are implemented by backtracking engines, where a really bad regex pattern might go exponential, and even a simple pattern like the one in the postmortem can go O(n^2).
Do implementations optimize simple regex patterns to O(n) matching? I even wrote an x86 JIT regex compiler for fun some time ago. Compilation time was really bad, but matching was O(n).

nanis almost 9 years ago

As perlfaq4[1] shows:<pre><code>  > You can do that with a pair of substitutions:
  >     s/^\s+//;
  >     s/\s+$//;
</code></pre>
It then notes, in an understated manner:<pre><code>  > You can also write that as a single substitution,
  > although it turns out the combined statement is
  > slower than the separate ones. That might not
  > matter to you, though:
  >     s/^\s+|\s+$//g;
</code></pre>
[1]: <a href="http://perldoc.perl.org/perlfaq4.html#How-do-I-strip-blank-space-from-the-beginning%2fend-of-a-string%3f" rel="nofollow">http://perldoc.perl.org/perlfaq4.html#How-do-I-strip-blank-s...</a>

jakozaur almost 9 years ago

Experienced something similar myself. I was even thinking about creating a regular expression library which only allows "safe" and fast expressions. The trick would be to allow only expressions that can be translated easily to a state automaton.
Good regex: "Phone number [0-9]*"
Bad regex: ";Name=.*;" as . can also match ";" and it can lead to bad backtracking. You should rewrite this regex as ";Name=[^;]*;"
RE2 is probably the best implementation so far, but because it tries so hard to preserve backward compatibility with all regular expressions, it is not that fast in the average case: <a href="https://swtch.com/~rsc/regexp/regexp1.html" rel="nofollow">https://swtch.com/~rsc/regexp/regexp1.html</a>
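
jakozaur's `;Name=` example is easy to check with Python's `re`, using a made-up record string. Besides the backtracking cost, the greedy `.*` version returns the wrong field here, because `.` happily runs past the first `;`:

```python
import re

# Made-up record in the "key=value;" shape of jakozaur's example.
record = "id=7;Name=bob;Age=42;"

greedy = re.search(r";Name=(.*);", record)      # "." also matches ";"
negated = re.search(r";Name=([^;]*);", record)  # can't run past a ";"

print(greedy.group(1))   # bob;Age=42
print(negated.group(1))  # bob
```

The greedy `.*` also has to run to the end of the input before backtracking toward a `;`, which is where long inputs start to hurt.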

lazyant almost 9 years ago

"the entire site became unavailable since the load balancer took the servers out of rotation." I don't care about the regexp; this is bad SRE. You can't just take servers out of rotation without some compensating action.
Never mind that it looks like all web servers were taken out of rotation; even one server down could cause a cascading effect (more traffic is directed to the healthy ones, which end up dying, in a traffic-based failure). One action, for example after n servers have gone down (besides bringing up m other servers), is to put (at least some) servers in a more basic mode (read-only/static, some features disabled). It's not guaranteed, but that could have prevented this and other types of downtime.

shanemhansen almost 9 years ago

Yesterday I couldn't use HipChat for a couple of hours because it would lock up a CPU and fail to load. After doing some free debugging for them, I realized they were locking up trying to extract URLs out of some text with a regex. Simplified code: <a href="https://gist.github.com/shanemhansen/c4e5580f7d4c6265769b0df61d6d8759" rel="nofollow">https://gist.github.com/shanemhansen/c4e5580f7d4c6265769b0df...</a>
Pasting that content into HipChat will probably lock up your browser and webview-based clients. Beware.
Lesson learned: don't parse user input with a regex.

antoineMoPa almost 9 years ago

Google cache saved me during these 34 minutes.

onetwotree almost 9 years ago

It seems like there should be a way to determine whether a regex can be compiled using the classic O(n) DFA algorithm or needs whatever madness PCREs use to support backtracking and so on.
Does anybody know if any regex engines attempt this?
Obviously you can still shoot yourself in the foot, but it's somewhat more difficult to do so in a situation like this, where the regex in question "looks" cheap.
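
As a rough illustration of the idea, a hypothetical Python helper can scan a pattern for features that linear-time engines such as RE2 don't support (backreferences, lookaround, conditionals) and route only patterns without them to a DFA/NFA engine. This is a conservative heuristic over Python `re` syntax, not a real parser, and `needs_backtracking` is a made-up name. The postmortem's pattern passes the check, so it would have been eligible for a linear-time engine.

```python
def needs_backtracking(pattern: str) -> bool:
    """Heuristic: does `pattern` (Python `re` syntax) use a feature that
    linear-time engines such as RE2 reject? Checks for numbered and named
    backreferences, lookaround, and conditionals. Not a full parser."""
    backtrack_only = ("(?=", "(?!", "(?<=", "(?<!", "(?P=", "(?(")
    i, n = 0, len(pattern)
    in_class = False
    while i < n:
        c = pattern[i]
        if c == "\\":
            # tuple(...) so that an empty slice at the end never matches
            if not in_class and pattern[i + 1:i + 2] in tuple("123456789"):
                return True  # numbered backreference such as \1
            i += 2  # skip the escaped character
            continue
        if in_class:
            if c == "]":
                in_class = False
        elif c == "[":
            in_class = True
            # "^" and a leading "]" right after "[" don't close the class.
            if pattern[i + 1:i + 2] == "^":
                i += 1
            if pattern[i + 1:i + 2] == "]":
                i += 1
        elif pattern.startswith(backtrack_only, i):
            return True
        i += 1
    return False


print(needs_backtracking(r"^[\s\u200c]+|[\s\u200c]+$"))  # False
print(needs_backtracking(r"(\w+) \1"))                   # True
```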

rixed almost 9 years ago

Regex was not the main issue. The main issues were:
1. Rendering a page fails/does not terminate if some non-essential subtask (rendering a single code block) fails/does not terminate.
2. They do not try to detect bad data (the way they certainly try to detect bad code).
3. Load balancing is based on the rendering time of a single page.
Code bugs triggered by bad data will happen again, with or without regular expressions.

laurencei almost 9 years ago

Could this have been a deliberate/malicious act? Why else would someone post 20,000 consecutive characters of whitespace on a comment line?
Also, the "homepage" of StackOverflow does not show any 'comments'; it is just the top questions. Why was the page loading any comments in the first place?

animex almost 9 years ago

We had a similar issue arising from regex parsing of our SES routes on our SaaS platform. We had made some changes to our generated SES file which caused it to balloon to 4x in size (tens of thousands of lines). Our only clue that something had gone wrong was suddenly extremely high IIS usage. With some help from Microsoft support, we managed to trace the stack during the high-CPU event to an ISAPI filter and ultimately our 3rd-party SES plugin. We managed to fix the problem by being more efficient with our regex generation and reducing the number of rules the plugin was processing, but it was eye-opening how much CPU was being consumed by regex processing.

Scea91 almost 9 years ago

I like this because it shows how important it is to understand the inner workings of the tools in your toolbox. It could serve as a nice example in some 'Languages and Grammars' course at the University for additional motivation.

revelation almost 9 years ago

They implemented trim with a regex? Neither Java nor .NET does that.
The postmortem here should probably be "why are you reimplementing trim".

grashalm almost 9 years ago

Easy to reproduce [1]. Just remove the a at the end and your timeout disappears. Does anybody know which regex engine they used?
[1] <a href="http://regexr.com/3drn3" rel="nofollow">http://regexr.com/3drn3</a>

cyphar almost 9 years ago

I'm still confused why people would use a backtracking regex engine in cases when they don't need recursive regex extensions (or other questionable extensions like back references). A "correct" (from the CS perspective) regex engine wouldn't have had this or many other problems that people encounter when doing regular expression matching. If they had piped out to sed or awk this wouldn't have happened, since GNU grep, sed and awk use a proper regex engine.

ozten almost 9 years ago

My blog post[1] on how to test for catastrophic backtracking using RegexBuddy.
[1] <a href="https://blog.mozilla.org/webdev/2010/11/15/avoiding-catastrophic-backtracking-in-apache-rewriterule-patterns/" rel="nofollow">https://blog.mozilla.org/webdev/2010/11/15/avoiding-catastro...</a>

adrianratnapala almost 9 years ago

Backtracking regex matchers are a Bad Idea.
It's true you need them to implement backreferences. But I've never used such a thing. If I were creating a runtime for some new language, I would simply ignore that part of the POSIX standard.

davidron almost 9 years ago

The whole postmortem focuses on a regular expression bug and how that bug was fixed, and completely ignores the fact that if the home page becomes unavailable, the load balancer logic will shut down the entire site.
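
One way to keep a failing health check from taking the whole site down is to fail open: if too large a share of the pool fails the check at once, treat the check itself as suspect and keep routing to every server. Envoy calls a similar mechanism a panic threshold. This is a hypothetical Python sketch; `routable_servers` and the 50% default are illustrative choices, not any load balancer's API.

```python
from typing import Dict, List


def routable_servers(health: Dict[str, bool],
                     min_healthy_fraction: float = 0.5) -> List[str]:
    """Return the servers that should receive traffic.

    Normally that is just the healthy ones. If fewer than
    `min_healthy_fraction` of the pool pass the check, fail open and route to
    all of them rather than take the whole site out of rotation.
    """
    servers = sorted(health)
    if not servers:
        return []
    healthy = [s for s in servers if health[s]]
    if len(healthy) / len(servers) < min_healthy_fraction:
        return servers  # mass failure: distrust the check, keep serving
    return healthy


print(routable_servers({"web1": False, "web2": False, "web3": False}))
# ['web1', 'web2', 'web3']
print(routable_servers({"web1": True, "web2": True, "web3": False}))
# ['web1', 'web2']
```

The trade-off is that genuinely broken servers keep receiving traffic during a mass failure; the bet is that a partly working site beats a fully dark one.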

wfunction almost 9 years ago

I still haven't figured out why regex engines don't use state machines where possible (i.e. in the absence of backreferences and such). Is that not an obvious optimization?

johnwheeler almost 9 years ago

ugh. i would've just sat there wondering WTF. then proceed to initiate daily backup recovery.

OJFord almost 9 years ago

<pre><code>  > It took 10 minutes to identify the cause,
</code></pre>
Impressive, considering:<pre><code>  > cause was a malformed post that caused one of our
  > regular expressions to consume high CPU ... called on
  > each home page view ... Since the home page is what our
  > load balancer uses for the health check, the entire site
  > became unavailable since the load balancer took the
  > servers out of rotation.
</code></pre>

unethical_ban almost 9 years ago

Not understanding why backtracking happened. Once it hit a non-space, non-end character, move on. Nothing before can match the regex.

zzzcpan almost 9 years ago

Seems like there is still no better way to deal with these kind of mistakes than preemptive Erlang-style lightweight processes.

NetStrikeForce almost 9 years ago

Someone wiser than me said once that if you have a problem and want to fix it with a regex then you now have two problems :-)

rocho almost 9 years ago

By the way, this is the post that broke StackOverflow: <a href="http://stackoverflow.com/questions/38484433/join-tiles-in-corona-sdk-into-one-word-for-a-breakout-game-grid" rel="nofollow">http://stackoverflow.com/questions/38484433/join-tiles-in-co...</a>

ozim almost 9 years ago

For me, the awesome part is that a cliché that is quite popular on SO takes down SO. And the resolution to replace the RegExp with a substring completes the picture. Just cannot stop laughing.
"Some people, when confronted with a problem, think 'I know, I'll use regular expressions.' Now they have two problems."

porjo almost 9 years ago

<pre><code> > 20,000+19,999+19,998+…+3+2+1 = 199,990,000</code></pre> = 200,010,000, not that anyone's counting :)

bshimmin almost 9 years ago

In reading this post, I realised this was the first time I'd ever visited the Stack Overflow homepage.

random3 almost 9 years ago

> Add controls to our load balancer to disable the healthcheck – as we believe everything but the home page would have been accessible if it wasn’t for the the health check
Wouldn't regular users, trying to access the homepage, have yielded the same effect?

babuskov almost 9 years ago

> This regular expression has been replaced with a substring function.
I always cringe when I see regex used for such simple string checks. In fact, Stackoverflow is full of accepted answers that "solve" problems that way.
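
For illustration, the check the postmortem's regex performed can be written without a regex at all. A minimal Python sketch, assuming the goal is the same as `^[\s\u200c]+|[\s\u200c]+$` (strip leading and trailing whitespace and U+200C) and approximating `\s` with `str.isspace()`; `trim_ws_zwnj` is a hypothetical name, and the postmortem doesn't show Stack Overflow's actual replacement code. Each end is scanned inward once, so a run of spaces in the middle of the string is never visited.

```python
def trim_ws_zwnj(text: str) -> str:
    r"""Strip leading and trailing whitespace and U+200C, without a regex.

    Hypothetical stand-in for the effect of ^[\s\u200c]+|[\s\u200c]+$;
    str.isspace() approximates \s. Each end is scanned inward once: O(n).
    """

    def junk(c: str) -> bool:
        return c.isspace() or c == "\u200c"

    start, end = 0, len(text)
    while start < end and junk(text[start]):
        start += 1
    while end > start and junk(text[end - 1]):
        end -= 1
    return text[start:end]


print(repr(trim_ws_zwnj("  hi \u200c ")))          # 'hi'
print(len(trim_ws_zwnj("a" + " " * 20000 + "x")))  # 20002
```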

jimjimjim almost 9 years ago

paging jwz. something something two problems.

JBiserkov almost 9 years ago

The Stack status page contains 3 script tags before the HTML tag. This is what I saw on my Kindle 3 Keyboard:
This page contains the following errors:
error on line 2 at column 36: Extra content at the end of the document
Below is a rendering of the page up to the first error.
var __pbpa = true;

_RPM almost 9 years ago

They have limits on everything (comments per second, edits per second, upvotes per day, reputation earned per day, etc), it seems like they should have an upper bound character limit on what they accept too.

Retr0spectrum almost 9 years ago

For more bugs caused by quadratic complexity: <a href="http://accidentallyquadratic.tumblr.com/" rel="nofollow">http://accidentallyquadratic.tumblr.com/</a>

jng almost 9 years ago

Any more proof needed that caching should become a system-provided service over the next 10-20 years, the same way memory management did in the past 10-20 years?

berkut almost 9 years ago

If it was in a comment, why was the home page loading it? Preemptive caching?

mtokunaga almost 9 years ago

"This regular expression has been replaced with a substring function." I've come to rely on regex so much that I almost feel we'd be next.

zkhalique almost 9 years ago

This is great. I just want to add something that might not be well-known: StackOverflow is all hosted from ONE web app server! It handles all the writes.

percept almost 9 years ago

Productivity plummets worldwide (regex attack vector)

brokencube almost 9 years ago

Correct me if I'm wrong, but couldn't this have been fixed by making the match possessive:
^[\s\u200c]++|[\s\u200c]++$
That should stop any runaway backtracking?

hamzalive almost 9 years ago

200010000, not 199990000; probably the author looped on a 0-based index. n*(n+1)/2 is even better ^^ Nice postmortem though.
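
The closed form n(n+1)/2 makes the correction easy to check; the postmortem's figure is exactly what the same formula gives if the sum stops at 19,999:

```latex
\sum_{k=1}^{20000} k = \frac{20000 \cdot 20001}{2} = 200{,}010{,}000,
\qquad
\sum_{k=1}^{19999} k = \frac{19999 \cdot 20000}{2} = 199{,}990{,}000.
```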

estrabd almost 9 years ago

TIL what language Stack Overflow is written in.

stop1234 almost 9 years ago

Yes, one of the best postmortems, especially the technical part.
I'm sure it was simple, but I'm curious to know what the replacement substr code is.

GnarlyWhale almost 9 years ago

Favourite comment from the Reddit thread on the matter: "Well, that should stave off the imposer syndrome for another couple of days." - u/minno

rmdoss almost 9 years ago

Very interesting bug. People sometimes forget how expensive a regex can be compared to simple pattern matching.

Osiris almost 9 years ago

Why isn't the trim applied once when the post is created, rather than every time it's displayed?

MalcolmDiggs almost 9 years ago

Regex: ruining your life since 1956.

Waterluvian almost 9 years ago

I want to believe that a cat fell asleep on the space bar. Then eventually woke up and posted.

davidwparker almost 9 years ago

This is great. Regex errors always remind me of this classic Jeff Atwood post (cofounder of StackOverflow): <a href="https://blog.codinghorror.com/regular-expressions-now-you-have-two-problems/" rel="nofollow">https://blog.codinghorror.com/regular-expressions-now-you-ha...</a>

dear1777 almost 9 years ago

Hmmn, if it was one request, how did it cause other web servers in the farm to go down?

rosstex almost 9 years ago

Wow, I didn't notice today! I must not have been coding very much.

hstun almost 9 years ago

But... how did they search for a fix without resorting to Stack Overflow? :)

smegel almost 9 years ago

> If the string to be matched against contains 20,000 space characters in a row, but not at the end, then the Regex engine will start at the first space, check that it belongs to the \s character class, move to the second space, make the same check, etc. After the 20,000th space, there is a different character, but the Regex engine expected a space or the end of the string. Realizing it cannot match like this it backtracks, and tries matching \s+$ starting from the second space, checking 19,999 characters. The match fails again, and it backtracks to start at the third space, etc.
That's not how backtracking works. A regex engine will only backtrack to try and make the rest of the regex match, i.e. it will take characters off the RHS of the string, not try and start "from the second character off the start of the string". I mean, if the engine tried matching from the second space, what would be matching the first space? Something has to.
Which means that even if the regex engine was incredibly stupid and could not figure out that a greedy block of \s was never going to contain a non-\s, it would only have to check 20,001 times, not 199000 (or whatever it was).
I can't reproduce this "bug" in either Perl or Python. The time taken to match a 30,000 block of spaces either followed by $ or XX$ was basically identical for \s+$.
There does appear to be normal backtracking going on, roughly doubling the search time for large strings terminating in non-\s. This is expected, as it has to check 20,000 during the first gobble, then 20,000 as it backtracks from the right 20,000 times.<pre><code>  $ time perl -e '(" " x 100000000 . "X") =~ /\s+$/ && print "MATCH"'

  real    0m0.604s
  user    0m0.509s
  sys     0m0.094s

  $ time perl -e '(" " x 100000000) =~ /\s+$/ && print "MATCH"'
  MATCH
  real    0m0.286s
  user    0m0.197s
  sys     0m0.089s
</code></pre>

monochromatic almost 9 years ago

> So the Regex engine has to perform a “character belongs to a certain character class” check (plus some additional things) 20,000+19,999+19,998+…+3+2+1 = 199,990,000 times, and that takes a while.
199,990,000 isn't really all that many. I'm a little surprised it didn't just cause a momentary blip in performance.
edit: whoops, I guess that's per page load

fweespeech almost 9 years ago

The lesson seems to be "Always run trim() before running regex" and "validate content as much as possible before running regex".
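
A minimal Python sketch of that advice, with hypothetical names and an arbitrary limit: trim with a plain string method, refuse oversized input, and only then run the regex. A length cap doesn't make a bad pattern linear, but it bounds how long one request can spend in it. Trimming also shifts match offsets relative to the original text.

```python
import re
from typing import Optional

MAX_CHARS = 10_000  # hypothetical budget; tune for your own data


def guarded_search(pattern: re.Pattern, text: str,
                   max_chars: int = MAX_CHARS) -> Optional[re.Match]:
    """Trim, validate length, then search. Offsets refer to the trimmed text."""
    text = text.strip()  # str.strip() is a plain linear scan, no regex
    if len(text) > max_chars:
        raise ValueError(f"input longer than {max_chars} characters")
    return pattern.search(text)


digits = re.compile(r"\d+")
print(guarded_search(digits, "  order 123  ").group())  # 123
```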

yeukhon almost 9 years ago

This seems like a hard-to-anticipate edge case, for real. I think catching edge cases is needed (meaning more rigorous testing). This is the equivalent of algorithm complexity analysis: how bad can my algorithm be? But with regular expressions, to be honest, I hardly ever think about performance. I don't know about others, but most of my inputs are small enough. How big an input should I test? If I were dealing with a lot of characters, I would be doing substring replacement.

avar almost 9 years ago

My rephrasing of their follow-up actions:
* "Audit our regular expressions and post validation workflow for any similar issues"
  ==> "Not even people who've worked for years on the guts of regex engines can easily predict the runtime of a given regex, but somehow our engineers will be expected to do that".
* "Add controls to our load balancer to disable the healthcheck – as we believe everything but the home page would have been accessible if it wasn’t for the the health check"
  ==> "Our lb check was checking /index, that failed because /index was slow: Lesson learned, let's not lb check anything at all"
