Retry XMLHttpRequest Carefully

27 points by aparks517 almost 3 years ago

6 comments

joshstrange almost 3 years ago
This post reminded me of a /fun/ bug I ran into at my last job. The root of the issue was that our server was returning a 408 timeout error when something timed out on the backend. Astute readers might immediately notice an issue with that code, mainly that it's for a /client/ timeout, not 504 (server timeout).

Since we controlled the client and server you might think it doesn't matter much: as long as the client knows how to handle a 408, everything should be fine, right? Well, not exactly. We had a number of pages/endpoints that would be overwhelmed when there was too much data on the backend for a given user/company (which was its own issue, but let's just take that as fact for this and move on). We dutifully sent back 408s and knew that we needed to optimize or chunk those endpoints in the future.

The problem was the code that timed out would keep running until it finished (or hit a different timeout). On its own this isn't the end of the world: the server does work that never gets sent to the client and essentially is thrown away. The problem was some of these endpoints would run very heavy queries that could bring down our database if enough of them were run in a short window. Even more confusing, it appeared as if our server was running multiple of these queries for only 1 request.

We were using Akka (Actor-based) in our backend and thought maybe something was misconfigured or that we might accidentally be dropping duplicate messages into the queues that caused multiple queries to get fired off. We fought with this on/off for months (you know, something else is always a higher priority). Finally we tracked down that the browser was sending multiple requests. This was only clear when testing in an isolated dev environment with no other traffic, where you could clearly see a second request being made after the first one timed out. What was extremely frustrating is Chrome did not show this second request in the dev tools; it only showed the initial request.

After more digging and a little luck I stumbled across a "feature" of browsers where they will retry the same request under certain circumstances without showing that in the dev tools. A 408 was one of those cases. Switching to the correct code, 504, immediately fixed our odd self-DDoS against our DB.

Obviously this was the fault of whoever initially defined `TIMEOUT = 408` somewhere in our HTTP error codes class, but to this day I feel like Chrome should have had some indication that it was firing off another request. If you left the tab open Chrome would just keep retrying and slowly overwhelm the DB with heavy queries until it fell over.
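
To make the 408 vs. 504 distinction above concrete, here is a minimal sketch of how a server-side status table might separate the two kinds of timeout. The names `HttpStatus`, `TimeoutSource`, and `statusForTimeout` are hypothetical; the commenter's actual error codes class is not shown in the thread.

```typescript
// Hypothetical constants; the thread only mentions a single `TIMEOUT = 408`.
const HttpStatus = {
  REQUEST_TIMEOUT: 408, // the client was too slow sending its request
  GATEWAY_TIMEOUT: 504, // something behind the server did not answer in time
} as const;

// Where the timeout happened, from the server's point of view.
type TimeoutSource = "client" | "backend";

// Pick the status code that matches who was slow.
// A backend timeout reported as 408 is the mistake described above:
// per the comment, the browser treated it as a cue to silently resend.
function statusForTimeout(source: TimeoutSource): number {
  return source === "client"
    ? HttpStatus.REQUEST_TIMEOUT
    : HttpStatus.GATEWAY_TIMEOUT;
}
```

With this split, a slow database query maps to `statusForTimeout("backend")`, i.e. 504, which is the code the commenter reports switching to.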
e12e almost 3 years ago
Very nice write-up. I'd be curious to see it using fetch rather than the older XHR API. That would make more sense as a library *today*, I think? Or are there compelling reasons to stick with XHR in a post-IE world?
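
For readers wondering what a fetch-flavoured version could look like, here is a rough sketch, not the article's code. Everything in it is an assumption: the names `fetchWithRetry`, `FetchLike`, `ResponseLike`, and `RetryOptions` are made up, the backoff is a plain doubling delay, and only network failures and 5xx responses are treated as worth retrying. The fetch function is passed in so the sketch does not depend on DOM typings.

```typescript
// Minimal response shape; a real fetch Response has these fields too.
interface ResponseLike {
  ok: boolean;
  status: number;
}
type FetchLike = (url: string) => Promise<ResponseLike>;

interface RetryOptions {
  maxAttempts: number; // total tries, including the first one
  baseDelayMs: number; // wait before the second try; doubles after that
  sleep?: (ms: number) => Promise<void>; // injectable, mainly for tests
}

const defaultSleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

async function fetchWithRetry(
  doFetch: FetchLike,
  url: string,
  { maxAttempts, baseDelayMs, sleep = defaultSleep }: RetryOptions,
): Promise<ResponseLike> {
  let lastError: unknown = new Error("maxAttempts must be at least 1");
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    // Back off before every attempt except the first: base, 2*base, 4*base...
    if (attempt > 0) await sleep(baseDelayMs * 2 ** (attempt - 1));
    try {
      const response = await doFetch(url);
      // Assumed policy: anything below 500 (including 4xx) is returned as-is.
      if (response.status < 500) return response;
      lastError = new Error(`server error ${response.status}`);
    } catch (err) {
      lastError = err; // fetch rejects on network failure
    }
  }
  throw lastError; // give up after maxAttempts tries
}
```

In a browser this might be called as `fetchWithRetry((u) => fetch(u), "/api/items", { maxAttempts: 3, baseDelayMs: 200 })`, where `/api/items` is a placeholder URL.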
cratermoon almost 3 years ago
This is a good introduction to the subject. At the end you might want to mention rate limiting, truncation (giving up after a specified number of retries) and, for the most sophistication, circuit breakers.
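
As an illustration of the circuit breaker idea mentioned above, here is a small sketch under stated assumptions: the class name `CircuitBreaker`, its three states, and the threshold/cooldown parameters are one common way to structure this, not something taken from the article. The clock is injectable so the behaviour can be checked without waiting.

```typescript
type BreakerState = "closed" | "open" | "half-open";

class CircuitBreaker {
  private state: BreakerState = "closed";
  private failures = 0; // consecutive failures since the last success
  private openedAt = 0; // clock reading when the breaker last opened
  private readonly failureThreshold: number;
  private readonly cooldownMs: number;
  private readonly now: () => number;

  constructor(
    failureThreshold: number,
    cooldownMs: number,
    now: () => number = () => Date.now(),
  ) {
    this.failureThreshold = failureThreshold;
    this.cooldownMs = cooldownMs;
    this.now = now;
  }

  // Whether the caller may send a request right now.
  canRequest(): boolean {
    if (this.state === "open" && this.now() - this.openedAt >= this.cooldownMs) {
      // Cooldown elapsed: let probe requests through until one reports back.
      this.state = "half-open";
    }
    return this.state !== "open";
  }

  recordSuccess(): void {
    this.failures = 0;
    this.state = "closed";
  }

  recordFailure(): void {
    this.failures++;
    // A failed probe, or too many consecutive failures, opens the breaker.
    if (this.state === "half-open" || this.failures >= this.failureThreshold) {
      this.state = "open";
      this.openedAt = this.now();
    }
  }
}
```

A retry loop would check `canRequest()` before each attempt and call `recordSuccess()` or `recordFailure()` afterwards, so that a struggling server stops receiving retries for `cooldownMs` milliseconds.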
noduerme almost 3 years ago
At this point I like to wrap my remoting code (including optional retries) into async functions that return a promise. The promise resolves with a result from the server, or rejects with any sort of error that can come from an XHR (or now, fetch), and the consumer can decide what to do with that.

In my software I almost never do automatic retries, rather I warn the user that there was a connection error if necessary, and revert whatever state was waiting on a call. In my view anything that can fail silently probably does not need to be retried right now.
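
One possible reading of the "warn the user and revert state" approach above, sketched with made-up names (`Store`, `applyWithRollback`, `notifyUser`); the commenter's real code is not shown, so treat every identifier and the wording of the warning as an assumption. The remote call is any function returning a promise, whether it wraps XHR or fetch.

```typescript
// Hypothetical state holder; a real app would use its own store.
interface Store<S> {
  get(): S;
  set(next: S): void;
}

async function applyWithRollback<S, R>(
  store: Store<S>,
  next: S,
  remoteCall: () => Promise<R>,
  notifyUser: (message: string) => void,
): Promise<R | undefined> {
  const previous = store.get();
  store.set(next); // state now depends on the pending call
  try {
    return await remoteCall(); // resolves with the server's result
  } catch {
    store.set(previous); // revert whatever was waiting on the call
    notifyUser("Connection error: the change was not saved.");
    return undefined; // no automatic retry; the user decides what to do next
  }
}
```

The point of the design is that failure handling stays with the consumer: nothing is retried behind the user's back.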
kerblang almost 3 years ago
Am I wrong or is this going to retry 404's and other 4xx things that it shouldn't retry?
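
A retry helper can sidestep this concern with an explicit allow-list of status codes. The sketch below shows one commonly used policy, offered as an assumption rather than a description of the article's code; `RETRYABLE_STATUSES` and `isRetryableStatus` are hypothetical names, and the exact set is a judgment call.

```typescript
// Assumed policy: statuses that usually signal a transient condition.
// 404 and most other 4xx codes are deliberately absent, since repeating
// the same request is not expected to change the answer.
const RETRYABLE_STATUSES = new Set<number>([
  408, // Request Timeout
  429, // Too Many Requests (ideally after honouring Retry-After)
  500, // Internal Server Error
  502, // Bad Gateway
  503, // Service Unavailable
  504, // Gateway Timeout
]);

function isRetryableStatus(status: number): boolean {
  return RETRYABLE_STATUSES.has(status);
}
```

A retry loop would then check `isRetryableStatus(response.status)` before scheduling another attempt and return every other response to the caller unchanged.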
rbirkby almost 3 years ago
Just use axios and https://www.npmjs.com/package/retry-axios