This is how many HTTP caches and CDNs work. The terminology used to describe it is often request collapsing or request coalescing.

Some examples:

* Varnish: https://info.varnish-software.com/blog/hit-for-pass-varnish-cache

* nginx: http://nginx.org/en/docs/http/ngx_http_proxy_module.html#proxy_cache_lock

* Fastly: https://docs.fastly.com/guides/performance-tuning/request-collapsing

* CloudFront: https://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/RequestAndResponseBehaviorCustomOrigin.html#request-custom-traffic-spikes
In practice there's a bit more to it than the article describes, especially for a distributed cache: you need the Promise to auto-expire, so that if the machine performing the backend read disappears, the other readers don't stay stuck waiting for it forever. It's also useful for the cache itself to be able to block the caller until the Promise has been fulfilled, to avoid repeated requests asking whether the data is finally there.

As an aside, Guava loading caches[1] implement per-key locking, so multiple threads accessing a missing key simultaneously[2] only lead to a single load, with all other readers waiting for the loading thread to complete the fetch and populate the cache.

[1] https://github.com/google/guava/wiki/CachesExplained

[2] in the sense of "in the time it takes for the first accessor to complete the backend read"
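To make the auto-expire part concrete, here's a minimal in-process sketch (a distributed cache would instead put a TTL on the Promise/sentinel entry it stores; the helper name and timeout below are illustrative, not any particular cache's API):

```typescript
// Race a pending load against a timeout so waiters aren't stranded if the
// loader goes away. clearTimeout keeps the timer from leaking once settled.
function withExpiry<T>(load: Promise<T>, ttlMs: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(
      () => reject(new Error(`promise expired after ${ttlMs}ms`)),
      ttlMs,
    );
    load.then(
      value => { clearTimeout(timer); resolve(value); },
      err => { clearTimeout(timer); reject(err); },
    );
  });
}

// A waiter would do something like:
//   const value = await withExpiry(sharedPromiseForKey, 30_000);
// and fall back to re-checking the cache (or reading the backend itself)
// if the shared promise expires.
```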
This is how all async caches are supposed to work. You never want concurrent requests for the same uncached resource to all hit the backend.

For TypeScripters out there, this is what my team wrote for our static analysis framework: https://github.com/Polymer/tools/blob/master/packages/analyzer/src/core/async-work-cache.ts
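A stripped-down sketch of the same idea (not the linked code; the class name, URL, and fetch call are just placeholders):

```typescript
// Minimal async work cache: concurrent callers for the same key share one
// in-flight Promise instead of each hitting the backend.
class AsyncWorkCache<K, V> {
  private readonly inFlight = new Map<K, Promise<V>>();

  get(key: K, compute: () => Promise<V>): Promise<V> {
    const existing = this.inFlight.get(key);
    if (existing !== undefined) return existing;

    const promise = compute();
    this.inFlight.set(key, promise);
    // Evict rejected promises so one transient failure isn't cached forever.
    promise.catch(() => this.inFlight.delete(key));
    return promise;
  }
}

// Usage: three concurrent callers, one fetch.
async function demo() {
  const cache = new AsyncWorkCache<string, string>();
  const load = () => fetch("https://example.com/resource").then(r => r.text());
  await Promise.all([
    cache.get("resource", load),
    cache.get("resource", load),
    cache.get("resource", load),
  ]);
}
```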
Instead, and more usefully given how slow some of our backend APIs are, we cache each value twice: under e.g. `key` with a short TTL and under `key_backup` with a long TTL.

The first process to miss on `key` renames `key_backup` to `key` (which is atomic and fast on Redis) and goes to the backend for a new value to cache twice and return, while the rest of the herd reads the renamed backup.

Yes, this doubles the total cache size, or equivalently halves the number of keys we have room for. That's a price we're OK with paying to avoid blocking reads while a value is recalculated.
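Roughly, the read path looks like this (a sketch assuming ioredis, with made-up key names, TTLs, and error handling rather than our actual code):

```typescript
import Redis from "ioredis";

const redis = new Redis();
const SHORT_TTL = 60;      // seconds before the primary copy is considered stale
const LONG_TTL = 60 * 60;  // how long the backup copy survives

// Double-key scheme: a short-TTL primary and a long-TTL backup copy.
async function getCached(key: string, fetchFromBackend: () => Promise<string>): Promise<string> {
  const hit = await redis.get(key);
  if (hit !== null) return hit;

  // Miss: try to promote the backup so the rest of the herd reads the stale
  // value under `key` while we go to the backend. RENAME is atomic and fails
  // if the backup is gone or another process already promoted it.
  const promoted = await redis
    .rename(`${key}_backup`, key)
    .then(() => true)
    .catch(() => false);

  if (!promoted) {
    // Someone else is probably refreshing already; serve the promoted copy if present.
    const stale = await redis.get(key);
    if (stale !== null) return stale;
  }

  const fresh = await fetchFromBackend();
  await redis.set(key, fresh, "EX", SHORT_TTL);
  await redis.set(`${key}_backup`, fresh, "EX", LONG_TTL);
  return fresh;
}
```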
This looks like it would be a very useful design; however, the article doesn't discuss the implementation of the Promise.

This isn't something memcached or Redis support out of the box, as far as I know. It seems to imply a cache manager service that keeps its own in-memory table of Promises.
"instead of caching the actual value, we cached a Promise that will eventually provide the value"<p>I did this exact thing recently in a client-side HTTP caching system for frequently-duplicated API requests from within a single page. Cool to see it pop up elsewhere.
Back when I worked on a similar problem 10 years ago, we solved it with a quickly expiring memcached key that gated hitting the database. If the value wasn't cached and that key wasn't there, a process would attempt to add the key. If the add succeeded, it would hit the database and cache the result. Otherwise (the key was already there, or the add failed), it would wait a short period of time and then retry the whole process.

There are other, similar problems elsewhere too, though. A cold MySQL start is a bitch when you rely on huge amounts of memory for MySQL to service your requests; this is especially noticeable when you have so much traffic that you need to cache some results. Back then it would take about an hour before a freshly spun-up MySQL instance could keep up with our regular traffic, even accounting for stuff being cached.
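Sketching that scheme (here with Redis's SET NX/EX standing in for memcached's add; the key names, TTLs, and retry timings are made up):

```typescript
import Redis from "ioredis";

const redis = new Redis();
const sleep = (ms: number) => new Promise(resolve => setTimeout(resolve, ms));

// Lock-key scheme: whoever manages to create the short-lived lock key hits
// the database; everyone else waits briefly and retries the whole process.
async function getOrLoad(key: string, loadFromDb: () => Promise<string>): Promise<string> {
  for (let attempt = 0; attempt < 50; attempt++) {
    const cached = await redis.get(key);
    if (cached !== null) return cached;

    // NX = only set if the lock key doesn't exist yet; EX = auto-expire so a
    // crashed loader doesn't block everyone forever.
    const gotLock = await redis.set(`${key}:lock`, "1", "EX", 5, "NX");
    if (gotLock === "OK") {
      const value = await loadFromDb();
      await redis.set(key, value, "EX", 300);
      await redis.del(`${key}:lock`);
      return value;
    }

    await sleep(100); // someone else is loading; back off and retry
  }
  throw new Error(`gave up waiting for ${key}`);
}
```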
I came up with something along these lines at my last place. It's not actually the hardest thing to do as long as you've got a robust and convenient locking system available to you. In my case I abused the common DB instance that all the clients were connected to anyway, synchronizing the callers on Postgres advisory locks. Sure, this isn't the infinitely scalable, Netflix-scale solution that everyone is convinced they need for everything, but it *will* probably work *absolutely fine* for >90% of development scenarios.

https://gist.github.com/risicle/f4807bd706c9862f69aa
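The gist above is the real thing; a rough TypeScript sketch of the same idea with node-postgres looks like this (the lock id and cache callbacks are placeholders):

```typescript
import { Pool } from "pg";

const pool = new Pool();

// Serialize cache fills for one key on a Postgres advisory lock, so only one
// caller recomputes while the others block on the database instead of
// stampeding the backend. lockId would normally be a hash of the cache key.
async function fillOnce<T>(
  lockId: number,
  readCache: () => Promise<T | undefined>,
  recompute: () => Promise<T>,
): Promise<T> {
  const cached = await readCache();
  if (cached !== undefined) return cached;

  const client = await pool.connect();
  try {
    await client.query("SELECT pg_advisory_lock($1)", [lockId]); // blocks until free
    // Re-check: whoever held the lock before us probably filled the cache already.
    const refreshed = await readCache();
    if (refreshed !== undefined) return refreshed;
    return await recompute(); // the winner does the expensive work (and caches it)
  } finally {
    await client.query("SELECT pg_advisory_unlock($1)", [lockId]);
    client.release();
  }
}
```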
We do the same thing on the client side when lazy-loading assets (typically textures) from disk, because you don't want to hold multiple copies of the same asset in memory.
I'm not a developer, but to be honest, I thought that's how all non-trivial caching implementations worked: instead of each thread going directly to the backend, or each one triggering its own backend read on a cache miss, all of the threads that wanted that resource just waited for it to appear in the cache.
Interesting. I also ran into this on a much smaller scale three years ago and made https://github.com/AriaFallah/weak-memoize
I have implemented this in Node.js: https://npmjs.com/package/memoise
In an old apartment I had a lot of devices on the same extension plug, and I had to turn them on one by one to avoid losing power... There's the same problem/solution when boarding airplanes: even if it feels backwards, it's actually faster to let a third of the passengers through first than to let everyone through at once.