Erasure Coding versus Tail Latency

75 points · by usrme · about 1 year ago

8 comments

ot · about 1 year ago
It is worth noting that this does not come for free, and it would have been nice for the article to mention the trade-off: reconstruction is not cheap on CPU if you use something like Reed-Solomon.

Usually the codes used for erasure coding are in *systematic* form: there are k "preferential" parts out of M that are just literal fragments of the original blob, so if you get those you can just concatenate them to get the original data. If you get any other k-subset, you need to perform expensive reconstruction.
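
For concreteness, a toy sketch of that fast path versus reconstruction, using a trivial 2-of-3 code with a single XOR parity in place of real Reed-Solomon (the names and the padding scheme here are illustrative, not from the article):

```python
# Toy systematic (k=2 of M=3) code: two literal data fragments plus one
# XOR parity fragment. Real deployments use Reed-Solomon, where the
# non-systematic decode below is a matrix solve, not a cheap XOR.

def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def encode(blob: bytes) -> list[bytes]:
    half = -(-len(blob) // 2)                    # ceil division
    d0, d1 = blob[:half], blob[half:].ljust(half, b"\0")
    # Fragments 0 and 1 are the "preferential" systematic parts.
    return [d0, d1, xor_bytes(d0, d1)]

def decode(frags: dict[int, bytes], blob_len: int) -> bytes:
    if 0 in frags and 1 in frags:
        # Fast path: got the systematic fragments, just concatenate.
        return (frags[0] + frags[1])[:blob_len]
    # Slow path: any other 2-subset needs reconstruction (one XOR here;
    # with Reed-Solomon this is the expensive case).
    if 0 in frags:
        return (frags[0] + xor_bytes(frags[0], frags[2]))[:blob_len]
    return (xor_bytes(frags[1], frags[2]) + frags[1])[:blob_len]
```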

dmw_ng · about 1 year ago
Nice to see this idea written about in detail. I had thought about it in the context of terrible-availability bargain-bucket storage (iDrive E2), where the cost of (3,2) erasure coding an object and distributing each segment to one of 3 regions would still be dramatically lower than paying for more expensive, more reliable storage.

Say one chunk lives in each of Germany, Ireland, and the US. The client races GETs to all 3 regions and cancels the request to the slowest to respond (which may also be down). Final client latency is equivalent to that of the 2nd-slowest region, with substantially better availability due to the ability to tolerate any single region being down.

Still wouldn't recommend using E2 for anything important, but ^ was one potential approach to dealing with its terribleness. It still doesn't address the reality that when E2 regions go down, it is often for days and reportedly sometimes weeks at a time. So reliable writing in this scenario would necessitate some kind of queue with capacity for weeks of storage.

There are variants of this scheme where you could potentially balance the horrible-reliability storage with some expensive reliable storage as part of the same system, but I never got that far in thinking about how it would work.
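
A minimal asyncio sketch of that racing-read pattern, with the per-region fetch simulated by a random delay (the region list, key scheme, and latencies are made up for illustration):

```python
import asyncio
import random

REGIONS = ["de", "ie", "us"]  # one fragment of a (3,2)-coded object per region
K = 2                         # any 2 of the 3 fragments reconstruct the object

async def fetch_fragment(region: str, key: str) -> bytes:
    # Hypothetical stand-in for a GET against one region's endpoint,
    # simulated here with a random delay.
    await asyncio.sleep(random.uniform(0.01, 0.5))
    return f"{region}:{key}".encode()

async def racing_get(key: str) -> list[bytes]:
    # Race GETs to all three regions, keep the first K responses, and
    # cancel the straggler (which may also simply be down).
    pending = {asyncio.create_task(fetch_fragment(r, key)) for r in REGIONS}
    got: list[bytes] = []
    while len(got) < K:
        done, pending = await asyncio.wait(
            pending, return_when=asyncio.FIRST_COMPLETED)
        got.extend(t.result() for t in done)
    for t in pending:
        t.cancel()
    return got[:K]  # hand these fragments to the erasure decoder

# asyncio.run(racing_get("some-object"))  # latency ≈ 2nd-fastest region
```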

sujayakar · about 1 year ago
This is a really cool idea.

One follow-up I was thinking of is whether this can generalize to queries other than key-value point lookups. If I'm understanding correctly, the article suggests taking a key-value store and, for every `(key, value)` in the system, splitting `value` into fragments that are stored on different shards with some `k` of `M` code. Then at query time, we can split a query for `key` into `k` subqueries that we send to the relevant shards, and reassemble the query results into `value`.

So, if we were to do the same business for an ordered map with range queries, we'd need to find a way to turn a query for `interval: [start, end]` into some number of subqueries that we could send to the different shards and reassemble into the final result. Any ideas?
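
A toy sketch of that point-lookup scheme, using in-memory dicts as stand-in shards and 4 data fragments plus one XOR parity as the `k`-of-`M` code (a real deployment would use Reed-Solomon and fan the subqueries out over the network):

```python
def xor_all(parts: list[bytes]) -> bytes:
    out = parts[0]
    for p in parts[1:]:
        out = bytes(x ^ y for x, y in zip(out, p))
    return out

M, K = 5, 4                          # 4 data fragments + 1 XOR parity = 4-of-5
shards = [dict() for _ in range(M)]  # toy stand-ins for remote shards

def put(key: str, value: bytes) -> None:
    n = -(-len(value) // K)          # fragment size, ceil division
    frags = [value[i*n:(i+1)*n].ljust(n, b"\0") for i in range(K)]
    for shard, frag in zip(shards, frags + [xor_all(frags)]):
        shard[key] = (len(value), frag)

def get(key: str, respondents: list[int]) -> bytes:
    # `respondents` = the K shards that answered first (out of M queried).
    got = {i: shards[i][key][1] for i in respondents}
    size = shards[respondents[0]][key][0]
    missing = set(range(K)) - set(got)
    if missing:  # one data fragment absent: rebuild it from the parity
        (m,) = missing
        got[m] = xor_all(list(got.values()))
    return b"".join(got[i] for i in range(K))[:size]

put("k", b"hello erasure world")
assert get("k", [0, 1, 2, 3]) == b"hello erasure world"  # all-data fast path
assert get("k", [0, 1, 3, 4]) == b"hello erasure world"  # parity reconstruction
```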

loeg · about 1 year ago
Yeah. And you get the storage for free if your distributed design also uses the erasure-encoded chunks for durability. Facebook's Warm Storage infrastructure does something very similar to what this article describes.

benlivengood · about 1 year ago
The next level of efficiency is using nested erasure codes. The outer code can be across regions/zones/machines/disks while the inner code is across chunks of a stripe. Chunk unavailability is fast to correct with an extra outer chunk, and bit rot or corruption can be fixed by the inner code without an extra fetch. In the fast path, only data chunks need to be fetched.
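
A toy sketch of what the inner layer could look like, assuming a single XOR parity sub-block and per-sub-block SHA-256 checksums (the outer layer would be an ordinary k-of-M code across machines, as in the sketches above):

```python
import hashlib

S = 4  # data sub-blocks per chunk; one extra XOR parity sub-block

def _xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def inner_encode(chunk: bytes) -> list[tuple[bytes, bytes]]:
    n = -(-len(chunk) // S)
    subs = [chunk[i*n:(i+1)*n].ljust(n, b"\0") for i in range(S)]
    parity = subs[0]
    for s in subs[1:]:
        parity = _xor(parity, s)
    # Each sub-block carries a checksum so bit rot is detectable in place.
    return [(s, hashlib.sha256(s).digest()) for s in subs + [parity]]

def inner_decode(coded: list[tuple[bytes, bytes]], size: int) -> bytes:
    subs = [s if hashlib.sha256(s).digest() == h else None for s, h in coded]
    bad = [i for i, s in enumerate(subs) if s is None]
    if bad:  # repair a single rotted sub-block locally, with no extra fetch
        (i,) = bad
        others = [s for s in subs if s is not None]
        fixed = others[0]
        for s in others[1:]:
            fixed = _xor(fixed, s)
        subs[i] = fixed
    return b"".join(subs[:S])[:size]

payload = b"some stripe chunk payload"
coded = inner_encode(payload)
s, h = coded[2]
coded[2] = (bytes([s[0] ^ 0xFF]) + s[1:], h)  # simulate bit rot in one sub-block
assert inner_decode(coded, len(payload)) == payload
```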

siscia · about 1 year ago
Nice to see this talked about here, and Marc being public about it.

AWS is such a big place that even after a bit of tenure you still have places to look to find interesting technical approaches, and when I was introduced to this scheme for Lambda storage I was surprised.

As Marc mentions, it is such a simple and powerful idea, and it is definitely not mentioned enough.

ghusbands · about 1 year ago
The first graph is incredibly misleading. The text talks about fetching from 5 servers and needing 4 results vs fetching from 1 server and needing 1 result. Then the graph compares 4-of-5 to 4-of-4 latency, which is just meaningless. It should compare 4-of-5 with 1-of-1.

jeffbee · about 1 year ago
I do not follow. How is it possible that the latency is lower in a 4-of-5 read of a coded stripe compared to a 1-of-4 replicated stripe?
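
One way to ground the question is to simulate the order statistics involved. A minimal sketch, assuming lognormal per-server latencies (an arbitrary heavy-tailed choice, not the article's data), comparing waiting for the 1st of 1, 1st of 4, 4th of 4, and 4th of 5 responses:

```python
import random

def read_latency(n: int, need: int) -> float:
    # Latency of a read that waits for `need` of `n` parallel responses:
    # the `need`-th order statistic of the per-server latencies.
    return sorted(random.lognormvariate(0, 1) for _ in range(n))[need - 1]

N = 100_000
for label, n, need in [("1-of-1", 1, 1), ("1-of-4", 4, 1),
                       ("4-of-4", 4, 4), ("4-of-5", 5, 4)]:
    lat = sorted(read_latency(n, need) for _ in range(N))
    print(f"{label}: p50={lat[N // 2]:.2f}  p99={lat[int(N * 0.99)]:.2f}")
```

Under this toy model, the 4-of-5 tail beats 4-of-4 badly (the spare fragment absorbs one straggler) and hedged 1-of-4 reads beat everything; which pairing is the fair comparison depends on how many bytes each request has to move.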