I'm surprised rateless fountain codes aren't mentioned! If you enjoy this sort of thing, you'll find the Luby transform code fascinating: <a href="https://en.wikipedia.org/wiki/Luby_transform_code" rel="nofollow">https://en.wikipedia.org/wiki/Luby_transform_code</a><p>This paper is a really nice overview with more detail: <a href="https://switzernet.com/people/emin-gabrielyan/060112-capillary-references/ref/MacKay05.pdf" rel="nofollow">https://switzernet.com/people/emin-gabrielyan/060112-capilla...</a><p>LT codes are used as the inner code (concatenated with a fixed-rate pre-code) in the linear-time RaptorQ encoding specified in RFC 6330: <a href="https://www.rfc-editor.org/rfc/rfc6330" rel="nofollow">https://www.rfc-editor.org/rfc/rfc6330</a>
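For anyone who wants to play with the idea, here is a minimal Python sketch of LT encoding plus the peeling decoder. The uniform degree choice is a toy stand-in; real LT codes use the robust soliton distribution to keep the overhead and failure probability small:

    import random

    def lt_encode_symbol(blocks, rng):
        """One encoded symbol: the XOR of a random subset of source blocks."""
        degree = rng.randint(1, len(blocks))            # toy degree choice
        indices = set(rng.sample(range(len(blocks)), degree))
        value = 0
        for i in indices:
            value ^= blocks[i]
        return indices, value

    def lt_decode(symbols, n):
        """Peeling decoder: repeatedly use symbols covering exactly one unknown block."""
        known = {}
        progress = True
        while progress and len(known) < n:
            progress = False
            for indices, value in symbols:
                unknown = indices - known.keys()
                if len(unknown) == 1:
                    i = unknown.pop()
                    for j in indices - {i}:
                        value ^= known[j]
                    known[i] = value
                    progress = True
        return [known.get(i) for i in range(n)]

    rng = random.Random()
    blocks = [rng.randrange(256) for _ in range(8)]     # source data
    stream, decoded = [], [None] * 8
    while None in decoded:                              # rateless: pull symbols until done
        stream.append(lt_encode_symbol(blocks, rng))
        decoded = lt_decode(stream, 8)
    print(len(stream), decoded == blocks)               # e.g. "13 True"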
Years back someone proposed a cute algorithm for erasure codes that depended not on spinning rust but on multipath networking.<p>I believe they called it network coding, and the idea was that in a network with multiple routes I might get a file faster by pulling a coded combination of two parts of the file (or even of two files) from one upstream, instead of waiting for the entire file from the primary server.
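As a toy Python illustration of that idea, using plain XOR of equal-length blocks (real network-coding schemes such as random linear network coding operate over larger Galois fields):

    def xor_blocks(a: bytes, b: bytes) -> bytes:
        return bytes(x ^ y for x, y in zip(a, b))

    # Two equal-length halves of a file held by the origin server.
    part_a = b"first half of the file "
    part_b = b"second half of the file"

    # Path 1 delivers part A directly; path 2 delivers the combination A xor B
    # from a different upstream that holds (or has computed) it.
    from_path_1 = part_a
    from_path_2 = xor_blocks(part_a, part_b)

    # The receiver recovers B locally instead of waiting for it from the primary.
    assert xor_blocks(from_path_2, from_path_1) == part_b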
Erasure coding has been around for a very long time. Remember PAR2 files on Usenet? <a href="https://en.wikipedia.org/wiki/Parchive" rel="nofollow">https://en.wikipedia.org/wiki/Parchive</a>
If you're interested in EC, you might want to consider larger, multi-dimensional cases. Think of encoding not just across spindles, but across other failure domains like rack, room, DC, or region. The goal is to tolerate common component failures as well as larger system failures (or partitions). A nice intro: <a href="https://chameleoncloud.org/blog/2023/12/12/design-considerations-and-analysis-of-multi-level-erasure-coding-in-large-scale-data-centers/" rel="nofollow">https://chameleoncloud.org/blog/2023/12/12/design-considerat...</a>
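A toy Python sketch of the two-level idea, assuming simple XOR parity in each dimension (real multi-level EC typically uses a proper MDS code per level): a row parity repairs a single device inside a rack, and a column parity rebuilds an entire lost rack:

    from functools import reduce

    def xor(chunks):
        return reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), chunks)

    # 2 racks x 3 devices; each cell is one equal-sized data chunk.
    grid = [
        [b"AAAA", b"BBBB", b"CCCC"],   # rack 0
        [b"DDDD", b"EEEE", b"FFFF"],   # rack 1
    ]
    row_parity = [xor(rack) for rack in grid]        # kept inside each rack
    col_parity = [xor(col) for col in zip(*grid)]    # kept in another rack/region

    # Single device lost in rack 0: repair locally from that rack's own parity.
    assert xor([grid[0][0], grid[0][2], row_parity[0]]) == grid[0][1]

    # All of rack 1 lost: rebuild each of its chunks from the cross-rack parity.
    assert [xor([grid[0][i], col_parity[i]]) for i in range(3)] == grid[1]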
Has anybody used Wirehair in a project? <a href="https://github.com/catid/wirehair">https://github.com/catid/wirehair</a><p>I'm curious whether it's well-defined enough to base a standard around--informally if not formally--for building a large-file archiving/data recovery project I've been mulling over for nearly 10 years. It's the only large-block erasure code I've found that has both the ideal (or nearly ideal) algorithmic performance <i>and</i> API. That makes it a nice blackbox for my use case, unlike something like RaptorQ, which leaks little details all over the place, driving up the complexity and rigidity of the rest of the stack. But Wirehair isn't a spec; it's an (experimental?) implementation of an idea. It seems stable, but unless/until I try writing a second implementation, or it sees substantial use (exposing any sharp edges in the algorithm), I worry about how well it would translate into a reliable specification or an independent second implementation.
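For what it's worth, the "blackbox" API shape I mean is roughly the following Python sketch. The names are purely hypothetical and are not Wirehair's actual C interface: hand the encoder a message, ask for arbitrarily many numbered blocks, and feed roughly any N of them to the decoder:

    class FountainEncoder:
        """Hypothetical large-block encoder interface (illustrative only)."""
        def __init__(self, message: bytes, block_bytes: int) -> None:
            self.message, self.block_bytes = message, block_bytes

        def block(self, block_id: int) -> bytes:
            """Encoded block for this ID; IDs are unbounded, i.e. rateless."""
            raise NotImplementedError   # would be backed by a real codec library

    class FountainDecoder:
        """Feed received blocks in any order; recover once enough have arrived."""
        def __init__(self, message_bytes: int, block_bytes: int) -> None:
            self.message_bytes, self.block_bytes = message_bytes, block_bytes

        def feed(self, block_id: int, data: bytes) -> bool:
            """Return True once the original message can be recovered."""
            raise NotImplementedError

        def recover(self) -> bytes:
            raise NotImplementedError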
Yep, this is the key tech behind Ceph's erasure-coded pools: <a href="https://docs.ceph.com/en/latest/rados/operations/erasure-code/" rel="nofollow">https://docs.ceph.com/en/latest/rados/operations/erasure-cod...</a><p>This does not come without trade-offs, though: you cannot update the coding parameters (k, m) afterwards, so you either have to be very sure those parameters will still fit your needs in the long run, or you have to start over from scratch. This inelasticity is also the reason why replication is still the dominant choice for highly available, fault-tolerant data storage.
Am I right in thinking that the products written during an M-of-N incident are coded differently from when all N are available? If so, you might want a bitflag to denote "needs to be re-encoded when the full N is restored", or else you have some files with less-than-stellar recovery odds for a random loss in the N set.
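Something like this toy Python sketch of the bookkeeping (the names are illustrative, not from any particular system):

    from dataclasses import dataclass

    @dataclass
    class StripeMeta:
        stripe_id: int
        devices_used: int        # how many devices the stripe actually spans
        target_devices: int      # the pool's configured width, N
        needs_reencode: bool = False

    def record_write(stripe_id: int, available: int, target: int) -> StripeMeta:
        meta = StripeMeta(stripe_id, devices_used=available, target_devices=target)
        if available < target:
            meta.needs_reencode = True   # written degraded: weaker recovery margin
        return meta

    def reencode_backlog(stripes):
        """Stripe IDs to rewrite once all N devices are healthy again."""
        return [s.stripe_id for s in stripes if s.needs_reencode]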
Reminds me also of Rabin’s Information Dispersal Algorithm, described in a paper here:<p><a href="https://dl.acm.org/doi/10.1145/62044.62050" rel="nofollow">https://dl.acm.org/doi/10.1145/62044.62050</a>
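A minimal Python sketch of the dispersal idea, assuming arithmetic over the prime field GF(257) for clarity (real implementations typically use GF(2^8) or GF(2^16)): the data is split into n shares such that any m of them reconstruct it:

    P = 257  # small prime > 255, so every byte is a field element

    def vrow(x, m):
        return [pow(x, j, P) for j in range(m)]

    def disperse(data, m, n):
        """Split a list of ints < P into n shares; any m shares suffice."""
        data = data + [0] * (-len(data) % m)                   # pad to a multiple of m
        cols = [data[i:i + m] for i in range(0, len(data), m)]
        return [(x, [sum(r * c for r, c in zip(vrow(x, m), col)) % P for col in cols])
                for x in range(1, n + 1)]                      # evaluation points 1..n

    def invert(mat):
        """Invert an m x m matrix over GF(P) by Gauss-Jordan elimination."""
        m = len(mat)
        aug = [list(row) + [int(i == j) for j in range(m)] for i, row in enumerate(mat)]
        for c in range(m):
            piv = next(r for r in range(c, m) if aug[r][c])
            aug[c], aug[piv] = aug[piv], aug[c]
            inv = pow(aug[c][c], -1, P)
            aug[c] = [v * inv % P for v in aug[c]]
            for r in range(m):
                if r != c and aug[r][c]:
                    f = aug[r][c]
                    aug[r] = [(v - f * w) % P for v, w in zip(aug[r], aug[c])]
        return [row[m:] for row in aug]

    def reconstruct(shares, m):
        """Recover the (padded) data from any m shares."""
        shares = shares[:m]
        inv = invert([vrow(x, m) for x, _ in shares])
        out = []
        for k in range(len(shares[0][1])):                     # one column at a time
            col = [share[k] for _, share in shares]
            out.extend(sum(inv[i][j] * col[j] for j in range(m)) % P for i in range(m))
        return out

    secret = [ord(c) for c in "information dispersal"]
    shares = disperse(secret, m=3, n=5)
    got = reconstruct([shares[4], shares[2], shares[0]], m=3)  # any 3 of the 5 shares
    print(bytes(got[:len(secret)]).decode())                   # "information dispersal"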