I'm surprised rateless fountain codes aren't mentioned! If you enjoy this sort of thing, you'll find the Luby transform code fascinating: <a href="https://en.wikipedia.org/wiki/Luby_transform_code" rel="nofollow">https://en.wikipedia.org/wiki/Luby_transform_code</a><p>This paper is a really nice overview with more detail: <a href="https://switzernet.com/people/emin-gabrielyan/060112-capillary-references/ref/MacKay05.pdf" rel="nofollow">https://switzernet.com/people/emin-gabrielyan/060112-capilla...</a><p>LT codes are used as the inner code (concatenated with a fixed-rate pre-code) in the linear-time RaptorQ encoding specified in RFC 6330: <a href="https://www.rfc-editor.org/rfc/rfc6330" rel="nofollow">https://www.rfc-editor.org/rfc/rfc6330</a>
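For anyone who wants to play with the idea, here is a minimal Python sketch of LT encoding plus the peeling decoder. The uniform degree choice is a toy stand-in; real LT codes use the robust soliton distribution to keep the overhead and failure probability small:

    import random

    def lt_encode_symbol(blocks, rng):
        """One encoded symbol: the XOR of a random subset of source blocks."""
        degree = rng.randint(1, len(blocks))            # toy degree choice
        indices = set(rng.sample(range(len(blocks)), degree))
        value = 0
        for i in indices:
            value ^= blocks[i]
        return indices, value

    def lt_decode(symbols, n):
        """Peeling decoder: repeatedly use symbols covering exactly one unknown block."""
        known = {}
        progress = True
        while progress and len(known) < n:
            progress = False
            for indices, value in symbols:
                unknown = indices - known.keys()
                if len(unknown) == 1:
                    i = unknown.pop()
                    for j in indices - {i}:
                        value ^= known[j]
                    known[i] = value
                    progress = True
        return [known.get(i) for i in range(n)]

    rng = random.Random()
    blocks = [rng.randrange(256) for _ in range(8)]     # source data
    stream, decoded = [], [None] * 8
    while None in decoded:                              # rateless: pull symbols until done
        stream.append(lt_encode_symbol(blocks, rng))
        decoded = lt_decode(stream, 8)
    print(len(stream), decoded == blocks)               # e.g. "13 True"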
Years back someone proposed a cute algorithm for erasure codes that depended not on spinning rust but on multipath networking.<p>I believe they called it network coding, and the idea was that in a network with multiple routes I might get a file faster by pulling a coded combination of two parts of the file (or even of two files) from one upstream, instead of waiting for the entire file from the primary server.
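As a toy Python illustration of that idea, using plain XOR of equal-length blocks (real network-coding schemes such as random linear network coding operate over larger Galois fields):

    def xor_blocks(a: bytes, b: bytes) -> bytes:
        return bytes(x ^ y for x, y in zip(a, b))

    # Two equal-length halves of a file held by the origin server.
    part_a = b"first half of the file "
    part_b = b"second half of the file"

    # Path 1 delivers part A directly; path 2 delivers the combination A xor B
    # from a different upstream that holds (or has computed) it.
    from_path_1 = part_a
    from_path_2 = xor_blocks(part_a, part_b)

    # The receiver recovers B locally instead of waiting for it from the primary.
    assert xor_blocks(from_path_2, from_path_1) == part_b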
Erasure coding has been around for a very long time. Remember PAR2 files on Usenet? <a href="https://en.wikipedia.org/wiki/Parchive" rel="nofollow">https://en.wikipedia.org/wiki/Parchive</a>
If you're interested in EC, you might want to consider larger, multi-dimensional cases. Think of encoding not just across spindles, but across other failure domains like rack, room, DC, or region. The goal is to tolerate common component failures as well as larger system failures (or partitions). A nice intro: <a href="https://chameleoncloud.org/blog/2023/12/12/design-considerations-and-analysis-of-multi-level-erasure-coding-in-large-scale-data-centers/" rel="nofollow">https://chameleoncloud.org/blog/2023/12/12/design-considerat...</a>
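A toy Python sketch of the two-level idea, assuming simple XOR parity in each dimension (real multi-level EC typically uses a proper MDS code per level): a row parity repairs a single device inside a rack, and a column parity rebuilds an entire lost rack:

    from functools import reduce

    def xor(chunks):
        return reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), chunks)

    # 2 racks x 3 devices; each cell is one equal-sized data chunk.
    grid = [
        [b"AAAA", b"BBBB", b"CCCC"],   # rack 0
        [b"DDDD", b"EEEE", b"FFFF"],   # rack 1
    ]
    row_parity = [xor(rack) for rack in grid]        # kept inside each rack
    col_parity = [xor(col) for col in zip(*grid)]    # kept in another rack/region

    # Single device lost in rack 0: repair locally from that rack's own parity.
    assert xor([grid[0][0], grid[0][2], row_parity[0]]) == grid[0][1]

    # All of rack 1 lost: rebuild each of its chunks from the cross-rack parity.
    assert [xor([grid[0][i], col_parity[i]]) for i in range(3)] == grid[1]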
Has anybody used Wirehair in a project? <a href="https://github.com/catid/wirehair">https://github.com/catid/wirehair</a><p>I'm curious whether it's well-defined enough to base a standard around--informally if not formally--for building a large-file archiving/data recovery project I've been mulling over for nearly 10 years. It's the only large-block erasure code I've found that has both the ideal (or nearly ideal) algorithmic performance <i>and</i> API. That makes it a nice blackbox for my use case, unlike something like RaptorQ, which leaks little details all over the place, driving up the complexity and rigidity of the rest of the stack. But Wirehair isn't a spec; it's an (experimental?) implementation of an idea. It seems stable, but unless/until I try writing a second implementation, or it sees substantial use (exposing any sharp edges in the algorithm), I worry about how well it would translate into a reliable specification or an independent second implementation.
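For what it's worth, the "blackbox" API shape I mean is roughly the following Python sketch. The names are purely hypothetical and are not Wirehair's actual C interface: hand the encoder a message, ask for arbitrarily many numbered blocks, and feed roughly any N of them to the decoder:

    class FountainEncoder:
        """Hypothetical large-block encoder interface (illustrative only)."""
        def __init__(self, message: bytes, block_bytes: int) -> None:
            self.message, self.block_bytes = message, block_bytes

        def block(self, block_id: int) -> bytes:
            """Encoded block for this ID; IDs are unbounded, i.e. rateless."""
            raise NotImplementedError   # would be backed by a real codec library

    class FountainDecoder:
        """Feed received blocks in any order; recover once enough have arrived."""
        def __init__(self, message_bytes: int, block_bytes: int) -> None:
            self.message_bytes, self.block_bytes = message_bytes, block_bytes

        def feed(self, block_id: int, data: bytes) -> bool:
            """Return True once the original message can be recovered."""
            raise NotImplementedError

        def recover(self) -> bytes:
            raise NotImplementedError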
Yep, this is the key tech behind Ceph's erasure-coded pools: <a href="https://docs.ceph.com/en/latest/rados/operations/erasure-code/" rel="nofollow">https://docs.ceph.com/en/latest/rados/operations/erasure-cod...</a><p>This does not come without trade-offs, though: you cannot update the coding parameters (k, m) afterwards, so you either have to be very sure those parameters will still fit your needs in the long run, or you have to start over from scratch. This inelasticity is also the reason why replication is still the dominant choice for highly available, fault-tolerant data storage.
Am I right in thinking that the products written during an M-of-N incident are coded differently from when all N are available? If so, you might want a bitflag to denote "needs to be re-encoded when the full N is restored", or else you have some files with less-than-stellar recovery odds for a random loss in the N set.
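Something like this toy Python sketch of the bookkeeping (the names are illustrative, not from any particular system):

    from dataclasses import dataclass

    @dataclass
    class StripeMeta:
        stripe_id: int
        devices_used: int        # how many devices the stripe actually spans
        target_devices: int      # the pool's configured width, N
        needs_reencode: bool = False

    def record_write(stripe_id: int, available: int, target: int) -> StripeMeta:
        meta = StripeMeta(stripe_id, devices_used=available, target_devices=target)
        if available < target:
            meta.needs_reencode = True   # written degraded: weaker recovery margin
        return meta

    def reencode_backlog(stripes):
        """Stripe IDs to rewrite once all N devices are healthy again."""
        return [s.stripe_id for s in stripes if s.needs_reencode]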
Reminds me also of Rabin’s Information Dispersal Algorithm, described in a paper here:<p><a href="https://dl.acm.org/doi/10.1145/62044.62050" rel="nofollow">https://dl.acm.org/doi/10.1145/62044.62050</a>
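A minimal Python sketch of the dispersal idea, assuming arithmetic over the prime field GF(257) for clarity (real implementations typically use GF(2^8) or GF(2^16)): the data is split into n shares such that any m of them reconstruct it:

    P = 257  # small prime > 255, so every byte is a field element

    def vrow(x, m):
        return [pow(x, j, P) for j in range(m)]

    def disperse(data, m, n):
        """Split a list of ints < P into n shares; any m shares suffice."""
        data = data + [0] * (-len(data) % m)                   # pad to a multiple of m
        cols = [data[i:i + m] for i in range(0, len(data), m)]
        return [(x, [sum(r * c for r, c in zip(vrow(x, m), col)) % P for col in cols])
                for x in range(1, n + 1)]                      # evaluation points 1..n

    def invert(mat):
        """Invert an m x m matrix over GF(P) by Gauss-Jordan elimination."""
        m = len(mat)
        aug = [list(row) + [int(i == j) for j in range(m)] for i, row in enumerate(mat)]
        for c in range(m):
            piv = next(r for r in range(c, m) if aug[r][c])
            aug[c], aug[piv] = aug[piv], aug[c]
            inv = pow(aug[c][c], -1, P)
            aug[c] = [v * inv % P for v in aug[c]]
            for r in range(m):
                if r != c and aug[r][c]:
                    f = aug[r][c]
                    aug[r] = [(v - f * w) % P for v, w in zip(aug[r], aug[c])]
        return [row[m:] for row in aug]

    def reconstruct(shares, m):
        """Recover the (padded) data from any m shares."""
        shares = shares[:m]
        inv = invert([vrow(x, m) for x, _ in shares])
        out = []
        for k in range(len(shares[0][1])):                     # one column at a time
            col = [share[k] for _, share in shares]
            out.extend(sum(inv[i][j] * col[j] for j in range(m)) % P for i in range(m))
        return out

    secret = [ord(c) for c in "information dispersal"]
    shares = disperse(secret, m=3, n=5)
    got = reconstruct([shares[4], shares[2], shares[0]], m=3)  # any 3 of the 5 shares
    print(bytes(got[:len(secret)]).decode())                   # "information dispersal"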