> corruptions are detected thanks to the checksum<p>That's good to know because, for important things, I test the archive before throwing the original data away.<p>Not to single out zstd, but it's a good opportunity to be reminded: if you make backups, test your backups! A bug like this can also be introduced, not only fixed. I'm not saying it's likely, or that you're likely to be affected by it, but for data you care about, opening the backup once a year and spot-checking some files is not a lot of work and carries far lower risk than never checking at all. Most backup tools will also let you do checksum validation in an automated fashion, but I prefer to additionally open the backup manually and check that it truly works (not merely that a partial archive has no errors, for example).<p>Anyway, details of the bug are here: <a href="https://github.com/facebook/zstd/pull/3517">https://github.com/facebook/zstd/pull/3517</a><p>> This [affects] the block splitter. Only levels using the optimal parser (levels 13 and above depending on the source size), or users who explicitly enable the block splitter, are affected.<p>So if you use, for example, zstd -9 (I didn't know it went higher, at least not at somewhat reasonable speeds) or below, then you should always have been fine, unless you explicitly enabled the block splitter yourself (or, perhaps, your backup software does that for you? It sounds like something relevant to deduplication, but I'm not sure what the feature is exactly).<p>> The block splitter confuses sequences with literal length == 65536 that use a repeat offset code. It interprets this as literal length == 0 when deciding the meaning of the repeat offset, and corrupts the repeat offset history. This is benign, merely causing suboptimal compression performance, if the confused history is flushed before the end of the block, e.g. if there are 3 consecutive non-repeat code sequences after the mistake. 
It also is only triggered if the block splitter decided to split the block.<p>If I understand it correctly, the bug triggers when the data produces a sequence with a literal length of exactly 2^16 and the corrupted history is then not flushed before the end of the block, and one case where it apparently does get flushed is if any part of the next 2^17 bytes (128K) is compressible? I'm not sure what a "repeat code sequence" is; my LZ77-geared brain thinks of a reference that points to an earlier repetition, i.e. a compressed part.
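On the "repeat offset code" question: as I understand the zstd format (RFC 8878), each sequence is a (literal length, match offset, match length) triple, and instead of coding a full offset a sequence can use a repeat code meaning "reuse one of the three most recently used offsets". The twist is that when the literal length is 0, the repeat codes are reinterpreted (shifted by one), which is why confusing literal length 65536 with 0 resolves to the wrong offset and corrupts the history. A toy model of that history (my own sketch, not zstd's actual code):

```python
class RepeatOffsetHistory:
    """Toy model of zstd's three-slot repeat offset history (RFC 8878).

    Real zstd tracks the three most recently used match offsets; a
    "repeat code" in a sequence reuses one of them instead of coding
    a full offset. This sketch is illustrative, not zstd's real code.
    """

    def __init__(self) -> None:
        self.reps = [1, 4, 8]  # zstd's specified initial repeat offsets

    def resolve_repeat(self, code: int, literal_length: int) -> int:
        # With literal length 0, repeating the last offset would be
        # redundant, so the spec shifts the codes to mean the 2nd/3rd
        # offset (and a variant of the 1st). Treating ll == 65536 as
        # ll == 0 applies this shifted table by mistake, which is the
        # confusion described in the PR.
        idx = code + 1 if literal_length == 0 else code
        if idx < 3:
            offset = self.reps.pop(idx)
        else:
            # Shifted past the end: the spec defines this as the most
            # recent offset minus one.
            offset = self.reps.pop(0) - 1
        self.reps.insert(0, offset)  # most recent offset moves to front
        return offset

    def push_new(self, offset: int) -> None:
        """A sequence with a full (non-repeat) offset updates the history."""
        self.reps.insert(0, offset)
        self.reps.pop()
```

Since the decoder's history only matches the encoder's if both apply the same updates, picking the shifted table on the wrong sequence desynchronizes every later repeat code in the block, hence the corruption.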
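Separately, on the automated checksum validation mentioned above: one tool-agnostic way is to keep an independent manifest of content hashes, computed before the data ever goes through the compressor, and re-verify restored files against it. A minimal sketch (the function names are mine, not any particular backup tool's API):

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Stream a file through SHA-256 so large files need not fit in RAM."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Hash every file under `root`, keyed by its relative path."""
    return {str(p.relative_to(root)): sha256_of(p)
            for p in sorted(root.rglob("*")) if p.is_file()}


def verify_restore(manifest: dict[str, str], restored_root: Path) -> list[str]:
    """Return relative paths that are missing or whose contents changed."""
    bad = []
    for rel, digest in manifest.items():
        p = restored_root / rel
        if not p.is_file() or sha256_of(p) != digest:
            bad.append(rel)
    return bad
```

Because the manifest is built from the originals and checked against a full restore, it catches compressor-introduced corruption and silently missing files alike, which a checksum stored inside the archive cannot.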