
Xz format inadequate for long-term archiving (2016)

92 points by UkiahSmith almost 6 years ago

12 comments

jordigh almost 6 years ago
This guy used to go around GNU mailing lists (and others) trying to get us to use lzip.

https://gcc.gnu.org/ml/gcc/2017-06/msg00044.html

https://lists.debian.org/debian-devel/2017/06/msg00433.html

It was a bit bizarre when he hit the Octave mailing list.

Eventually, people just wanted xz back:

http://octave.1599824.n4.nabble.com/opinion-bring-back-Octave-xz-source-release-td4683705.html
brianpgordon almost 6 years ago
Previous discussions:

https://news.ycombinator.com/item?id=12768425

https://news.ycombinator.com/item?id=16884832
esaym almost 6 years ago
Interestingly, since "recovery" is mentioned several times, I decided to test it myself.

I took a copy of a JPEG image, compressed it separately with gzip and with bzip2, then modified one byte with a hex editor.

The recovery procedure for gzip is simply "zcat corrupt_file.gz > corrupt_file", while for bzip2 it is to use the bzip2recover command, which just dumps the blocks out individually (corrupt ones and all).

Uncompressing the corrupt gzip JPEG via zcat always produced an image file the same size as the original, and it could be opened with any image viewer, although the colors were clearly off.

I never could recover the image compressed with bzip2. Trying to extract the recovered blocks made by bzip2recover via bzcat would just choke on the single corrupted block. And the smallest you can make a block is 100K (vs. 32K for gzip?). Obviously pulling 100K out of a JPEG will not work.

Though I'm still confused as to how the corrupted gzip file extracted to a file the same size as the original. I guess gzip writes out the corrupted data as well instead of choking on it? Either way, gzip is the winner here: a file with one corrupted byte is much better than a file with 100K of data missing.
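A minimal shell sketch of the experiment described above (the file name and the corrupted byte offset are made up for illustration):

    # Compress two copies of the same JPEG, keeping the original.
    gzip -k photo.jpg                     # -> photo.jpg.gz
    bzip2 -k photo.jpg                    # -> photo.jpg.bz2

    # Flip one byte in each archive at an arbitrary offset (here 5000).
    printf '\x42' | dd of=photo.jpg.gz  bs=1 seek=5000 count=1 conv=notrunc
    printf '\x42' | dd of=photo.jpg.bz2 bs=1 seek=5000 count=1 conv=notrunc

    # gzip recovery, per the advice above: just decompress and keep the output.
    zcat photo.jpg.gz > recovered.jpg

    # bzip2 recovery: split the archive into per-block files, then decompress;
    # bzcat still chokes on the one corrupted block.
    bzip2recover photo.jpg.bz2            # writes rec00001photo.jpg.bz2, ...
    bzcat rec*photo.jpg.bz2 > recovered2.jpg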
xoa almost 6 years ago
Not that many of the complaints aren't reasonable, but I thought compression format was generally orthogonal to parity, which is what I assume is actually wanted for long-term archiving? I always figured the goal should be to get back a bit-perfect copy of whatever went in, using something like Parchive at the file level or ZFS at the filesystem level for online storage. I grant that, on the principle of layers and graceful failure modes, it's better if even sub-archives can tolerate some corruption without total failure, and that from a long-term, implementation-independence perspective a simpler, better-specified format is preferable. But none of that substitutes for building in enough parity to both notice corruption and fully recover from it, up to fairly extreme levels.
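For reference, a sketch of the Parchive approach with the par2 tool (the redundancy level and archive name are arbitrary choices, not anything the comment specifies):

    # Create parity volumes able to repair up to ~10% corruption.
    par2 create -r10 backup.tar.xz

    # Later: detect corruption, then repair from the parity volumes.
    par2 verify backup.tar.xz.par2
    par2 repair backup.tar.xz.par2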
Adamantcheese almost 6 years ago
How about something like ZPAQ instead for archiving? Especially if you're doing backups and not a lot of the information is changing.
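A rough sketch of that workflow with the zpaq tool, whose archives are append-only and deduplicated, so unchanged data is not stored twice (the paths and options here are illustrative; consult zpaq's own help for specifics):

    # Each run appends only new/changed files as a new version.
    zpaq add backup.zpaq /home/user/docs
    zpaq add backup.zpaq /home/user/docs   # run again after edits

    # List stored versions; extract the tree as of version 1.
    zpaq list backup.zpaq
    zpaq extract backup.zpaq -until 1 -to restored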
ltbarcly3 almost 6 years ago
No file format is perfect. I've been using xz for years and I can't think of a single issue I've had. The compression ratio is dramatically better than gzip or bzip2 for many types of archives (especially when there is a lot of redundancy; for example, when compressing spidered web pages from the same site you can get well over 99% size reduction, versus 70% for gzip, which means using less than a thirtieth of the disk space).

Lately I have been using zstd for some things, since it gives good compression and is much faster than xz.

This criticism of xz just seems nitpicky and impractical, especially if you are compressing tar archives and/or storing the archives on some kind of RAID that can correct read errors (such as RAID 5).
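An illustrative pair of commands for the zstd trade-off mentioned here (the level and window settings are just plausible choices for highly redundant input, and the file names are made up):

    # High ratio: a high level plus a large match window for redundant pages.
    zstd -19 --long=27 pages.tar -o pages.tar.zst

    # Decompression stays fast at any level; --long must match the window used.
    zstd -d --long=27 pages.tar.zst -o pages.tar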
asveikau almost 6 years ago
I remember seeing this article before. This time the reaction that surges for me is: if you want long-term archiving but don't assume redundant storage, it's not going to go well. Put your long-term archives on ZFS.
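A minimal sketch of that advice, assuming two spare disks (the pool and device names are placeholders):

    # Mirrored pool: every block is checksummed and healed from the good copy.
    zpool create archive mirror /dev/sdb /dev/sdc

    # Periodically re-read and verify everything, repairing silent corruption.
    zpool scrub archive
    zpool status archive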
mkj almost 6 years ago
A bit of speculation here, but perhaps xz won over lzip because it has a real manpage?

lzip has the usual infuriating short summary of options with a "run info lzip for the complete manual". Also, the source code repository doesn't even seem to be linked directly from the lzip homepage. Technical considerations aren't the only thing that determines whether software is "better"; it also has to be well presented.
shmerl almost 6 years ago
xz-utils should implement parallel decompression already. pixz is doing it, but stock xz is not. Most end users benefit from faster decompression.
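For illustration, the pixz workaround versus stock xz (the tarball name is made up; -T0 threads compression in stock xz, but its decompression was single-threaded at the time):

    # Parallel compression; pixz also writes an index of the tar members.
    pixz big.tar big.tpxz        # stock xz equivalent: xz -T0 big.tar

    # Parallel decompression, which stock xz lacked.
    pixz -d big.tpxz big.tar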
SEJeff almost 6 years ago
This should have (2016) in the title.
LinuxBender almost 6 years ago
If you first use tar to preserve xattrs/etc., then you can use anything to compress: xz, bz2, 7z, even arj if you are feeling nostalgic.

    # --xattrs is needed for GNU tar to actually record extended attributes.
    tar --xattrs -cvJf ./files.tar.xz /some/dir
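For compressors tar has no dedicated flag letter for, a reasonably recent GNU tar can be handed the program explicitly via -I/--use-compress-program (the compressor choices below are just examples):

    tar --xattrs -I 'zstd -19' -cvf files.tar.zst /some/dir
    tar --xattrs -I lzip -cvf files.tar.lz /some/dir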
microcolonel almost 6 years ago
> "3 Then, why some free software projects use xz?"

Because the files are usually smaller than gzip's, with faster decompression than bzip2, and the library is available on most systems.