Software Deduplication: Quick comparison of save ratings

2 points by linuxready over 9 years ago

1 comment

vardump over 9 years ago
> Method: I used 22.3GiB worth of Windows XP installation ISOs, 52 ISOs in total. No file was exactly the same, but some contained much duplicate data, like the Swedish XP Home Edition vs the Swedish N-version of XP Home Edition. I deduplicated these files and noted how much space I saved compared to the 22.3GiB.

So let me get it straight: he stores a bunch of CD ISOs, presumably with block size of 2048 bytes to different dedup file systems without caring about dedup block size?

ZFS has 128 kB recordsize by default, so little wonder it does so badly *in this particular test* without any tuning!

Windows has 4 kB blocks, so that's why it does so well. Doh.

He could have configured other systems to use a different block size. 2 kB block would obviously be optimal, one should get the highest deduplication savings with that size.

From ZFS documentation: http://open-zfs.org/wiki/Performance_tuning#Dataset_recordsize

"ZFS datasets use an internal recordsize of 128KB by default. The dataset recordsize is the basic unit of data used for internal copy-on-write on files. Partial record writes require that data be read from either ARC (cheap) or disk (expensive). recordsize can be set to any power of 2 from 512 bytes to 128 kilobytes. Software that writes in fixed record sizes (e.g. databases) will benefit from the use of a matching recordsize."

So what happens if he sets ZFS recordsize to 2 kB (assuming it can be done?)? Ok, dedup table will probably be huge, but... savings ratio is what we need to know.

> ZFS is another filesystem capable of deduplication, but this one does it in-line and no additional software is required.

Yup, ZFS is probably the best choice for online deduplication.
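
To make the block-size argument concrete, here is a minimal Python sketch of fixed-size block deduplication. It is a toy model, not how ZFS or Windows actually implement dedup: the two "images" are synthetic random data built from 2048-byte sectors, the second image is the first one shifted by a single sector, and the helper names (`random_sector`, `dedup_savings`) are made up for illustration.

```python
import hashlib
import random

SECTOR = 2048  # ISO 9660 sector size mentioned in the comment


def random_sector(rng):
    """Return one sector of pseudo-random bytes (stand-in for real ISO data)."""
    return rng.getrandbits(8 * SECTOR).to_bytes(SECTOR, "little")


def dedup_savings(images, block_size):
    """Fraction of bytes saved when identical fixed-size blocks are stored once."""
    unique = {}  # block hash -> block length
    total = 0
    for data in images:
        for off in range(0, len(data), block_size):
            block = data[off:off + block_size]
            unique.setdefault(hashlib.sha256(block).digest(), len(block))
            total += len(block)
    return 1 - sum(unique.values()) / total


rng = random.Random(0)
sectors = [random_sector(rng) for _ in range(512)]  # 1 MiB of data

# Image A: 512 sectors. Image B: one new sector in front, then the first
# 511 sectors of A, so the shared data is shifted by exactly 2048 bytes.
image_a = b"".join(sectors)
image_b = random_sector(rng) + b"".join(sectors[:511])

for size in (2048, 4096, 128 * 1024):
    ratio = dedup_savings([image_a, image_b], size)
    print(f"block size {size:>6} B: {ratio:.1%} saved")
```

In this toy case the 2 kB blocks line up with the shifted sectors and recover almost all of the shared data, while 4 kB and 128 kB blocks straddle the shift and find nothing to share. The cost is the one the comment points at: going from 128 kB to 2 kB records means 64 times as many blocks to track in the dedup table for the same amount of data.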