Tech Echo

A tech news platform built with Next.js, offering global tech news and discussions.

© 2025 Tech Echo. All rights reserved.

DwarFS: A fast high compression read-only file system

280 points by daantje, over 4 years ago

22 comments

dj_mc_merlin, over 4 years ago
> I started working on DwarFS in 2013 and my main use case and major motivation was that I had several hundred different versions of Perl that were taking up something around 30 gigabytes of disk space, and I was unwilling to spend more than 10% of my hard drive keeping them around for when I happened to need them.

It fills me with joy that someone has been coding a fs for 7 years due to perl installs taking too much space. Necessity is the mother of all invention.
fefe23, over 4 years ago
It looks like the benefit is some kind of block or file deduplication.

@OP: Can you please explain why you keep 50 gigs of perl around? :-)

I use compressed read-only file systems all the time to save space on my travel laptop. I have one squashfs for firefox, one for the TeX base install, one for LLVM, one for qemu, one for my cross compiler collection. I suspect the gains over squashfs will be far less pronounced than for the pathological "400 perl versions".
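fefe23's guess about deduplication can be made concrete. The thread does not describe DwarFS's actual algorithm, so the following is only a minimal Python sketch of file-level deduplication under that assumption: files with identical SHA-256 digests are counted once. `file_digest` and `duplicate_savings` are hypothetical names, and block-level schemes (which also catch partially identical files) are not shown.

```python
import hashlib
import os


def file_digest(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def duplicate_savings(root: str) -> tuple[int, int]:
    """Walk `root` and return (total_bytes, bytes_after_file_dedup).

    Hypothetical helper: files with identical content are counted once;
    symlinks and non-regular files are skipped.
    """
    size_by_digest: dict[str, int] = {}
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            size = os.path.getsize(path)
            total += size
            # Only the first file with a given digest contributes its size.
            size_by_digest.setdefault(file_digest(path), size)
    return total, sum(size_by_digest.values())
```

Pointing it at a tree of many near-identical installs would show how much of the saving comes from whole-file duplicates alone, before any compression is applied.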
smitty1e, over 4 years ago
Whew! It was easy to find out how you actually initialize this thing, if it's read-only:

https://github.com/mhx/dwarfs/blob/main/man/mkdwarfs.md
slagfart, over 4 years ago
Perhaps not strictly on-topic, but is there any equivalent FS/program in Windows that will allow users to have read-only access to files that are deduplicated in some way?

My use case is the MAME console archives, which are now full of copies of games from different localisations with 99% identical content. 7Z will compress them together and deduplicate, but breaks once the archive exceeds a few gigs.

These archives are already compressed (CHD format, which is 7Z + FLAC for ISOs), but it's deduplication that needs to happen on top of these already compressed files that I'm struggling with.

Sorry for the off-topic ask!
Scaevolus, over 4 years ago
Neat! I'd like to see benchmarks for more typical squashfs payloads: embedded root filesystems totalling under 100MB. Small docker images like alpine would be a decent proxy. The given corpus of thousands of perl versions is more appropriate for comparison against git.
david_draco, over 4 years ago
I wish there was a semi-compressed transparent filesystem layer which slowly compresses the least recently used files in the background, and un-compresses files upon use. That way you could store much more mostly unused content than space on the disk, without sacrificing accessibility.
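david_draco's idea splits into a policy (pick cold files) and a mechanism (compress them, restore them on access). Below is a minimal Python sketch of that policy, assuming last-access time (`st_atime`) is a usable recency signal and gzip is the codec; a genuinely transparent layer would need something like a FUSE filesystem, which is out of scope here. `compress_cold_files` and `ensure_uncompressed` are hypothetical helpers.

```python
import gzip
import os
import shutil
import time
from typing import Optional


def compress_cold_files(root: str, max_idle_seconds: float,
                        now: Optional[float] = None) -> list[str]:
    """Gzip every regular file under `root` whose last access is older than
    `max_idle_seconds`, replacing it with `<name>.gz`.
    Returns the original paths that were compressed (hypothetical helper)."""
    now = time.time() if now is None else now
    compressed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if name.endswith(".gz") or os.path.islink(path):
                continue  # already compressed, or not a regular file
            if now - os.stat(path).st_atime < max_idle_seconds:
                continue  # recently used: leave it uncompressed
            with open(path, "rb") as src, gzip.open(path + ".gz", "wb") as dst:
                shutil.copyfileobj(src, dst)
            os.remove(path)
            compressed.append(path)
    return compressed


def ensure_uncompressed(path: str) -> str:
    """Restore `path` from `<path>.gz` if it was compressed; return `path`."""
    gz_path = path + ".gz"
    if not os.path.exists(path) and os.path.exists(gz_path):
        with gzip.open(gz_path, "rb") as src, open(path, "wb") as dst:
            shutil.copyfileobj(src, dst)
        os.remove(gz_path)
    return path
```

On mounts with `noatime`, access times are never updated, and with `relatime` they are updated only coarsely, so a real tool would likely need its own access tracking rather than relying on `st_atime`.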
hachari, over 4 years ago
Why not use BTRFS with file deduplication and transparent compression (zstd specifically)?
stabbles, over 4 years ago
mksquashfs supports gzip, xz, lzo, lz4 and zstd too; you can also compile it to use any of those as the default instead of gzip.

Does the performance benchmark compare DwarFS against single-threaded, gzip-compressed SquashFS?
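stabbles' question is about the baseline: a benchmark against gzip says little about a comparison against xz or zstd. As a rough illustration (not mksquashfs itself, and not the DwarFS benchmark), this Python sketch compresses the same bytes with two single-threaded standard-library codecs, zlib (the DEFLATE format used by gzip) and lzma (the format behind xz), and reports size and time. zstd is left out because it is not in the standard library of most current Python versions; `compare_codecs` is a hypothetical name.

```python
import lzma
import time
import zlib


def compare_codecs(data: bytes) -> dict[str, tuple[int, float]]:
    """Compress `data` with each codec; return {name: (size_bytes, seconds)}."""
    codecs = {
        "zlib": (lambda b: zlib.compress(b, 9), zlib.decompress),           # gzip-style DEFLATE
        "lzma": (lambda b: lzma.compress(b, preset=9), lzma.decompress),    # xz container
    }
    results: dict[str, tuple[int, float]] = {}
    for name, (compress, decompress) in codecs.items():
        start = time.perf_counter()
        packed = compress(data)
        elapsed = time.perf_counter() - start
        if decompress(packed) != data:  # sanity check: round trip must be lossless
            raise ValueError(f"{name} round trip failed")
        results[name] = (len(packed), elapsed)
    return results
```

Feeding it the bytes of a tarball of the input tree gives a rough sense of how much the codec choice alone matters; the numbers depend heavily on the data.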
gnosek, over 4 years ago
Is this viable as a backup/archive format? Would it make sense to e.g. have an incremental backup as a DwarFS file, referring to the base backup in another DwarFS file?
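One way gnosek's scheme could work, sketched in Python under loud assumptions: the thread does not say DwarFS can reference another image, so this only computes which files an incremental archive would need to hold, by comparing content-hash manifests of the base and current trees. `build_manifest` and `incremental_delta` are hypothetical names.

```python
import hashlib
import os


def build_manifest(root: str) -> dict[str, str]:
    """Map each regular file's path (relative to `root`) to its SHA-256."""
    manifest: dict[str, str] = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue  # a real tool would record symlinks separately
            h = hashlib.sha256()
            with open(path, "rb") as f:
                for chunk in iter(lambda: f.read(1 << 20), b""):
                    h.update(chunk)
            manifest[os.path.relpath(path, root)] = h.hexdigest()
    return manifest


def incremental_delta(base: dict[str, str],
                      current: dict[str, str]) -> tuple[list[str], list[str]]:
    """Return (to_store, deleted): paths new or changed since `base`, and
    paths that disappeared. Unchanged files would be resolved from the base
    archive instead of being stored again."""
    to_store = sorted(p for p, digest in current.items() if base.get(p) != digest)
    deleted = sorted(p for p in base if p not in current)
    return to_store, deleted
```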
giovannibonetti, over 4 years ago
This could be awesome for compressing Docker image layers. After all, they can be huge (hundreds of MB) and, if the Dockerfile is well organized, each step should contain a fairly homogeneous set of files (like apt-get artifacts, for example).
botto, over 4 years ago
It would be amazing to see this work on OpenWRT; I think it would fit perfectly, using fewer resources than squashfs. The other place would be on a Raspberry Pi, for scenarios where power can be cut at any time.
jedberg, over 4 years ago
Does anyone remember back in the 90s when we'd install DoubleSpace to get on the fly compression? And then they built it into MSDOS 6 and that was a major game changer?
evantahler, over 4 years ago
Oh wow. This would be excellent for language dependencies: ruby gems, node_modules, etc. Integrating this with something like pnpm [1], which already keeps a global store of dependencies, would be excellent.

[1] https://pnpm.js.org
rurban, over 4 years ago
So I tried it out on my 17GB of perl builds (just on my laptop, not on my big machine).

mkdwarfs crashed with recursive links (1-level, just pointing to itself) and when I removed dirs that were part of the input path while mkdwarfs was running. Which is fair, I assume.
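rurban's first crash came from a symlink pointing to itself. A pre-flight scan of the input tree could flag such links before packing. The following Python sketch is a hypothetical helper (`find_symlink_loops`), not part of mkdwarfs; it follows each link's chain of targets lexically and reports links that never resolve to a non-link.

```python
import os


def find_symlink_loops(root: str, max_hops: int = 40) -> list[str]:
    """Return symlinks under `root` whose chain of targets never reaches a
    non-symlink (e.g. `ln -s self self`), following at most `max_hops` links.
    Dangling links (target missing) are not reported as loops."""
    loops = []
    for dirpath, dirnames, filenames in os.walk(root):
        # os.walk does not follow directory symlinks by default; looping links
        # show up in `filenames` because they cannot be resolved as directories.
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                continue
            seen = set()
            current = path
            while os.path.islink(current) and len(seen) < max_hops:
                if current in seen:
                    break  # revisited a link: this chain is a cycle
                seen.add(current)
                target = os.readlink(current)
                # Lexical resolution relative to the link's directory (sketch only).
                current = os.path.normpath(os.path.join(os.path.dirname(current), target))
            if os.path.islink(current):
                loops.append(path)
    return sorted(loops)
```

Links reported by the scan could be removed or excluded from the input before running the packer.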
ed25519FUUU, over 4 years ago
I noticed that enabling compression on zfs made a *huge* difference to the size of some of my largely text-file partitions. I never turned on deduplication because I don't want to bother with the memory overhead, but I bet that would help even further.
Twirrim, over 4 years ago
I'm curious: why do you have so many perl installations around? I thought I'd got a fair number of python venvs kicking around for each of the repos I'm dealing with, but nowhere near that many.
st_goliath, over 4 years ago
Circa 2 years ago, I was working on a side project and got so annoyed with SquashFS tooling that I decided to fix it instead. After getting stuck with the spaghetti code behind mksquashfs, I decided to start from scratch, having learnt enough about SquashFS to roughly understand the on-disk format.

Because squashfs-tools seemed pretty unmaintained in late 2018 (no activity on the official site & git tree for years, and only one mailing list post "can you do a release?", which got a very annoyed response), I released my tooling as "squashfs-tools-ng", and it is currently packaged by a handful of distros, including Debian & Ubuntu.[1]

I also thoroughly documented the on-disk format after reverse engineering it[2] and made a few benchmarks[3].

For my benchmarks I used an image I extracted from the Debian XFCE LiveDVD (~6.5GiB as a tar archive, ~2GiB as an XZ-compressed SquashFS image). By playing around a bit, I also realized that the compressed metadata is "amazingly small" compared to the actual image file data, and the resulting images are very close to the tarball compressed with the same compressor settings.

I can accept a claim of being a little smaller than SquashFS, but the claimed difference makes me very suspicious. From the README, I'm not quite sure: does the Raspbian image comparison compare XZ compression against SquashFS with Zstd?

I have cloned the git tree and installed dozens of libraries that this folly thingy needs, but I'm currently swamped in CMake errors (haven't touched CMake in 8+ years, so I'm a bit rusty there) and the build fails with some *still* missing headers. I hope to have more luck later today and produce a comparison on my end using my trusty Debian reference image, which I will definitely add to my existing benchmarks.

Also, is there any documentation on how the on-disk format for DwarFS and its packing works, which might explain the incredible size difference?

[1] https://github.com/AgentD/squashfs-tools-ng

[2] https://github.com/AgentD/squashfs-tools-ng/blob/master/doc/format.txt

[3] https://github.com/AgentD/squashfs-tools-ng/tree/master/doc
Hello71, over 4 years ago
> You can pick either clang or g++, but at least recent clang versions will produce substantially faster code

Have you investigated why this might be the case?
aarchi, over 4 years ago
I have several highly-redundant NTFS backups that I'd like to compress into a read-only fs. Can DwarFS preserve all NTFS metadata?
saurabhnanda, over 4 years ago
Is this useful for long-term log storage, say, from a typical webapp (e.g. Nginx logs, Rails logs, Postgres logs, etc.)?
throwmemoney, over 4 years ago
Compression - anyone using lrzip on production servers?
GGfpc, over 4 years ago
What are the use cases for a read only file system?