> I started working on DwarFS in 2013 and my main use case and major motivation was that I had several hundred different versions of Perl that were taking up something around 30 gigabytes of disk space, and I was unwilling to spend more than 10% of my hard drive keeping them around for when I happened to need them.<p>It fills me with joy that someone has been coding a fs for 7 years because Perl installs were taking up too much space. Necessity is the mother of invention.
It looks like the benefit is some kind of block or file deduplication.<p>@OP: Can you please explain why you keep 50 gigs of perl around? :-)<p>I use compressed read-only file systems all the time to save space on my travel laptop. I have one squashfs for firefox, one for the TeX base install, one for LLVM, one for qemu, one for my cross compiler collection. I suspect the gains over squashfs will be far less pronounced than for the pathological "400 Perl versions" case.
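The dedup intuition above is easy to sanity-check on your own data: hash every file and group by digest to get a lower bound on what file-level deduplication could reclaim. A rough sketch (DwarFS reportedly also exploits redundancy below whole-file granularity, so its savings can be larger than what this reports):

```python
import hashlib
import os
from collections import defaultdict

def duplicate_report(root):
    """Group files under `root` by SHA-256 digest and estimate how many
    bytes file-level deduplication could reclaim (keep one copy per
    unique content)."""
    by_digest = defaultdict(list)
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            h = hashlib.sha256()
            with open(path, "rb") as f:
                for chunk in iter(lambda: f.read(1 << 20), b""):
                    h.update(chunk)
            by_digest[h.hexdigest()].append(path)
    redundant = sum(
        os.path.getsize(paths[0]) * (len(paths) - 1)
        for paths in by_digest.values()
        if len(paths) > 1
    )
    return by_digest, redundant
```

Running this over a tree of near-identical Perl installs would show how much of the 30GB is literal whole-file duplication versus similar-but-not-identical content that needs smarter compression.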
Whew! It was easy to find out how you actually initialize this thing, given that it's read-only:<p><a href="https://github.com/mhx/dwarfs/blob/main/man/mkdwarfs.md" rel="nofollow">https://github.com/mhx/dwarfs/blob/main/man/mkdwarfs.md</a>
Perhaps not strictly on-topic, but is there any equivalent FS/program in Windows that will allow users to have read-only access to files that are deduplicated in some way?<p>My use case is the MAME console archives, which are now full of copies of games from different localisations with 99% identical content. 7Z will compress them together and deduplicate, but breaks once the archive exceeds a few gigs.<p>These archives are already compressed (CHD format, which is 7Z + FLAC for ISOs), but what I'm struggling with is deduplication on top of these already-compressed files.<p>Sorry for the off-topic ask!
Neat! I'd like to see benchmarks for more typical squashfs payloads: embedded root filesystems totalling under 100MB. Small docker images like alpine would be a decent proxy. The given corpus of thousands of Perl versions is more appropriate for comparison against git.
I wish there were a semi-compressed transparent filesystem layer that slowly compresses the least recently used files in the background and un-compresses files upon use. That way you could store much more mostly-unused content than you have space on disk, without sacrificing accessibility.
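Existing filesystems like btrfs and ZFS offer transparent compression, but not this kind of recency-based tiering. The compression half of the idea, at least, is easy to approximate with a periodic job. A minimal sketch, assuming atimes are tracked; the 30-day threshold and the `foo` → `foo.gz` convention are illustrative, and a truly transparent version would need a FUSE layer to decompress on open:

```python
import gzip
import os
import shutil
import time

# Threshold is an assumption; tune to taste.
STALE_AFTER = 30 * 24 * 3600  # 30 days in seconds

def compress_stale(root, stale_after=STALE_AFTER):
    """gzip regular files whose atime is older than `stale_after`,
    replacing `foo` with `foo.gz`. This only handles the background
    compression half; decompress-on-open would need a FUSE layer."""
    now = time.time()
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(".gz"):
                continue  # already compressed on a previous pass
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue
            if now - os.stat(path).st_atime < stale_after:
                continue  # recently used, leave it alone
            with open(path, "rb") as src, gzip.open(path + ".gz", "wb") as dst:
                shutil.copyfileobj(src, dst)
            os.remove(path)
```

Run from cron, this gets you the "slowly compress in the background" behavior; the accessibility half is the hard part.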
mksquashfs supports gzip, xz, lzo, lz4 and zstd too; you can also compile it to use any of those as the default instead of gzip.<p>Does the performance benchmark show DwarFS versus single-threaded gzip-compressed SquashFS?
Is this viable as a backup/archive format? Would it make sense to e.g. have an incremental backup as a DwarFS file, referring to the base backup in another DwarFS file?
This could be awesome for compressing Docker image layers. After all, they can be huge (hundreds of MB) and, if the Dockerfile is well organized, each step should contain a fairly homogeneous set of files (apt-get artifacts, for example).
It would be amazing to see this work on OpenWRT; I think it would fit perfectly, using fewer resources than squashfs.
Another good fit would be a Raspberry Pi, for scenarios where power can be cut at any time.
Does anyone remember back in the 90s when we'd install DoubleSpace to get on-the-fly compression? And then they built it into MS-DOS 6 and that was a major game changer?
Oh wow. This would be excellent for language dependencies - ruby gems, node_modules, etc. Integrating this with something like pnpm [1], which already keeps a global store of dependencies, would be excellent.
[1] - <a href="https://pnpm.js.org" rel="nofollow">https://pnpm.js.org</a>
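For context, pnpm's space savings come from a content-addressed store plus hard links, which is complementary to what a compressed image would add. A simplified sketch of the idea (the flat digest-keyed store layout here is invented for illustration and is not pnpm's actual format):

```python
import hashlib
import os
import shutil

def install_via_store(src_file, store_dir, dest_file):
    """pnpm-style content addressing, simplified: keep one copy per
    unique content hash in `store_dir` and hard-link each install to
    it, so identical files across projects cost no extra data blocks."""
    with open(src_file, "rb") as f:
        digest = hashlib.sha256(f.read()).hexdigest()
    os.makedirs(store_dir, exist_ok=True)
    stored = os.path.join(store_dir, digest)
    if not os.path.exists(stored):
        shutil.copy2(src_file, stored)  # first time we see this content
    os.link(stored, dest_file)  # same inode, zero additional space
```

A DwarFS-style layer on top would then also compress the unique copies themselves, not just collapse exact duplicates.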
So I tried it out on my 17GB of Perl builds (just on my laptop, not on my big machine).<p>mkdwarfs crashed on recursive links (1-level, just pointing to itself), and also when I removed directories that were part of the input path while mkdwarfs was running. Which is fair, I assume.
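The self-referential symlink case is easy to screen for before handing a tree to an image builder. A small sketch (a hypothetical pre-flight helper, not part of mkdwarfs):

```python
import os

def find_self_symlinks(root):
    """Return symlinks under `root` whose target resolves back to the
    link itself (the 1-level recursive case: `ln -s loop loop`)."""
    bad = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                continue
            # readlink may be relative; resolve it against the link's dir.
            target = os.path.join(dirpath, os.readlink(path))
            if os.path.abspath(target) == os.path.abspath(path):
                bad.append(path)
    return bad
```

Deeper cycles (a → b → a) would need full loop detection, but this catches the exact crash reported above.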
I noticed that enabling compression on ZFS made a <i>huge</i> difference to the stored size of some of my largely text-file partitions. I never turned on deduplication because I don’t want to bother with the memory overhead, but I bet that would help even further.
I'm curious: why do you have so many perl installations around? I thought I had a fair number of python venvs kicking around for each of the repos I'm dealing with, but nowhere near that many.
Circa 2 years ago, I was working on a side project and got so annoyed with SquashFS tooling that I decided to fix it instead. After getting stuck with the spaghetti code behind mksquashfs, I decided to start from scratch, having learnt enough about SquashFS to roughly understand the on-disk format.<p>Because squashfs-tools seemed pretty unmaintained in late 2018 (no activity on the official site & git tree for years, and only one mailing list post "can you do a release?" which got a very annoyed response) I released my tooling as "squashfs-tools-ng" and it is currently packaged by a handful of distros, including Debian & Ubuntu.[1]<p>I also thoroughly documented the on-disk format after reverse engineering it,[2] and made a few benchmarks.[3]<p>For my benchmarks I used an image I extracted from the Debian XFCE LiveDVD (~6.5GiB as tar archive, ~2GiB as XZ-compressed SquashFS image). By playing around a bit, I also realized that the compressed metadata is "amazingly small" compared to the actual image file data, and the resulting images are very close to the tarball compressed with the same compressor settings.<p>I can accept a claim of being a little smaller than SquashFS, but the claimed difference makes me very suspicious. From the README, I'm not quite sure: does the Raspbian image comparison compare XZ compression against SquashFS with Zstd?<p>I have cloned the git tree and installed the dozens of libraries that this folly thingy needs, but I'm currently swamped in CMake errors (haven't touched CMake in 8+ years, so I'm a bit rusty there) and the build fails with some <i>still</i> missing headers.
I hope to have more luck later today and produce a comparison on my end using my trusty Debian reference image, which I will definitely add to my existing benchmarks.<p>Also, is there any documentation on how the on-disk format for DwarFS and its packing works which might explain the incredible size difference?<p>[1] <a href="https://github.com/AgentD/squashfs-tools-ng" rel="nofollow">https://github.com/AgentD/squashfs-tools-ng</a><p>[2] <a href="https://github.com/AgentD/squashfs-tools-ng/blob/master/doc/format.txt" rel="nofollow">https://github.com/AgentD/squashfs-tools-ng/blob/master/doc/...</a><p>[3] <a href="https://github.com/AgentD/squashfs-tools-ng/tree/master/doc" rel="nofollow">https://github.com/AgentD/squashfs-tools-ng/tree/master/doc</a>
> You can pick either clang or g++, but at least recent clang versions will produce substantially faster code<p>Have you investigated why this might be the case?