Show HN: Ratarmount 1.0.0 – Rapid access to large archives via a FUSE filesystem

76 points by mxmlnkn 6 months ago

Hi HN,

Since I first introduced ratarmount here [0], 2 years have gone by and many features have been added.

To summarize, ratarmount enables working with archived contents exposed as a filesystem without the data having to be extracted to disk:

    pip install ratarmount
    ratarmount archive.tar mounted
    ls -la mounted

I started this project after noticing the slowness of archivemount with large TAR files and wondering how this could be, because the file contents exist at some offset in the archive file and it should not be difficult to read that data. It turns out that part was not difficult. However, packaging everything nicely, adding tests, and adding many more formats and features such as union mounting and recursive mounting are the things keeping me busy on this project to this day. Since the last Show HN, libarchive, SquashFS, fsspec, and many more backends have been added, so that it should now be able to read every format that archivemount can and some more, and even read them remotely. However, performance for any use case besides bzip2/gzip-compressed TARs may vary even though I did my best.

Personally, I am using it to view packed folders with many small files that do not change anymore. I pack these folders because otherwise copying them to other hard drives takes much longer. I'm also using it when I want to avoid the command line. I have added ratarmount as a Caja user script for mounting via right-click. This way, I can mount an archive and then copy the contents to another drive to effectively do the extraction and copying in one step. Initially, I also used it to train on the ImageNet TAR archive directly.

I probably should have released a 1.0.0 some years ago because I have already kept the command line interface and even the index file format as compatible as possible across the several 0.x versions.

Some larger future features on my wishlist are:

- A new indexed_lz4 backend. This should be doable inside my indexed_bzip2 [1] / rapidgzip [2] backend library.

- A custom ZIP and SquashFS reader accelerated by rapidgzip and indexed_bzip2 to enable faster seeking inside large files inside those archives.

- I am eagerly awaiting the Linux Kernel FUSE BPF support [3], which might enable some further latency reductions for use cases with very small files / very small reads, at least in the case of working with uncompressed archives. I have done comparisons for such archives (100k images of 100 KiB each) and noticed that direct access via the Python library ratarmountcore was roughly two times faster than access via ratarmount and FUSE. Maybe I'll even find the time to play around with the existing unmerged FUSE BPF patch set.

[0] https://news.ycombinator.com/item?id=30631387

[1] https://news.ycombinator.com/item?id=31875318

[2] https://news.ycombinator.com/item?id=37378411

[3] https://lwn.net/Articles/937433/
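
The remark in the post that file contents simply sit at some offset inside the archive can be illustrated with Python's standard `tarfile` module. This is only a minimal sketch for uncompressed TARs, not ratarmount's actual implementation; the helper names `build_offset_index`, `read_member`, and `make_sample_archive` are made up for illustration.

```python
import io
import os
import tarfile
import tempfile


def build_offset_index(archive_path):
    """Scan the TAR once and map each regular file to (data offset, size)."""
    index = {}
    # Mode "r:" only accepts uncompressed archives, where offsets are meaningful.
    with tarfile.open(archive_path, "r:") as tar:
        for member in tar:
            if member.isfile():
                index[member.name] = (member.offset_data, member.size)
    return index


def read_member(archive_path, index, name):
    """Read one file straight from the archive using the stored offset."""
    offset, size = index[name]
    with open(archive_path, "rb") as archive:
        archive.seek(offset)
        return archive.read(size)


def make_sample_archive(path, files):
    """Hypothetical helper: write an uncompressed TAR from a {name: bytes} dict."""
    with tarfile.open(path, "w:") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "sample.tar")
        make_sample_archive(path, {"a.txt": b"hello", "b.txt": b"world!"})
        index = build_offset_index(path)
        print(read_member(path, index, "b.txt"))
```

Once the index exists, each read costs one seek and one read, independent of where the file sits in the archive. Sparse files and compressed archives need more work than this sketch covers.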

5 comments

kenmacd 6 months ago

I find this project hugely helpful when working with Google Takeout archives. I normally pick a size that's not too large so that downloading them is easier, then it's simply a matter of:

    ratarmount ./takeout-20231130T224325Z-0*.tgz ./mnt
sziiiizs 6 months ago

That is very cool. May I ask, how does the compressed stream seeking work? Does it keep state of the decompressor at certain points so arbitrary access can be faster than reading from the start of the stream?
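
The idea raised in this question, saving decompressor state at intervals so that a read can resume from the nearest saved point, can be sketched with Python's `zlib` module. This is a toy illustration under stated assumptions and makes no claim about how ratarmount or its backends work; `build_checkpoints`, `read_at`, and `CHUNK` are hypothetical names, and a real tool may store far less per seek point than a full decompressor copy.

```python
import os
import zlib

CHUNK = 4096  # compressed bytes fed to the decompressor per step


def build_checkpoints(compressed, spacing):
    """Decompress once, saving a state copy about every `spacing` output bytes.

    Each checkpoint is (uncompressed offset, compressed offset, decompressor).
    """
    decomp = zlib.decompressobj()
    checkpoints = [(0, 0, decomp.copy())]
    out_pos = 0
    next_mark = spacing
    for in_pos in range(0, len(compressed), CHUNK):
        out_pos += len(decomp.decompress(compressed[in_pos:in_pos + CHUNK]))
        if out_pos >= next_mark and not decomp.eof:
            checkpoints.append((out_pos, in_pos + CHUNK, decomp.copy()))
            next_mark = out_pos + spacing
    return checkpoints


def read_at(compressed, checkpoints, offset, size):
    """Return `size` uncompressed bytes at `offset`, resuming from a checkpoint."""
    # Pick the last checkpoint at or before the requested offset.
    out_pos, in_pos, state = max(
        (c for c in checkpoints if c[0] <= offset), key=lambda c: c[0]
    )
    decomp = state.copy()  # keep the stored checkpoint reusable
    needed = offset - out_pos + size
    buf = bytearray()
    while len(buf) < needed and in_pos < len(compressed):
        buf += decomp.decompress(compressed[in_pos:in_pos + CHUNK])
        in_pos += CHUNK
    start = offset - out_pos
    return bytes(buf[start:start + size])


if __name__ == "__main__":
    data = os.urandom(200_000)  # incompressible, so the stream spans many chunks
    compressed = zlib.compress(data)
    checkpoints = build_checkpoints(compressed, spacing=50_000)
    assert read_at(compressed, checkpoints, 150_000, 100) == data[150_000:150_100]
    print(len(checkpoints), "checkpoints")
```

With checkpoints in place, a read near the end of the stream only decompresses from the closest earlier checkpoint instead of from the start. The trade-off is memory: every saved state carries its own decompression window.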

ranger_danger 6 months ago

Similar projects:

https://github.com/cybernoid/archivemount

https://github.com/google/fuse-archive

https://github.com/google/mount-zip

https://bitbucket.org/agalanin/fuse-zip

BoingBoomTschak 6 months ago

Congratulations on your v1.0.0! This is definitely a very nice tool. I'll try to play with it a bit and maybe try to make an ebuild (though the build system seems a bit complicated for proper no-network package managers). The extensive benchmark section is a nice plus.

A small note: archivemount has a living fork here: https://git.sr.ht/~nabijaczleweli/archivemount-ng

lathiat 6 months ago

This is awesome :)