
Fdupes: A tool to de-duplicate files

23 points by evandrix, over 10 years ago

9 comments

wazoox, over 10 years ago
I personally use fdupes.pl: http://www.perlmonks.org/?node_id=85202

Tested on many millions of files; works like a charm (though it can run out of memory on a 32-bit machine). I'm using the enhanced version here: http://www.perlmonks.org/?node_id=1099194, which adds an autodelete flag and prudently ignores symlinks.
TheDong, over 10 years ago
I personally found fdupes to be slower and more limited than dupfiles [0].

I switched to dupfiles about a year ago and haven't had any problems yet.

[0]: http://liw.fi/dupfiles/
mmastrac, over 10 years ago
I used this when I was working on a product that used automated tests to upload files repeatedly during the day. The volume of test files was so great that it continually put pressure on the storage -- more pressure than the uploads from the actual users.

Fortunately the uploads came from a set of a few dozen static files, and de-duplicating the data via fdupes dropped disk usage by a factor of 20-50x.
cwilper, over 10 years ago
I did something similar a while back, called qdupe [0], written in Python. It doesn't do the deleting for you, but it is very fast at identifying duplicates when you have a lot to compare. Based on the fastdup algorithm.

[0] https://github.com/cwilper/qdupe
DigitalJack, over 10 years ago
Is this multiplatform? It's interesting how many projects forget to mention which operating systems they target.
panzi, over 10 years ago
Yeah, I wrote something similar a long time ago in Python: https://bitbucket.org/panzi/finddup/src
kylek, over 10 years ago
It's not exactly clear, but I'm assuming this is some kind of automated hard-linking utility? Or does it use its own special magic? (Filesystem type restrictions?)
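For readers wondering what a hard-linking deduplicator actually does: the core operation is small enough to sketch. This is illustrative only (not how fdupes itself is implemented; the function name is made up), and its one caveat answers the filesystem question above: hard links cannot cross filesystem boundaries.

```python
import os

def link_duplicates(keep, dupes):
    """Replace each duplicate file with a hard link to `keep`.

    After this, all paths share one inode, so the file's data
    is stored only once on disk.
    """
    for path in dupes:
        tmp = path + ".tmp-link"
        # os.link raises OSError (EXDEV) if `keep` lives on a
        # different filesystem -- the usual restriction such
        # utilities document.
        os.link(keep, tmp)
        # Atomically swap the new link into place of the duplicate.
        os.replace(tmp, path)
```

Tools that work across filesystems fall back to deleting duplicates outright or to symlinks instead.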
theophrastus, over 10 years ago
Not nearly as fancy, but it gets the job done for me: http://www.commandlinefu.com/commands/view/3555/find-duplicate-files-based-on-size-first-then-md5-hash
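The idea behind that one-liner (and behind most of the tools in this thread) is to group files by size before hashing, since files of different sizes can never be duplicates and hashing is the expensive step. A rough Python equivalent, purely illustrative and not the linked command:

```python
import hashlib
import os
from collections import defaultdict

def find_duplicates(root):
    """Return groups of duplicate files under `root`.

    Pass 1 buckets files by size; pass 2 hashes only the files
    whose size bucket has more than one member.
    """
    by_size = defaultdict(list)
    for dirpath, _, names in os.walk(root):
        for name in names:
            path = os.path.join(dirpath, name)
            by_size[os.path.getsize(path)].append(path)

    groups = []
    for paths in by_size.values():
        if len(paths) < 2:
            continue  # unique size: cannot have a duplicate
        by_hash = defaultdict(list)
        for path in paths:
            with open(path, "rb") as f:
                by_hash[hashlib.md5(f.read()).hexdigest()].append(path)
        groups.extend(g for g in by_hash.values() if len(g) > 1)
    return groups
```

A careful implementation would also hash large files in chunks and compare byte-for-byte before declaring a match, since MD5 collisions are cheap to construct.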
xenonite, over 10 years ago
A similar tool is https://code.google.com/p/hardlinkpy/