
TechEcho

A tech news platform built with Next.js, providing global tech news and discussions.


© 2025 TechEcho. All rights reserved.

Fdupes: A tool to de-duplicate files

23 points by evandrix over 10 years ago

9 comments

wazoox over 10 years ago
I personally use fdupes.pl: http://www.perlmonks.org/?node_id=85202

Tested on many millions of files, works like a charm (though it can run out of memory on a 32-bit machine). I'm using the enhanced version here: http://www.perlmonks.org/?node_id=1099194 which has an autodelete flag and prudently ignores symlinks.
TheDong over 10 years ago
I personally found fdupes to be slower and more limited than dupfiles [0].

I switched to dupfiles about a year ago and haven't had any problems yet.

[0]: http://liw.fi/dupfiles/
mmastrac over 10 years ago
I used this when I was working on a product that used automated tests to upload files repeatedly during the day. The volume of test files was so great that it continually put pressure on the storage, more pressure than the uploads from the actual users.

Fortunately the uploads were from a set of a few dozen static files, and de-duplicating the data via fdupes was able to drop disk usage by a factor of 20-50x.
cwilper over 10 years ago
I did something similar to this a while back, called qdupe [0], written in Python. It doesn't do the deleting for you, but is very fast at identifying duplicates if you have a lot to compare. Based on the fastdup algorithm.

[0] https://github.com/cwilper/qdupe
DigitalJack over 10 years ago
Is this multiplatform? I think it's interesting how many projects forget to mention what operating system they target.
panzi over 10 years ago
Yeah, I wrote something similar a long time ago in Python: https://bitbucket.org/panzi/finddup/src
kylek over 10 years ago
It's not exactly clear, but I'm assuming this is some kind of automated hard-linking utility? Or does it use its own special magic? (filesystem type restrictions?)
theophrastus over 10 years ago
not nearly as fancy, but it gets the job done for me: http://www.commandlinefu.com/commands/view/3555/find-duplicate-files-based-on-size-first-then-md5-hash
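
The "size first, then MD5 hash" idea named in that link can be sketched in Python roughly as follows. This is only an illustration of the general technique, not the linked one-liner and not fdupes' own code; the names `file_md5` and `find_duplicates` are made up for this example.

```python
import hashlib
import os
from collections import defaultdict


def file_md5(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root):
    """Group regular files under root by size, then by MD5 of their contents.

    Hypothetical helper for illustration: returns a sorted list of groups,
    each group being a sorted list of paths that share size and MD5 digest.
    """
    by_size = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks and anything that is not a regular file.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            by_size[os.path.getsize(path)].append(path)

    groups = []
    for paths in by_size.values():
        if len(paths) < 2:
            continue  # a file with a unique size cannot have a duplicate
        by_hash = defaultdict(list)
        for path in paths:
            by_hash[file_md5(path)].append(path)
        groups.extend(sorted(g) for g in by_hash.values() if len(g) > 1)
    return sorted(groups)
```

Grouping by size first means only files that share a size are ever read and hashed, which is where most of the time is saved. A matching MD5 is strong evidence rather than proof of identical contents, so a byte-for-byte comparison before deleting anything is a reasonable extra step.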
xenonite over 10 years ago
a similar tool is https://code.google.com/p/hardlinkpy/
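
For the hard-linking approach raised in the thread, a minimal Python sketch of replacing one duplicate with a hard link to the other might look like this. It is not taken from hardlinkpy or fdupes; `replace_with_hardlink` and the `.hardlink-tmp` suffix are hypothetical names for this example, and it assumes both paths live on the same filesystem, since hard links cannot span filesystems.

```python
import filecmp
import os


def replace_with_hardlink(original, duplicate):
    """Replace `duplicate` with a hard link to `original` if contents match.

    Hypothetical helper for illustration. Returns True if a link was made,
    False if the two paths already share an inode or their contents differ.
    """
    if os.path.samefile(original, duplicate):
        return False  # already the same underlying file
    # shallow=False forces a byte-for-byte comparison of the contents.
    if not filecmp.cmp(original, duplicate, shallow=False):
        return False
    # Create the link under a temporary name, then swap it into place,
    # so the duplicate path is never missing if something fails midway.
    temp = duplicate + ".hardlink-tmp"  # assumed not to exist already
    os.link(original, temp)
    os.replace(temp, duplicate)
    return True
```

After a successful call both names refer to the same data on disk, so editing the file through one name changes it under the other as well, which is worth keeping in mind before de-duplicating files that are still being modified.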