I personally use fdupes.pl:

http://www.perlmonks.org/?node_id=85202

Tested on many millions of files, works like a charm (though it can run out of memory on a 32-bit machine). I'm using the enhanced version here: http://www.perlmonks.org/?node_id=1099194 which has an autodelete flag and prudently ignores symlinks.
I personally found fdupes to be slower and more limited than dupfiles [0].

I switched to dupfiles about a year ago and haven't had any problems yet.

[0]: http://liw.fi/dupfiles/
I used this when I was working on a product that used automated tests to upload files repeatedly during the day. The volume of test files was so great that it continually put pressure on the storage -- more pressure than the uploads from the actual users.

Fortunately the uploads came from a set of a few dozen static files, and de-duplicating the data via fdupes cut disk usage by a factor of 20-50.
I did something similar to this a while back, called qdupe [0], written in Python. It doesn't do the deleting for you, but it is very fast at identifying duplicates if you have a lot to compare. It's based on the fastdup algorithm.

[0] https://github.com/cwilper/qdupe
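The core trick (just a rough Python sketch of the general idea, not qdupe's actual code, and I'm glossing over fastdup's details) is to bucket files by size first, so most files never need to be read at all, and then only confirm content within a bucket:

    import os
    import sys
    from collections import defaultdict

    def size_groups(root):
        # Bucket regular files by size; only same-size files can possibly be duplicates.
        groups = defaultdict(list)
        for dirpath, _dirs, names in os.walk(root):
            for name in names:
                path = os.path.join(dirpath, name)
                if os.path.isfile(path) and not os.path.islink(path):
                    groups[os.path.getsize(path)].append(path)
        return {size: paths for size, paths in groups.items() if len(paths) > 1}

    def same_content(a, b, chunk=64 * 1024):
        # Compare two files chunk by chunk, bailing out at the first difference.
        with open(a, "rb") as fa, open(b, "rb") as fb:
            while True:
                ca, cb = fa.read(chunk), fb.read(chunk)
                if ca != cb:
                    return False
                if not ca:
                    return True

    if __name__ == "__main__":
        root = sys.argv[1] if len(sys.argv) > 1 else "."
        for size, paths in size_groups(root).items():
            # Simplification: only reports files identical to the first one in each size bucket.
            dupes = [p for p in paths[1:] if same_content(paths[0], p)]
            if dupes:
                print(size, paths[0], *dupes)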
Yeah, I wrote something similar a long time ago in Python: https://bitbucket.org/panzi/finddup/src
It's not exactly clear, but I'm assuming this is some kind of automated hard-linking utility? Or does it use its own special magic? (filesystem type restrictions?)
not nearly as fancy, but it gets the job done for me:
http://www.commandlinefu.com/commands/view/3555/find-duplicate-files-based-on-size-first-then-md5-hash
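If you'd rather have that in Python than a shell pipeline, the same idea (size first, then MD5 only for the size collisions) looks roughly like this (a sketch, not a faithful port of the one-liner):

    import hashlib
    import os
    import sys
    from collections import defaultdict

    def md5sum(path, chunk=1 << 20):
        # Hash a file in chunks so large files don't have to fit in memory.
        h = hashlib.md5()
        with open(path, "rb") as f:
            for block in iter(lambda: f.read(chunk), b""):
                h.update(block)
        return h.hexdigest()

    def find_dupes(root):
        # Pass 1: bucket by size -- files of different sizes can never be duplicates.
        by_size = defaultdict(list)
        for dirpath, _dirs, names in os.walk(root):
            for name in names:
                path = os.path.join(dirpath, name)
                if os.path.isfile(path) and not os.path.islink(path):
                    by_size[os.path.getsize(path)].append(path)
        # Pass 2: hash only the files whose size collides with at least one other file.
        by_hash = defaultdict(list)
        for size, paths in by_size.items():
            if len(paths) > 1:
                for path in paths:
                    by_hash[(size, md5sum(path))].append(path)
        return [paths for paths in by_hash.values() if len(paths) > 1]

    if __name__ == "__main__":
        for group in find_dupes(sys.argv[1] if len(sys.argv) > 1 else "."):
            print("\n".join(group), end="\n\n")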