I did something similar to this a while back, called qdupe[0], written in Python. It doesn't do the deleting for you, but it is very fast at identifying duplicates if you have a lot of files to compare. It's based on the fastdup algorithm.

[0] https://github.com/cwilper/qdupe
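For anyone curious how a duplicate finder avoids comparing every file against every other file, here is a minimal Python sketch of one common approach. It assumes the usual two-pass strategy: group files by size, then hash only the files that share a size with another file. This is not qdupe's code and not necessarily how the fastdup algorithm works, and `find_duplicates` and `file_digest` are hypothetical names.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=1 << 16):
    """Return the SHA-256 hex digest of a file, read in fixed-size chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(root):
    """Return sorted lists of paths under root whose contents are identical.

    Illustrative sketch of a size-then-hash strategy (an assumption, not
    qdupe's actual implementation). Only reports duplicates; deletes nothing.
    """
    # Pass 1: group regular files by size; a unique size means no duplicate.
    by_size = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks so the same file is not counted twice.
            if not os.path.islink(path) and os.path.isfile(path):
                by_size[os.path.getsize(path)].append(path)

    # Pass 2: hash only files that share their size with at least one other.
    duplicates = []
    for paths in by_size.values():
        if len(paths) < 2:
            continue
        by_hash = defaultdict(list)
        for path in paths:
            by_hash[file_digest(path)].append(path)
        duplicates.extend(
            sorted(group) for group in by_hash.values() if len(group) > 1
        )
    return duplicates
```

Calling `find_duplicates("/some/dir")` returns lists of paths with identical contents. The size pass is cheap because it only reads metadata, so most files are never opened. Hash collisions are astronomically unlikely with SHA-256, but a cautious tool would still byte-compare a group before deleting anything.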