If you were curious about the reference to MinHash in the OP, I just wrote a gentle guide to the MinHash family of algorithms (including our recent research extending it to probability distributions).
<a href="https://moultano.wordpress.com/2018/11/08/minhashing-3kbzhsxyg4467-6/" rel="nofollow">https://moultano.wordpress.com/2018/11/08/minhashing-3kbzhsx...</a>
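For anyone who wants a concrete picture before reading the guide, here is a minimal sketch of classic set MinHash. It does not show the probability-distribution extension. It assumes random affine hashes modulo a Mersenne prime and uses `zlib.crc32` to turn items into integers. Both are illustrative choices, and the helper names are hypothetical. The idea is to hash every element with k hash functions and keep each function's minimum. Two sets agree on a given minimum with probability approximately equal to their Jaccard similarity, so the fraction of matching minima estimates it.

```python
import random
import zlib

# Mersenne prime 2^61 - 1; affine maps x -> (a*x + b) mod P are permutations of [0, P)
P = (1 << 61) - 1

def make_hash_funcs(k, seed=0):
    # Hypothetical helper: k random affine hashes (an approximation of min-wise independence)
    rng = random.Random(seed)
    params = [(rng.randrange(1, P), rng.randrange(0, P)) for _ in range(k)]
    return [lambda x, a=a, b=b: (a * x + b) % P for a, b in params]

def minhash_signature(items, hash_funcs):
    # Map each item to a stable 32-bit integer, then keep the minimum value per hash function.
    # Assumes `items` is non-empty.
    keys = [zlib.crc32(str(item).encode("utf-8")) for item in items]
    return [min(h(key) for key in keys) for h in hash_funcs]

def estimate_jaccard(sig_a, sig_b):
    # Fraction of hash functions whose minima coincide
    matches = sum(1 for x, y in zip(sig_a, sig_b) if x == y)
    return matches / len(sig_a)

def exact_jaccard(a, b):
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)

if __name__ == "__main__":
    hashes = make_hash_funcs(256, seed=1)
    a = set(range(100))
    b = set(range(50, 150))
    est = estimate_jaccard(minhash_signature(a, hashes), minhash_signature(b, hashes))
    print(f"estimated {est:.3f}, exact {exact_jaccard(a, b):.3f}")
```

More hash functions lower the variance of the estimate, roughly sqrt(J(1 - J) / k) for k functions. The signatures are fixed-length, so they can be stored or compared without keeping the original sets around.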