Simhashing (Hopefully) Made Simple (2012)

37 points by kkm about 4 years ago

2 comments

jonathankoren about 4 years ago
I wrote something similar several years ago on minhashing for near-duplicate detection.

https://medium.com/@jonathankoren/near-duplicate-detection-b6694e807f7a
Nzen about 4 years ago
Simhashing is a way of characterizing the similarity of data. The author begins with the idea that we can discard the first characters of { aaarock, aabjeep, aaareep } to prefer the latter two as most similar, and concludes by computing the Hamming distance of the data.
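
For anyone who wants to see the last step concretely, here is a rough sketch of a simhash fingerprint plus a Hamming distance check. It is not the article's code. The whitespace tokenizer, the use of MD5 as the per-token hash, the 64-bit width, and the names `simhash` and `hamming_distance` are all assumptions made for illustration.

```python
import hashlib

BITS = 64  # fingerprint width (an assumption; other widths are possible)


def simhash(text):
    """Return a 64-bit simhash fingerprint of whitespace-separated tokens."""
    counts = [0] * BITS
    for token in text.lower().split():
        # Hash each token to 64 bits (first 8 bytes of its MD5 digest).
        digest = hashlib.md5(token.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        # Each token votes +1 or -1 on every bit position.
        for i in range(BITS):
            counts[i] += 1 if (h >> i) & 1 else -1
    # A bit is set in the fingerprint when its votes sum to a positive number.
    fingerprint = 0
    for i, c in enumerate(counts):
        if c > 0:
            fingerprint |= 1 << i
    return fingerprint


def hamming_distance(a, b):
    """Count the bit positions where two fingerprints differ."""
    return bin(a ^ b).count("1")


if __name__ == "__main__":
    a = simhash("the quick brown fox jumps over the lazy dog")
    b = simhash("the quick brown fox jumped over the lazy dog")
    print(hamming_distance(a, b))
```

Texts that share most of their tokens tend to produce fingerprints that differ in only a few bits, so a small Hamming distance is treated as a hint of near-duplication. What threshold counts as "small" depends on the data and would need tuning.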