The simhash patent has expired and is now free to use
12 points by ubutler 8 months ago | 1 comment
ubutler 8 months ago
Simhash is an extremely fast and simple algorithm for detecting near-duplicate text at scale, which makes it particularly useful for deduplicating AI training datasets.
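For anyone who hasn't seen it, here is a rough sketch of the general idea, not a reference implementation. Each document is reduced to a 64-bit fingerprint by letting every feature's hash vote on each bit, and near-duplicates tend to end up with fingerprints that differ in only a few bits. The names `simhash` and `hamming_distance`, the word 3-shingle features, the unweighted votes, and the choice of BLAKE2b as the feature hash are all assumptions made for illustration.

```python
import hashlib
import re

BITS = 64  # fingerprint width; an illustrative choice


def simhash(text: str) -> int:
    """Return a BITS-wide SimHash fingerprint of `text` (illustrative sketch)."""
    # Features: lowercase word 3-shingles (an assumption; any feature set works).
    words = re.findall(r"\w+", text.lower())
    shingles = [" ".join(words[i:i + 3]) for i in range(max(1, len(words) - 2))]

    # Every feature hash votes +1 or -1 on each bit position.
    counts = [0] * BITS
    for shingle in shingles:
        digest = hashlib.blake2b(shingle.encode("utf-8"), digest_size=BITS // 8).digest()
        h = int.from_bytes(digest, "big")
        for i in range(BITS):
            counts[i] += 1 if (h >> i) & 1 else -1

    # A fingerprint bit is set where the positive votes win.
    fingerprint = 0
    for i, count in enumerate(counts):
        if count > 0:
            fingerprint |= 1 << i
    return fingerprint


def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")
```

To deduplicate, you would compare fingerprints with `hamming_distance` and treat pairs under some small threshold as near-duplicates. The right threshold depends on the data and the feature set, so it needs tuning rather than a fixed value.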