The simhash patent has expired and is now free to use
12 points by ubutler, 8 months ago
1 comment
ubutler
8 months ago
Simhash is an extremely fast and simple algorithm for detecting near-duplicate text at scale, which makes it particularly useful for deduplicating AI training datasets.
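
For anyone who has not seen it before, here is a rough sketch of the idea in Python. It is not any particular library's implementation. The function names, the choice of MD5 as the per-token hash, the 64-bit width, and the plain word tokenization are all assumptions made for illustration. Real pipelines often hash overlapping word shingles and weight tokens by frequency instead.

```python
import hashlib
import re


def simhash(text: str, bits: int = 64) -> int:
    """Return a simhash fingerprint of `text` as a `bits`-wide integer.

    Illustrative sketch: unweighted lowercase word tokens, MD5 per token.
    `bits` must be a multiple of 8 and at most 128 (the MD5 digest size).
    """
    tokens = re.findall(r"\w+", text.lower())
    weights = [0] * bits  # one signed counter per bit position

    for token in tokens:
        digest = hashlib.md5(token.encode("utf-8")).digest()
        h = int.from_bytes(digest[: bits // 8], "big")
        # Each token votes +1 for every bit set in its hash, -1 otherwise.
        for i in range(bits):
            weights[i] += 1 if (h >> i) & 1 else -1

    # A fingerprint bit is 1 where the positive votes won.
    fingerprint = 0
    for i, w in enumerate(weights):
        if w > 0:
            fingerprint |= 1 << i
    return fingerprint


def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions where two fingerprints differ."""
    return bin(a ^ b).count("1")


if __name__ == "__main__":
    doc_a = "Simhash is a fast algorithm for detecting near-duplicate text."
    doc_b = "Simhash is a fast algorithm for detecting near duplicate texts."
    print(hamming_distance(simhash(doc_a), simhash(doc_b)))
```

Two documents are treated as near duplicates when the Hamming distance between their fingerprints falls below a threshold you pick for your data. Because each document is reduced to one small integer, comparisons are cheap, which is where the speed at scale comes from.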