I made an app to fuzzy-deduplicate my Google Sheets and CRM records

- No manual configuration required
- Works out of the box on most data types (e.g., people, companies, product catalogs)

Implementation details:

- Embeds records using an E5 model
- Performs similarity search using DuckDB with its vector similarity extension
- Uses Claude for the last-mile comparison and to merge the duplicates

Demo video: https://youtu.be/7mZ0kdwXBwM

GitHub repo (Apache 2.0 licensed): https://github.com/SnowPilotOrg/dedupe_it

Let me know if you have any feedback on how to make this better!
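For readers curious how the embed-then-search stage of a pipeline like this fits together, here is a minimal, hypothetical sketch. It is not the code in the dedupe_it repo. It assumes the E5 embeddings have already been computed (toy 3-d vectors stand in for them). The `query: ` prefix follows the E5 model card's advice for symmetric similarity tasks. The sketch swaps DuckDB's vector search for a brute-force cosine-similarity scan in plain Python, then groups matching pairs with union-find. The names `record_to_text`, `candidate_pairs`, `cluster` and the 0.9 threshold are made up for illustration and would need tuning on real data.

```python
import math
from itertools import combinations


def record_to_text(record: dict) -> str:
    """Hypothetical serializer: turn one sheet/CRM row into a string to embed.

    Empty fields are skipped and keys are sorted so the same record always
    yields the same text. E5 models expect a "query: " prefix on inputs for
    symmetric similarity tasks.
    """
    fields = "; ".join(
        f"{key}: {value}"
        for key, value in sorted(record.items())
        if value not in (None, "")
    )
    return "query: " + fields


def normalize(vec: list[float]) -> list[float]:
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def candidate_pairs(embeddings: list[list[float]], threshold: float = 0.9):
    """Brute-force stand-in for the vector search step: every pair whose
    cosine similarity meets the (hypothetical) threshold is a candidate."""
    unit = [normalize(v) for v in embeddings]
    pairs = []
    for i, j in combinations(range(len(unit)), 2):
        score = sum(a * b for a, b in zip(unit[i], unit[j]))
        if score >= threshold:
            pairs.append((i, j, score))
    return pairs


def cluster(n: int, pairs) -> list[list[int]]:
    """Union-find: merge candidate pairs into groups of record indices.
    Only groups containing two or more records are returned."""
    parent = list(range(n))

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for i, j, _score in pairs:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_j] = root_i

    groups: dict[int, list[int]] = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return [members for members in groups.values() if len(members) > 1]


# Toy 3-d vectors standing in for real E5 embeddings (hundreds of dimensions).
embeddings = [
    [1.0, 0.0, 0.1],     # record 0
    [0.98, 0.02, 0.12],  # record 1: near-duplicate of record 0
    [0.0, 1.0, 0.0],     # record 2
    [0.01, 0.99, 0.0],   # record 3: near-duplicate of record 2
    [0.5, 0.5, 0.7],     # record 4: unrelated
]
pairs = candidate_pairs(embeddings)
print(cluster(len(embeddings), pairs))  # → [[0, 1], [2, 3]]
```

In a setup like the one described, each resulting cluster would then go to the LLM step (Claude, in this app) for the final same-entity check and merge; that call is omitted here. The pairwise scan above is O(n²), which is the kind of cost an indexed vector search inside a database avoids as sheets grow.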