Tech Echo
© 2025 Tech Echo. All rights reserved.

Show HN: I Built a Semantic Deduplicator

2 points by gkamradt over 1 year ago
Hey HN Crew!

We all have lists, and they can be annoying to de-duplicate:

* User feedback
* Groceries
* Employee surveys
* Bug reports
* You name it

Most ways to consolidate like items work off keywords or, worse, exact phrases (Sheets/Excel).

But LLMs are much better at understanding an item's semantic meaning and determining whether two items should be combined.

I decided to build my first Python package, The Semantic Deduplicator, to help me consolidate items based on their meaning, not their keywords.

For example, on groceries:

['We need more berries', 'I want more milk', 'Can we get more carbonated water please?', 'We need more sparkling water']

...deduplicated...

['Berries', 'Milk', 'Sparkling Water']

How it works:

1. Start with an empty list ready to populate.
2. The first item you add is 1) transformed into a clean name (e.g., user feedback > product request) and 2) added to the list.
3. For each additional item:
   * Check whether the new item's embedding is close to any existing item's.
   * If so, ask the LLM to compare the two items and decide whether they should be combined.
   * If it says yes, combine them.

This package is more of an exploration and proof of concept (POC), so be careful with it. I'd love to hear any feedback.

All the links:

* YT explainer video: https://www.youtube.com/watch?v=etLsNgkGbeM
* Twitter thread: https://twitter.com/GregKamradt/status/1719760658936545336
* PyPI: https://pypi.org/project/semantic-deduplicator/
* GitHub: https://github.com/gkamradt/SemanticDeduplicator
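The steps above describe an incremental loop: an embedding check acts as a cheap filter, and the LLM acts as the final judge. Below is a minimal Python sketch of that loop. It is not the package's actual code or API. Assumptions: the embedding model and the two LLM calls (cleaning a name, and deciding whether and how to merge two names) are passed in as plain functions; `toy_clean_name`, `toy_embed`, and `toy_merge` are hypothetical offline stand-ins (stopword stripping, bag-of-words vectors, a hard-coded synonym table) so the example runs without any API; the 0.4 similarity threshold is arbitrary; and letting the merge step return a new name is an assumption that lets the merged group be called "Sparkling Water" as in the post's grocery example.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Group:
    name: str                # current clean name for the group
    vector: Counter          # embedding of that name
    members: list[str] = field(default_factory=list)  # raw items merged into it


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse vectors (0.0 if either is empty)."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class SemanticDeduplicator:
    """Incremental dedup: cheap embedding pre-filter, then a merge 'judge'."""

    def __init__(
        self,
        clean_name: Callable[[str], str],            # an LLM call in practice
        embed: Callable[[str], Counter],             # an embedding model in practice
        merge: Callable[[str, str], Optional[str]],  # an LLM call in practice
        threshold: float = 0.4,                      # arbitrary cutoff (assumption)
    ):
        self.clean_name = clean_name
        self.embed = embed
        self.merge = merge
        self.threshold = threshold
        self.groups: list[Group] = []  # step 1: start with an empty list

    def add(self, item: str) -> None:
        name = self.clean_name(item)
        vector = self.embed(name)
        # Step 3: visit existing groups from most to least similar.
        scored = sorted(
            ((cosine(vector, g.vector), g) for g in self.groups),
            key=lambda pair: pair[0],
            reverse=True,
        )
        for score, group in scored:
            if score < self.threshold:
                break  # remaining groups are even less similar
            merged = self.merge(group.name, name)  # ask the judge
            if merged is not None:
                group.name = merged
                group.vector = self.embed(merged)
                group.members.append(item)
                return
        # Step 2 / no match: the item starts a new group.
        self.groups.append(Group(name, vector, [item]))

    def names(self) -> list[str]:
        return [g.name for g in self.groups]


# --- Hypothetical offline stand-ins for the embedding model and the LLM ---
STOPWORDS = {"we", "need", "more", "i", "want", "can", "get", "please"}
SYNONYMS = {"carbonated": "sparkling"}  # toy "knowledge" an LLM would supply


def toy_clean_name(item: str) -> str:
    words = re.findall(r"[a-z]+", item.lower())
    return " ".join(w for w in words if w not in STOPWORDS).title()


def toy_embed(text: str) -> Counter:
    return Counter(re.findall(r"[a-z]+", text.lower()))


def toy_merge(a: str, b: str) -> Optional[str]:
    def canon(s: str) -> str:
        return " ".join(SYNONYMS.get(w, w) for w in s.lower().split()).title()
    return canon(a) if canon(a) == canon(b) else None


if __name__ == "__main__":
    dedup = SemanticDeduplicator(toy_clean_name, toy_embed, toy_merge)
    for item in ["We need more berries", "I want more milk",
                 "Can we get more carbonated water please?",
                 "We need more sparkling water"]:
        dedup.add(item)
    print(dedup.names())  # ['Berries', 'Milk', 'Sparkling Water']
```

In a real setup the toy functions would be replaced by embedding and chat-model calls. The embedding threshold presumably exists to keep the number of slower, costlier LLM comparisons small: only groups that clear it are sent to the judge, most similar first.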

2 comments

skeptrune over 1 year ago
This is smart and solid work.

We had the same idea and made it a core product feature: https://docs.arguflow.ai/duplicate_detection
nbbaier over 1 year ago
Really cool stuff! Definitely going to try to fit this into a project.