
Binary vector embeddings are so cool

87 points, by emschwartz, 6 months ago

6 comments

xfalcox, 6 months ago
I shipped this in Discourse a couple of weeks ago, and it's incredible tech. And fully supported in PostgreSQL via pgvector.
mosselman, 6 months ago
That is crazy. So how do you use these right now? Do you store vectors on disk and iterate over them, or something else?
How do you do the search?
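One common answer to the "how do you do the search" question: binarize each float embedding with a sign threshold, pack the bits, and rank candidates by Hamming distance (XOR plus popcount). A minimal in-memory sketch with NumPy and made-up toy data (not how Discourse or pgvector is wired up internally):

```python
import numpy as np

def binarize(v):
    # One bit per dimension: 1 if the component is positive, else 0,
    # packed 8 bits per byte.
    return np.packbits(v > 0)

def hamming(a, b):
    # XOR the packed bytes, then count the differing bits (popcount).
    return int(np.unpackbits(a ^ b).sum())

rng = np.random.default_rng(0)
docs = rng.standard_normal((1000, 256))        # toy float embeddings
codes = np.stack([binarize(d) for d in docs])  # 32 bytes each vs. 1 KB of floats

# A noisy copy of doc 42 stands in for a "related" query.
query = binarize(docs[42] + 0.1 * rng.standard_normal(256))
dists = [hamming(query, c) for c in codes]
best = int(np.argmin(dists))
print(best)
```

pgvector's binary-vector support with Hamming distance expresses the same idea inside the database; the sketch above is only the brute-force in-memory version.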
kevinventullo, 6 months ago
It got me thinking, what might it look like to natively train a binary quantized embedding model? You can’t do calculus per se on {0,1}, but maybe you could do something like randomly flip bits with a probability weighted by the severity of the error during backprop… anyway, I’m sure there’s plenty of literature about this.
2-3-7-43-1807, 6 months ago
Is it fair to say that Hamming distance relates to cosine similarity in this context like ReLU relates to the Sigmoid function for neural networks?
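A more direct relationship: for sign bits of vectors with roughly Gaussian coordinates, the expected normalized Hamming distance equals θ/π, where θ is the angle between the vectors, so cos(π·ham) recovers the cosine similarity in expectation. A quick numeric check on synthetic correlated vectors:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4096
u = rng.standard_normal(d)
v = 0.7 * u + 0.3 * rng.standard_normal(d)   # correlated with u

cos = float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))
ham = float(np.mean(np.sign(u) != np.sign(v)))  # normalized Hamming distance

# E[ham] = theta / pi for sign bits, so cos(pi * ham)
# approximates the cosine similarity up to sampling noise.
print(round(cos, 2), round(float(np.cos(np.pi * ham)), 2))
```
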
titusz, 6 months ago
Yes, binary embeddings are cool indeed. And because they can be so small, they even double as powerful cluster identifiers. Built this a while ago: https://huggingface.co/spaces/iscc/iscc-sct
nalzok, 6 months ago
I wonder what would happen if we quantized each dimension to 0.5 (or even fewer) bits instead of 1, i.e., taking 2 (or more) scalar components at a time and mapping them to 0 or 1 based on some carefully designed rules.
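One toy version of such a rule, just to make the idea concrete (the comparison rule below is entirely made up, not a known-good design): take components two at a time and keep a single bit recording which of the pair is larger. Even at 0.5 bits per dimension it still separates near-duplicates from unrelated vectors:

```python
import numpy as np

def half_bit_quantize(v):
    # Hypothetical 0.5-bit rule: collapse each pair of components into one
    # bit that says whether the first exceeds the second.
    pairs = v.reshape(-1, 2)
    return np.packbits(pairs[:, 0] > pairs[:, 1])   # 256 dims -> 16 bytes

def hamming(a, b):
    return int(np.unpackbits(a ^ b).sum())

rng = np.random.default_rng(0)
a = rng.standard_normal(256)
b = a + 0.2 * rng.standard_normal(256)   # slightly perturbed copy of a
c = rng.standard_normal(256)             # unrelated vector

d_close = hamming(half_bit_quantize(a), half_bit_quantize(b))
d_far = hamming(half_bit_quantize(a), half_bit_quantize(c))
print(d_close, d_far)
```

Carefully designed rules (e.g. learned codebooks over groups of components, as in product quantization) would presumably do much better than this pairwise comparison, at the cost of a lookup table instead of plain XOR.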