Binary vector search is better than FP32 vectors

137 points by gaocegege about 1 year ago

8 comments

refulgentis about 1 year ago
Dear reader: your intuition is right: it's not "better" to reduce the information by a factor of 32.

The game the article plays is to do a KNN search to deal with the fact this flattens similarity significantly.

TL;DR from the field:

- This is extremely helpful for doing a first pass over a _ton_ of documents in a resource-constrained environment.

- It is extremely *unhelpful* unless you're retrieving 10x the documents you want via binary, then re-ranking the remainder via the FP32 vectors.

- In general, it's unlikely you need the technique unless you're A) on the edge, i.e. on consumer devices from 3 years ago, or B) you have tens of millions of vectors on a server. All this stuff sounds really fancy, but when you implement it from scratch, you quickly learn "oh, it's 384 numbers I gotta multiply together."

Source: I do embeddings, locally, to do retrieval for RAG. I "discovered" this about a year ago, and it deeply pains me to see anything that will misinform a lot of people.

Free bonus I haven't seen mentioned elsewhere yet*: you can take the average of the N embeddings forming a document to figure out if you should look at the N embeddings individually. This does over-smooth too, e.g. my original test document was the GPT-4 "Sparks" paper, and the variety of subjects mentioned and the length (100+ pages) meant it was over-smoothed when searching for the particular example I wanted it to retrieve (the unicorn SVG).

* Edited to clarify given the reply. Also, my reaction to reading it ("that dude rocks!!") made me wanna go off a little bit: if you're not in AI/ML, don't be intimidated by it when it's wrapped in layers of obscure vocabulary. Once you have the time to go do it, things you would have thought were "stupid" turn out to be just fine. And you find like-minded souls. It's exhilarating.
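For readers who want to see what that two-stage setup looks like in practice, here is a minimal numpy sketch; the dimensions, the 10x oversampling factor, and the toy data are illustrative, not taken from the article:

    import numpy as np

    def binarize(vecs):
        # Keep only the sign of each dimension and pack the bits into bytes
        # (a 32x reduction versus FP32).
        return np.packbits(vecs > 0, axis=-1)

    def hamming(packed_query, packed_docs):
        # XOR the packed codes and count differing bits per document.
        return np.unpackbits(packed_query ^ packed_docs, axis=-1).sum(axis=-1)

    def two_stage_search(query, doc_vecs, packed_docs, k=10, oversample=10):
        # Stage 1: cheap Hamming-distance scan over the binary codes,
        # keeping oversample * k candidates.
        dists = hamming(binarize(query[None, :]), packed_docs)
        candidates = np.argsort(dists)[: k * oversample]
        # Stage 2: exact FP32 cosine similarity on the candidates only.
        cand = doc_vecs[candidates]
        sims = cand @ query / (np.linalg.norm(cand, axis=1) * np.linalg.norm(query))
        return candidates[np.argsort(-sims)[:k]]

    # Toy usage with random 384-dim vectors.
    rng = np.random.default_rng(0)
    docs = rng.normal(size=(100_000, 384)).astype(np.float32)
    query = rng.normal(size=384).astype(np.float32)
    top10 = two_stage_search(query, docs, binarize(docs), k=10)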
heipei about 1 year ago
To the people dismissing the idea of binarising vectors: Fair criticism, but consider the fact that you can also train a model with a loss function that approaches a binary behaviour, i.e. so that the magnitude per dimension plays an insignificant role and only the sign of the dimension carries information. In that case you can use the binary vector for search and ranking.
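A hypothetical illustration of that idea (my own sketch, not a description of how any existing model is trained): add a regularizer during training that pushes every dimension's magnitude away from zero, so that the sign alone carries nearly all of the usable information:

    import torch

    def sign_margin_penalty(emb, margin=0.5):
        # Penalize dimensions whose magnitude falls below `margin`, nudging the
        # model to encode information in the sign rather than the magnitude.
        return torch.relu(margin - emb.abs()).mean()

    # Hypothetical training step: combine with whatever task loss the model uses.
    # loss = task_loss + 0.1 * sign_margin_penalty(embeddings)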
barefeg about 1 year ago
This technique had a very recent resurgence via https://txt.cohere.com/int8-binary-embeddings/. Hugging Face also covered the technique here: https://huggingface.co/blog/embedding-quantization. It seems like a very good tradeoff compared to the shorter embeddings, which require fine-tuning via the Matryoshka technique. On the other hand, Nils Reimers suggests that trivial quantization of the full-precision embeddings is not as good as using "compression-friendly" embeddings like Cohere's Embed V3. Does anyone know what the difference in precision is between trivial quantization and optimized embeddings?
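For reference, "trivial quantization" here usually means mapping already-trained FP32 embeddings straight to lower precision after the fact, with no compression-aware training. A rough numpy sketch of int8 scalar quantization using per-dimension ranges from a calibration set (the calibration scheme is my assumption, not Cohere's or Hugging Face's exact recipe):

    import numpy as np

    def int8_quantize(vecs, calibration):
        # Map each dimension's observed range on a calibration set onto the
        # 256 int8 buckets; search can then run on the int8 codes.
        lo = calibration.min(axis=0)
        scale = (calibration.max(axis=0) - lo) / 255.0 + 1e-12
        codes = np.round((vecs - lo) / scale) - 128
        return np.clip(codes, -128, 127).astype(np.int8)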
crucio about 1 year ago
How well would this work for 762-dimensional vectors? At 3072, they're starting with such a high number of dimensions that the accuracy loss may not be representative of what others would see.
dicey about 1 year ago
This reminds me somewhat of the iSAX papers from ~2010 [0], which were focused on time series but used a pretty cool method to binarize/discretize the real-valued data and do search. I wonder how folks building things like FAISS or vector DBs incorporate ideas like this, or if the two worlds don't overlap very often.

[0] https://www.cs.ucr.edu/~eamonn/iSAX_2.0.pdf
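For context, a rough sketch of the SAX-style discretization the iSAX family builds on (parameters are illustrative; this assumes the series length is divisible by the number of segments):

    import numpy as np
    from scipy.stats import norm

    def sax_symbols(series, n_segments=8, alphabet_size=4):
        # Z-normalize, average over equal-width segments (PAA), then map each
        # segment mean to a symbol using breakpoints that split the standard
        # normal distribution into equal-probability bins.
        z = (series - series.mean()) / series.std()
        segment_means = z.reshape(n_segments, -1).mean(axis=1)
        breakpoints = norm.ppf(np.linspace(0, 1, alphabet_size + 1)[1:-1])
        return np.searchsorted(breakpoints, segment_means)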
kookamamie about 1 year ago
> our experiments showed that the decrease in accuracy was not as big as expected

A decrease, nevertheless. This is always the issue with pruning, sparsification, etc. - to get SOTA, you will not want a decrease of any kind.
jn2clark about 1 year ago
What is accuracy in this case? Is it meant to be recall, or some other evaluation metric?
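My assumption (not the article's stated definition) is that it is a recall-style metric: the overlap between the binary search's top-k and the exact FP32 top-k, along the lines of:

    def recall_at_k(exact_topk, approx_topk):
        # Fraction of the exact FP32 top-k that the approximate (binary)
        # search also returned.
        return len(set(exact_topk) & set(approx_topk)) / len(exact_topk)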
rurban about 1 year ago
April 1 jokes already? Storing floats as bits, great idea!