ScaNN: Scalable Nearest Neighbors

107 points · by blopeur · almost 5 years ago

8 comments

greenyoda · almost 5 years ago
If someone is looking for the referenced paper ("Accelerating Large-Scale Inference with Anisotropic Vector Quantization"), it can be found here: https://arxiv.org/abs/1908.10396

(It's also linked in the repository's docs/algorithms.md, to which there's a link at the bottom of the main page.)
softwaredoug · almost 5 years ago
Right now ANN is a huge (and fascinating) area of active data-structure/algorithm research.

I do sort of wonder if we will reach a point where, instead of pure recall per speed, we'll have a family of algorithms with more trade-offs for the use case, where we begin to look at metrics for that domain for a given ANN approach instead of just its ability to recreate an exact nearest-neighbor result set.

Like, one thing I see assumed is that while recall at N is good, does this also mean the "ranking" of that top N is ideal? I don't want to have to manually compute exact nearest neighbors on this top N if I can avoid it, for example.

And are there specific vector spaces where one ANN method is preferred? Or will there be some universal approach that just works for everything?

I realize it's too early to tell, but these questions always percolate in my mind when we hit a new benchmark in recall for speed, especially since I still see people doing more naive things that seem to work perfectly fine for their use case (like KD-trees, random projections, or LSH).
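A minimal sketch of the "exact re-rank of the top N" step mentioned above: take the candidate set returned by an approximate index and re-score it with exact distances. The function name and interface here are illustrative only and are not tied to any specific ANN library.

```python
import numpy as np

def rerank_candidates(query, dataset, candidate_ids, k=10):
    """Re-score ANN candidates with exact L2 distances and keep the best k.

    query:         (d,) query vector
    dataset:       (n, d) matrix of database vectors
    candidate_ids: indices returned by an approximate index (length >= k)
    """
    candidate_ids = np.asarray(candidate_ids)
    dists = np.linalg.norm(dataset[candidate_ids] - query, axis=1)
    order = np.argsort(dists)[:k]  # exact ordering of the small candidate set
    return candidate_ids[order], dists[order]
```

Because the candidate set is small relative to the database, this exact pass is cheap compared to the index lookup, which is why libraries such as ScaNN expose a reordering/rescoring stage.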
ricksharp · almost 5 years ago
Ok, so I am just trying to understand the basic concepts in the paper and put them in my own words:

It seems the primary idea is that quantization precision is more important where there is a high density of neighbors.

I.e., at the edges the quantized sections (buckets) can be large, since there are few items there; but in high-density areas the buckets should be much smaller, to keep the distribution of objects per bucket as even as possible.

Therefore, the overall effectiveness of a quantization loss function should not be evaluated as a sum of squared errors (which assumes an error of a given size costs the same everywhere in the vector space), but should instead take the densities of the vector space into account and use them to weight the errors in different regions.

To me it seems analogous to a hash set, where the goal is an even distribution (the same number of items in every bucket). We want to quantize the space so that every region has about the same number of items.
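To make the "weight the error by where it matters" reading above concrete, here is a toy sketch contrasting a plain sum-of-squared-error quantization loss with one that scales each point's error by a crude local-density estimate. This only illustrates the comment's framing; the paper's actual anisotropic loss instead weights the residual component parallel to each datapoint so that high-inner-product pairs are quantized more precisely. All names and the density heuristic below are made up for illustration.

```python
import numpy as np

def plain_quantization_loss(x, centers, assign):
    """Ordinary sum-of-squared-error quantization loss."""
    residual = x - centers[assign]
    return np.sum(residual ** 2)

def weighted_quantization_loss(x, centers, assign, weights):
    """Same loss, but each point's error is scaled by a per-point weight
    (here: a local-density estimate), so errors in crowded regions cost more."""
    residual = x - centers[assign]
    return np.sum(weights[:, None] * residual ** 2)

# Toy usage on random data with randomly chosen centers.
rng = np.random.default_rng(0)
x = rng.normal(size=(1000, 8))
centers = x[rng.choice(len(x), size=32, replace=False)]
assign = np.argmin(((x[:, None, :] - centers[None, :, :]) ** 2).sum(-1), axis=1)

# Crude density weight: how many points lie within a fixed squared radius.
pairwise = ((x[:, None, :] - x[None, :, :]) ** 2).sum(-1)
weights = (pairwise < 4.0).sum(axis=1).astype(float)

print(plain_quantization_loss(x, centers, assign))
print(weighted_quantization_loss(x, centers, assign, weights))
```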
cs702 · almost 5 years ago
Wow, this looks impressively fast at very reasonable recall rates in the ANN-benchmarks. It seems to leave faiss and nmslib in the dust. Pulling up the arXiv paper as we speak to figure out what these guys are doing to achieve such impressive results.
eximius · almost 5 years ago
I'm surprised that I don't see DBSCAN, HDBSCAN, Spectral, etc. I don't even *recognize* these methods. Am I missing something, or have the methods I'm familiar with become obsolete that fast?
hoseja · almost 5 years ago
https://github.com/google-research/google-research/commit/406566cfafc83bcc4d54f82efa43fd3819039905#diff-9f09552fb2b5917f8532e55facc3734b
jszymborski · almost 5 years ago
Would this algorithm suit the case of wanting to find neighbors within a set radius of a point? Does anyone know of an approximate method for doing this?
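One common approximate answer is to over-fetch a generous top k from any ANN index and then filter by the distance threshold; neighbors the index fails to surface in its top k are missed, so k trades recall against speed. The sketch below assumes a hypothetical `searcher.search(query, k)` interface returning candidate indices; it is not a specific library's API.

```python
import numpy as np

def approx_radius_search(searcher, query, dataset, radius, k=100):
    """Approximate fixed-radius search built on a top-k ANN index.

    `searcher.search(query, k)` is assumed to return candidate row indices
    into `dataset` (a hypothetical interface). Candidates outside `radius`
    are dropped; true neighbors missing from the top k are silently lost.
    """
    candidate_ids = np.asarray(searcher.search(query, k))
    dists = np.linalg.norm(dataset[candidate_ids] - query, axis=1)
    keep = dists <= radius
    return candidate_ids[keep], dists[keep]
```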
phenkdo · almost 5 years ago
How does this compare to Milvus?