If someone is looking for the referenced paper (<i>Accelerating Large-Scale Inference with Anisotropic Vector Quantization</i>), it can be found here: <a href="https://arxiv.org/abs/1908.10396" rel="nofollow">https://arxiv.org/abs/1908.10396</a><p>(It's also linked in the repository's docs/algorithms.md, to which there's a link at the bottom of the main page.)
Right now ANN is a huge (and fascinating) area of active data structures/algorithms research.<p>I do sort of wonder if we will reach a point where, instead of pure recall-per-speed, we'll have a family of algorithms with different trade-offs for different use cases, where we judge a given ANN approach on metrics from that domain instead of just its ability to reproduce a nearest-neighbor result set.<p>For instance, one thing I see assumed is that good recall@N also means the ranking of that top N is good. I'd rather not have to rerank that top N with exact distances myself if I can avoid it (a sketch of what I mean is below).<p>And are there specific vector spaces where one ANN method is preferred? Or will there be some universal approach that just works for everything?<p>I realize it's too early to tell, but these questions always percolate in my mind when we hit a new benchmark in recall at a given speed, especially since I still see people doing more naive things that work perfectly fine for their use case (like KD-trees, random projections, or LSH).
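<p>To make the reranking point concrete, here's a minimal sketch of what I mean (plain NumPy; the index.search call in the usage comment is hypothetical, standing in for whatever ANN library is in play): take the approximate top-N and reorder it by exact distance against the original vectors.
<pre><code>
import numpy as np

def exact_rerank(query, candidate_ids, vectors):
    """Reorder an approximate top-N by exact L2 distance.

    query:         (d,) query vector
    candidate_ids: ids returned by some ANN index (order not trusted)
    vectors:       (n, d) matrix of the original, unquantized vectors
    """
    dists = np.linalg.norm(vectors[candidate_ids] - query, axis=1)
    return [candidate_ids[i] for i in np.argsort(dists)]

# Hypothetical usage with any index exposing a top-N search:
#   approx_ids = index.search(query, n=100)  # fast, ranking not guaranteed
#   top10 = exact_rerank(query, approx_ids, vectors)[:10]
</code></pre>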
Ok, so I am just trying to understand the basic concepts in the paper and put them in my own words:<p>It seems that the primary idea is that quantization precision matters more where there is a high density of neighbors.<p>I.e. at the edges the quantized cells (buckets) can be large, since few items live there, but in high-density areas the buckets should be much smaller, so as to get as even a distribution of objects per bucket as possible.<p>Therefore, a quantization loss function should not be evaluated as a plain sum of squared errors (which treats every region of the vector space the same), but should instead take the density of the vector space into account and use it to weight the errors in different regions (a toy sketch of what I mean is below).<p>To me it seems analogous to a hash set, where the goal is an even distribution (about the same number of items in every bucket).<p>We want to quantize the space so that every bucket holds about the same number of items.
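<p>A toy illustration of the contrast I'm describing (just my own reading; I'm not claiming this is the paper's actual loss): compare a plain mean-squared quantization error with one that weights each point's error by a crude estimate of local density.
<pre><code>
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 8))                   # toy dataset
C = X[rng.choice(len(X), 32, replace=False)]     # 32 codebook centers, random init

# Assign each point to its nearest center; err is the per-point quantization error.
err = ((X[:, None, :] - C[None, :, :]) ** 2).sum(-1).min(axis=1)

# Plain objective: every point counts the same.
plain_loss = err.mean()

# Density-weighted objective: errors in crowded regions cost more than errors
# out at the sparse edges. Density is estimated from the distance to the
# (roughly) k-th nearest neighbor.
k = 20
pairwise = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
knn_dist = np.sort(pairwise, axis=1)[:, k]
density = 1.0 / (knn_dist + 1e-9)
weighted_loss = (density * err).sum() / density.sum()
</code></pre>
Under a weighting like this, a codebook that spends its precision on dense regions scores better than one that treats all regions equally, even if their plain MSE is identical.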
Wow, this looks impressively fast at very reasonable recall rates on the ANN benchmarks. It seems to leave faiss and nmslib in the dust. Pulling up the arXiv paper as we speak to figure out what these guys are doing to achieve such results.
I'm surprised that I don't see DBSCAN, HDBSCAN, Spectral, etc. I don't even <i>recognize</i> these methods. Am I missing something or have the methods I'm familiar with become obsolete that fast?
Would this algorithm suit the case of wanting to find neighbors within a set radius of a point? Does anyone know of an approximate method for doing this?
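<p>For context, the naive workaround I have in mind is over-fetching top-k from an ANN index and then filtering by exact distance; a rough sketch (the index.search method here is hypothetical, standing in for whichever library is used):
<pre><code>
import numpy as np

def approx_radius_query(index, vectors, query, radius, overfetch=200):
    """Approximate radius search: over-fetched top-k, then exact distance filter.

    index:   any ANN index with a hypothetical search(query, k) -> ids method
    vectors: (n, d) original vectors, used for the exact distance check
    """
    ids = index.search(query, k=overfetch)               # approximate candidates
    dists = np.linalg.norm(vectors[ids] - query, axis=1)
    return [i for i, d in zip(ids, dists) if d <= radius]
</code></pre>
The obvious downside is that <i>overfetch</i> caps how many neighbors you can ever return, so I'd be curious whether there's something more principled for radius queries specifically.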