Are folks typically using HNSW for vector search these days? I thought ScaNN [1] had proven to be better, especially since it's available in FAISS [2].<p>[1] <a href="https://ai.googleblog.com/2020/07/announcing-scann-efficient-vector.html?m=1" rel="nofollow noreferrer">https://ai.googleblog.com/2020/07/announcing-scann-efficient...</a>
[2] <a href="https://github.com/facebookresearch/faiss/wiki/Fast-accumulation-of-PQ-and-AQ-codes-(FastScan)">https://github.com/facebookresearch/faiss/wiki/Fast-accumula...</a>
Slightly offtopic, but I'm currently working on a video similarity search tool, and the vectors I'm using are pretty big (over 2M dimensions each). That's quite different from the typical dimensionality of maybe 10k at most.<p>Currently I'm using Annoy (mostly because it's what I've used before), but I'm a bit worried that this is well outside what it was designed for.<p>Has anyone got specific advice on things I should try? I've used FAISS previously, but it seems to target the same design space.
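For context, the workaround I'm experimenting with is shrinking the vectors with a random projection before handing them to the index; the Johnson-Lindenstrauss lemma suggests pairwise distances roughly survive the reduction. A rough scikit-learn sketch (the sizes are toy stand-ins for the real ~2M dims):

    import numpy as np
    from sklearn.random_projection import SparseRandomProjection

    # Toy sizes standing in for the real case; a sparse projection keeps
    # the projection matrix manageable even at very high input dims.
    rng = np.random.default_rng(0)
    X = rng.standard_normal((1_000, 50_000)).astype(np.float32)

    # Johnson-Lindenstrauss: a random projection approximately preserves
    # pairwise distances, so an ANN index built on the reduced vectors
    # should rank neighbors similarly to the original space.
    proj = SparseRandomProjection(n_components=1_024, dense_output=True,
                                  random_state=0)
    X_small = proj.fit_transform(X)
    print(X_small.shape)  # (1000, 1024)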
I am interested in testing this in production, instead of faiss/mrpt.<p>> metric='cos', # Choose 'l2sq', 'haversine' or other metric, default = 'ip'<p>As a note, it is actually 'l2_sq' in the Python example.<p>> index.add(labels=np.arange(len(vectors)), vectors=vectors)<p>Adding to the index appears to be very slow. Also, labels are listed as an optional param, but the Python SDK requires them.<p>Do you have a recommended parameter setup for a 'brute force' approach (100% accuracy)?
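For reference, this is the exact-scan baseline I compare recall against — a minimal NumPy sketch assuming cosine similarity and toy shapes:

    import numpy as np

    # Toy data: 10k database vectors, 256 dims, one query.
    rng = np.random.default_rng(0)
    vectors = rng.standard_normal((10_000, 256)).astype(np.float32)
    query = rng.standard_normal(256).astype(np.float32)

    # Normalize so that inner product equals cosine similarity.
    db = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    q = query / np.linalg.norm(query)

    # Exact top-k by scanning everything: 100% accuracy, O(n*d) per query.
    k = 10
    scores = db @ q
    top_k = np.argpartition(-scores, k)[:k]
    top_k = top_k[np.argsort(-scores[top_k])]  # ground truth for recall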
In the vein of single-file databases, I've been enjoying DuckDB and am exploring Kùzu. DuckDB, from CWI in Amsterdam, aims to be a SQLite for analytics (OLAP), while Kùzu, from the database group at the University of Waterloo, is an analytics-focused graph database.<p><a href="https://duckdb.org/" rel="nofollow noreferrer">https://duckdb.org/</a>
<a href="https://kuzudb.com/" rel="nofollow noreferrer">https://kuzudb.com/</a>
The fact that USearch has a WASM binding for frontend use (AND supports serialization) is very cool for client-side search/LLM applications!<p>How would I integrate this into a dense passage retriever workflow for RAG? I could not find any examples of document chunk ingestion and similarity querying.
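To make the question concrete, here's roughly the workflow I have in mind, sketched with sentence-transformers for embeddings and the add/search signatures quoted elsewhere in this thread (the attribute names on the result object are my guess and may differ across versions):

    import numpy as np
    from sentence_transformers import SentenceTransformer
    from usearch.index import Index

    # Ingestion: split documents into chunks and embed them.
    chunks = ["USearch has a WASM binding.", "HNSW is a graph-based ANN index."]
    model = SentenceTransformer("all-MiniLM-L6-v2")  # 384-dim embeddings
    embeddings = model.encode(chunks, normalize_embeddings=True)

    index = Index(ndim=embeddings.shape[1], metric="cos")
    index.add(labels=np.arange(len(chunks)), vectors=embeddings)

    # Query: embed the question, pull the nearest chunks for the LLM prompt.
    query = model.encode(["Which binding enables client-side search?"],
                         normalize_embeddings=True)[0]
    matches = index.search(query, 2)
    # Guessing at the result attributes here; check the SDK for exact names.
    for label, dist in zip(matches.labels, matches.distances):
        print(chunks[int(label)], dist)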
Is view() for disk-based indexes doing something special over plain mmap(), e.g. setting read-ahead based on knowledge of the internal structure to make it faster when done over the network?<p>Talking about <a href="https://github.com/unum-cloud/usearch#disk-based-indexes">https://github.com/unum-cloud/usearch#disk-based-indexes</a>
I'm curious, is HNSW the only option? Do you support IVF-style indexes? Also, FAISS is nice because it supports a pluggable storage layer. Is that something that's easily supported in USearch?<p>Great work, and thank you for your contributions.
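For comparison, this is the kind of IVF setup I mean, as FAISS exposes it — cluster the database, then probe only a few cells per query (toy sizes; nlist/nprobe are illustrative):

    import numpy as np
    import faiss

    d, nlist = 128, 100
    xb = np.random.default_rng(0).standard_normal((10_000, d)).astype(np.float32)

    # IVF: k-means the database into nlist cells, search only a few of them.
    quantizer = faiss.IndexFlatL2(d)
    index = faiss.IndexIVFFlat(quantizer, d, nlist)
    index.train(xb)
    index.add(xb)

    index.nprobe = 8  # cells visited per query: recall/speed trade-off
    D, I = index.search(xb[:5], 10)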
On this page they have "space-filling curves" as an example in one of the images, but I haven't been able to find production systems that actually use space-filling curves for similarity search. Anyone have any tips?
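For what it's worth, the basic idea is easy to demo: map points to positions on a Z-order (Morton) curve and use the 1-D order as a coarse candidate filter. A toy 2D sketch (the window size and re-ranking are simplistic; a real system would need far more care):

    import bisect

    def interleave_bits(x: int, y: int, bits: int = 16) -> int:
        """Interleave the bits of x and y into a single Morton code."""
        code = 0
        for i in range(bits):
            code |= ((x >> i) & 1) << (2 * i)
            code |= ((y >> i) & 1) << (2 * i + 1)
        return code

    # Nearby points often (not always) land close together on the curve,
    # so a sorted list of Morton codes works as a coarse candidate filter.
    points = [(3, 7), (100, 200), (101, 198), (4, 8)]
    indexed = sorted((interleave_bits(x, y), (x, y)) for x, y in points)

    # Query: find curve-neighbors of (102, 199), then re-rank exactly.
    q = interleave_bits(102, 199)
    pos = bisect.bisect_left(indexed, (q, (0, 0)))
    candidates = [p for _, p in indexed[max(0, pos - 2):pos + 2]]
    best = min(candidates, key=lambda p: (p[0] - 102) ** 2 + (p[1] - 199) ** 2)
    print(best)  # (101, 198)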