Are folks typically using HNSW for vector search these days? I thought ScaNN [1] had proven to be better, especially since it's available in FAISS [2].<p>[1] <a href="https://ai.googleblog.com/2020/07/announcing-scann-efficient-vector.html?m=1" rel="nofollow noreferrer">https://ai.googleblog.com/2020/07/announcing-scann-efficient...</a>
[2] <a href="https://github.com/facebookresearch/faiss/wiki/Fast-accumulation-of-PQ-and-AQ-codes-(FastScan)">https://github.com/facebookresearch/faiss/wiki/Fast-accumula...</a>
Slightly offtopic, but I'm currently working on a video similarity search tool, and the vectors I'm using are pretty big (over 2M dimensions each). That's quite different from the typical dimensionality of maybe 10k at most.<p>Currently I'm using Annoy (mostly because it's what I've used before), but I'm a bit worried that this is well outside what it was designed for.<p>Has anyone got specific advice on things I should try? I've used FAISS previously, but it seems to target the same design space.
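For context, the workaround I'm experimenting with is shrinking the vectors with a random projection before handing them to the index; the Johnson-Lindenstrauss lemma suggests pairwise distances roughly survive the reduction. A rough scikit-learn sketch (the sizes are toy stand-ins for the real ~2M dims):

    import numpy as np
    from sklearn.random_projection import SparseRandomProjection

    # Toy sizes standing in for the real case; a sparse projection keeps
    # the projection matrix manageable even at very high input dims.
    rng = np.random.default_rng(0)
    X = rng.standard_normal((1_000, 50_000)).astype(np.float32)

    # Johnson-Lindenstrauss: a random projection approximately preserves
    # pairwise distances, so an ANN index built on the reduced vectors
    # should rank neighbors similarly to the original space.
    proj = SparseRandomProjection(n_components=1_024, dense_output=True,
                                  random_state=0)
    X_small = proj.fit_transform(X)
    print(X_small.shape)  # (1000, 1024)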
I am interested in testing this in production, instead of faiss/mrpt.<p>> metric='cos', # Choose 'l2sq', 'haversine' or other metric, default = 'ip'<p>As a note, it is actually 'l2_sq' in the Python example.<p>> index.add(labels=np.arange(len(vectors)), vectors=vectors)<p>Adding to the index appears to be very slow. Also, labels are listed as an optional param, but the Python SDK requires them.<p>Do you have a recommended parameter setup for a 'brute force' approach (100% accuracy)?
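For reference, this is the exact-scan baseline I compare recall against — a minimal NumPy sketch assuming cosine similarity and toy shapes:

    import numpy as np

    # Toy data: 10k database vectors, 256 dims, one query.
    rng = np.random.default_rng(0)
    vectors = rng.standard_normal((10_000, 256)).astype(np.float32)
    query = rng.standard_normal(256).astype(np.float32)

    # Normalize so that inner product equals cosine similarity.
    db = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    q = query / np.linalg.norm(query)

    # Exact top-k by scanning everything: 100% accuracy, O(n*d) per query.
    k = 10
    scores = db @ q
    top_k = np.argpartition(-scores, k)[:k]
    top_k = top_k[np.argsort(-scores[top_k])]  # ground truth for recall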
In the vein of single-file databases, I've been enjoying DuckDB and am exploring Kùzu. DuckDB, from CWI in Amsterdam, aims to be a SQLite for analytics (OLAP), while Kùzu, from the database group at the University of Waterloo, is an analytics-focused graph database.<p><a href="https://duckdb.org/" rel="nofollow noreferrer">https://duckdb.org/</a>
<a href="https://kuzudb.com/" rel="nofollow noreferrer">https://kuzudb.com/</a>
The fact that USearch has a WASM binding for frontend use (AND supports serialization) is very cool for client-side search/LLM applications!<p>How would I integrate this into a dense passage retriever workflow for RAG? I could not find any examples of document chunk ingestion and similarity querying.
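To make the question concrete, here's roughly the workflow I have in mind, sketched with sentence-transformers for embeddings and the add/search signatures quoted elsewhere in this thread (the attribute names on the result object are my guess and may differ across versions):

    import numpy as np
    from sentence_transformers import SentenceTransformer
    from usearch.index import Index

    # Ingestion: split documents into chunks and embed them.
    chunks = ["USearch has a WASM binding.", "HNSW is a graph-based ANN index."]
    model = SentenceTransformer("all-MiniLM-L6-v2")  # 384-dim embeddings
    embeddings = model.encode(chunks, normalize_embeddings=True)

    index = Index(ndim=embeddings.shape[1], metric="cos")
    index.add(labels=np.arange(len(chunks)), vectors=embeddings)

    # Query: embed the question, pull the nearest chunks for the LLM prompt.
    query = model.encode(["Which binding enables client-side search?"],
                         normalize_embeddings=True)[0]
    matches = index.search(query, 2)
    # Guessing at the result attributes here; check the SDK for exact names.
    for label, dist in zip(matches.labels, matches.distances):
        print(chunks[int(label)], dist)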
Is view() for disk-based indexes doing something special over plain mmap(), e.g. setting read-ahead based on knowledge of the internal structure to make it faster when done over the network?<p>Talking about <a href="https://github.com/unum-cloud/usearch#disk-based-indexes">https://github.com/unum-cloud/usearch#disk-based-indexes</a>
I'm curious, is HNSW the only option? Do you support IVF-style indexes? Also, FAISS is nice because it supports a pluggable storage layer. Is that something that's easily supported in USearch?<p>Great work, and thank you for your contributions.
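For comparison, this is the kind of IVF setup I mean, as FAISS exposes it — cluster the database, then probe only a few cells per query (toy sizes; nlist/nprobe are illustrative):

    import numpy as np
    import faiss

    d, nlist = 128, 100
    xb = np.random.default_rng(0).standard_normal((10_000, d)).astype(np.float32)

    # IVF: k-means the database into nlist cells, search only a few of them.
    quantizer = faiss.IndexFlatL2(d)
    index = faiss.IndexIVFFlat(quantizer, d, nlist)
    index.train(xb)
    index.add(xb)

    index.nprobe = 8  # cells visited per query: recall/speed trade-off
    D, I = index.search(xb[:5], 10)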
On this page they have "space-filling curves" as an example in one of the images, but I haven't been able to find production systems that actually use space-filling curves for similarity search. Anyone have any tips?
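For what it's worth, the basic idea is easy to demo: map points to positions on a Z-order (Morton) curve and use the 1-D order as a coarse candidate filter. A toy 2D sketch (the window size and re-ranking are simplistic; a real system would need far more care):

    import bisect

    def interleave_bits(x: int, y: int, bits: int = 16) -> int:
        """Interleave the bits of x and y into a single Morton code."""
        code = 0
        for i in range(bits):
            code |= ((x >> i) & 1) << (2 * i)
            code |= ((y >> i) & 1) << (2 * i + 1)
        return code

    # Nearby points often (not always) land close together on the curve,
    # so a sorted list of Morton codes works as a coarse candidate filter.
    points = [(3, 7), (100, 200), (101, 198), (4, 8)]
    indexed = sorted((interleave_bits(x, y), (x, y)) for x, y in points)

    # Query: find curve-neighbors of (102, 199), then re-rank exactly.
    q = interleave_bits(102, 199)
    pos = bisect.bisect_left(indexed, (q, (0, 0)))
    candidates = [p for _, p in indexed[max(0, pos - 2):pos + 2]]
    best = min(candidates, key=lambda p: (p[0] - 102) ** 2 + (p[1] - 199) ** 2)
    print(best)  # (101, 198)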