USearch: Smaller and faster single-file vector search engine

202 points by 0xedb, almost 2 years ago

13 comments

twelfthnight, almost 2 years ago
Are folks typically using HNSW for vector search these days? I thought maybe ScaNN has proven to be better? Especially since it's available in FAISS [2].

[1] https://ai.googleblog.com/2020/07/announcing-scann-efficient-vector.html?m=1
[2] https://github.com/facebookresearch/faiss/wiki/Fast-accumulation-of-PQ-and-AQ-codes-(FastScan)

nl, almost 2 years ago
Slightly offtopic, but I'm currently working on a video similarity search tool, and the vectors I'm using are pretty big (the size of a vector is over 2M). This is quite different to the normal vector size of maybe 10k max.

Currently I'm using Annoy (mostly because it's what I've used before) but I am a bit worried that this is well outside what it has been designed for.

Has anyone got specific advice for things I should try? I've used FAISS previously but it seems to have the same design space.

freediver, almost 2 years ago
I am interested in testing this in production, instead of faiss/mrpt.

> metric='cos', # Choose 'l2sq', 'haversine' or other metric, default = 'ip'

As a note, it is actually 'l2_sq' for the Python example.

> index.add(labels=np.arange(len(vectors)), vectors=vectors)

Adding to the index appears to be very slow. Also, labels are listed as an optional param, but the Python SDK has them as required.

Do you have a set of params for a 'brute force' approach (100% accuracy)?
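For reference, a "brute force" baseline does not need any index at all: scanning every vector gives exact results, which can then serve as ground truth when measuring the recall of an approximate index. The following is a minimal sketch in plain Python, not USearch's API; the function names `exact_search` and `recall_at_k` are hypothetical, and cosine distance is assumed to match the `metric='cos'` setting quoted above.

```python
import heapq
import math


def cosine_distance(a, b):
    """Cosine distance 1 - cos(a, b); returns 1.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def exact_search(vectors, query, k):
    """Scan every vector and return the k closest as (label, distance) pairs.

    Labels are simply positions in `vectors` (an assumption of this sketch).
    """
    scored = ((cosine_distance(v, query), label) for label, v in enumerate(vectors))
    return [(label, dist) for dist, label in heapq.nsmallest(k, scored)]


def recall_at_k(approx_labels, exact_labels):
    """Fraction of the exact top-k labels that the approximate result also found."""
    exact = set(exact_labels)
    return len(exact & set(approx_labels)) / len(exact) if exact else 1.0
```

The cost is linear in the number of vectors per query, so this is only practical for small collections or for spot-checking a sample of queries against an approximate index.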

eitan-turok, almost 2 years ago
This looks like a great package. Many vector-search engines do not allow you to implement your own custom distance metrics. But Unum does. Love it!

adultSwim, almost 2 years ago
In the vein of single-file databases, I've been enjoying DuckDB, which came out of CWI in Amsterdam, and am exploring Kùzu, which came out of the database group at the University of Waterloo. DuckDB aims to be a SQLite for analytics (OLAP), while Kùzu is an analytics-focused graph database.

https://duckdb.org/
https://kuzudb.com/

CharlesW, almost 2 years ago
@ashvardanian, what are reasons a developer would choose this over sqlite-vss?

ukuina, almost 2 years ago
The fact that USearch has a WASM binding for frontend use (AND supports serialization) is very cool for client-side search/LLM applications!

How would I integrate this into a dense passage retriever workflow for RAG? I could not find any examples for document chunk ingestion and similarity query.
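The overall shape of chunk ingestion and similarity query can be sketched independently of any particular index. The code below is an illustration under stated assumptions, not a USearch example: `chunk_text`, `toy_embed`, `ingest`, and `retrieve` are hypothetical names, `toy_embed` is a hashed bag-of-words placeholder standing in for a real dense passage encoder, and a plain Python list stands in for the vector index.

```python
import hashlib
import math


def chunk_text(text, size=40, overlap=10):
    """Split text into windows of `size` words, each sharing `overlap` words with the previous one."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    words = text.split()
    if not words:
        return []
    step = size - overlap
    last_start = max(len(words) - overlap, 1)
    return [" ".join(words[i:i + size]) for i in range(0, last_start, step)]


def toy_embed(text, dim=256):
    """Placeholder encoder: hashed bag-of-words, L2-normalised. Swap in a real dense model."""
    vec = [0.0] * dim
    for word in text.lower().split():
        digest = hashlib.sha256(word.encode("utf-8")).digest()
        vec[int.from_bytes(digest[:4], "little") % dim] += 1.0
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec


def ingest(documents, embed=toy_embed):
    """Chunk every document and return a list of (doc_id, chunk, vector) records."""
    return [
        (doc_id, chunk, embed(chunk))
        for doc_id, text in documents.items()
        for chunk in chunk_text(text)
    ]


def retrieve(records, question, k=3, embed=toy_embed):
    """Return the k records with the highest inner product as (score, doc_id, chunk)."""
    q = embed(question)
    scored = [
        (sum(a * b for a, b in zip(vec, q)), doc_id, chunk)
        for doc_id, chunk, vec in records
    ]
    return sorted(scored, reverse=True)[:k]
```

In a real pipeline the vectors from `ingest` would be added to a vector index under integer labels, with a side table mapping each label back to its chunk text; the retrieved chunks are then placed in the LLM prompt.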

nh2, almost 2 years ago
Is view() for disk-based indexes doing something special over plain mmap(), e.g. setting read-aheads based on the knowledge of the internal structure to make it faster if done over the network?

Talking about https://github.com/unum-cloud/usearch#disk-based-indexes
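For readers unfamiliar with the baseline being compared against, "plain mmap()" access to vectors on disk looks roughly like the sketch below. It says nothing about how USearch's view() is implemented; the file layout (fixed-width little-endian float32 rows), the `DIM` value, and the helper names are all assumptions made for illustration.

```python
import mmap
import os
import struct
import tempfile

DIM = 4  # hypothetical vector width
ROW = struct.Struct(f"<{DIM}f")  # one little-endian float32 vector per row


def write_vectors(path, vectors):
    """Write vectors back to back as fixed-width binary rows."""
    with open(path, "wb") as f:
        for v in vectors:
            f.write(ROW.pack(*v))


def read_vector(mapped, i):
    """Decode row i directly from the mapping, without loading the whole file."""
    return ROW.unpack_from(mapped, i * ROW.size)


path = os.path.join(tempfile.mkdtemp(), "vectors.bin")
write_vectors(path, [[float(r * DIM + c) for c in range(DIM)] for r in range(1000)])

with open(path, "rb") as f:
    mapped = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    row = read_vector(mapped, 42)
    mapped.close()
```

With a mapping like this the operating system decides which pages to fetch and when, which is why the question of explicit read-ahead hints matters for graph traversals whose access pattern is far from sequential.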

moab, almost 2 years ago
Do you have plans to support metadata filtering?

svcrunch, almost 2 years ago
I'm curious, is HNSW the only option? Do you support IVF-style indexes? Also, FAISS is nice because it supports a pluggable storage layer. Is this something that's easily supported in USearch?

Great work, and thank you for your contributions.

j2kun, almost 2 years ago
In this page they have "space filling curves" as an example in one of the images, but I haven't been able to find production systems that actually use space filling curves for similarity search. Anyone have any tips?

KRAKRISMOTT, almost 2 years ago
What's performance like without BLAS acceleration?

ykadowak, almost 2 years ago
@ashvardanian any plan to put it on ANN benchmarks?