Thanks for sharing. How does this compare with DiskANN (<a href="https://zilliz.com/blog/diskann-a-disk-based-anns-solution-with-high-recall-and-high-qps-on-billion-scale-dataset" rel="nofollow noreferrer">https://zilliz.com/blog/diskann-a-disk-based-anns-solution-w...</a>) or HNSW-IF (<a href="https://blog.vespa.ai/vespa-hybrid-billion-scale-vector-search/" rel="nofollow noreferrer">https://blog.vespa.ai/vespa-hybrid-billion-scale-vector-sear...</a>)?
I would appreciate a rough comparison with usearch:<p><a href="https://unum-cloud.github.io/usearch/" rel="nofollow noreferrer">https://unum-cloud.github.io/usearch/</a><p>Which was also recently on HN: <a href="https://news.ycombinator.com/item?id=36942993">https://news.ycombinator.com/item?id=36942993</a>
CEO of Neon here. After we built an in-memory HNSW index for Postgres, which let us establish a performance baseline and prove that this is the right approach to supporting vector search, we then built it "the right way": it now supports Postgres restarts, replication, and the rest of the Postgres machinery.
This looks very cool.<p>I'm interested in how many vectors are indexed/how large the index is behind the latency chart. If we have an in-memory HNSW index of 10M vectors at ~20GB (512 dim), say, what are the RAM requirements when using the disk-based version?
Forgive me, I'm not super familiar with vector indexes outside of the basic tsvector for text search.<p>What's the difference between pg_embedding, pg_vector, and tsvector? Are they comparable/interchangeable? And how do you know which one to pick?<p>My understanding is that pg_vector has poorer performance compared to some dedicated vector databases; does pg_embedding perform better?<p>Sorry if these are silly questions.
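To make the distinction concrete: tsvector is for keyword/full-text search (matching lexemes against a tsquery), while pg_vector and pg_embedding store dense embeddings and rank rows by distance, so they solve different problems and aren't interchangeable with tsvector. A rough sketch of the query shapes (table and column names here are made up for illustration; the pgvector syntax is shown, and my understanding is pg_embedding's is similar but uses real[] columns with its own HNSW index type):

```sql
-- Full-text search with tsvector: match documents against a keyword query.
SELECT title
FROM docs
WHERE to_tsvector('english', body) @@ to_tsquery('english', 'postgres & index');

-- Vector similarity with pgvector: store embeddings, rank by distance.
-- <-> is pgvector's Euclidean-distance operator.
CREATE TABLE items (id bigserial PRIMARY KEY, embedding vector(3));
SELECT id
FROM items
ORDER BY embedding <-> '[0.1, 0.2, 0.3]'
LIMIT 5;
```

The practical question between pg_vector and pg_embedding is then not the query shape, which is close to identical, but index behavior: recall, build time, and latency under your workload.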