科技回声

4 comments

I'd really dig more content about how vector databases/tools handle problems like this!

In sqlite-vec, there's only a flat brute-force index (though DiskANN/IVF will be coming soon). But we do have a concept of partition keys[0], which lets you "internally shard" the vector index based on a user_id or any other value.

<pre><code>create virtual table vec_documents using vec0(
  document_id integer primary key,
  user_id integer partition key,
  contents_embedding float[1024]
);
</code></pre>

Then at query time, any WHERE constraints on a partition key are pushed down, so only matching vectors are searched instead of the entire index.

<pre><code>select document_id, user_id, distance
from vec_documents
where contents_embedding match :query
  and k = 20
  and user_id = 123; -- only search documents from user 123
</code></pre>

Definitely not as performant as a proper vector index, but a lot of real-world applications have these natural groups anyway. Like "search only English documents" or "search entries in this workspace only", or even "search comments only from the past 30 days."

Internally sqlite-vec stores vectors in "chunks", so when partition keys are defined, every chunk is associated with a unique combination of all partition keys. Kinda hard to describe, but if you create a vec0 virtual table and insert some values, you can inspect the internal "shadow tables" in the SQLite database to see how it's all stored.

[0] <a href="https://alexgarcia.xyz/sqlite-vec/features/vec0.html#partition-keys" rel="nofollow">https://alexgarcia.xyz/sqlite-vec/features/vec0.html#partiti...</a>
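To make the partition-key idea above a bit more concrete, here's a minimal pure-Python sketch of a flat brute-force index that groups vectors by a partition key and scans only the matching group at query time. This is not how sqlite-vec is implemented internally; the class name `PartitionedFlatIndex`, its methods, and the choice of Euclidean distance are all assumptions made for illustration.

```python
import heapq
import math
from collections import defaultdict


class PartitionedFlatIndex:
    """Toy brute-force index that groups vectors by a partition key.

    Hypothetical illustration only; not sqlite-vec's actual storage layout.
    """

    def __init__(self):
        # partition key (e.g. a user_id) -> list of (document_id, vector)
        self._partitions = defaultdict(list)

    def insert(self, document_id, user_id, vector):
        self._partitions[user_id].append((document_id, list(vector)))

    def search(self, query, k, user_id=None):
        """Return up to k (document_id, distance) pairs, nearest first.

        When user_id is given, only that partition is scanned, which mirrors
        the effect of pushing down a WHERE constraint on the partition key.
        """
        if user_id is None:
            # No constraint: scan every partition (the whole index).
            candidates = (
                row for rows in self._partitions.values() for row in rows
            )
        else:
            # .get() avoids creating an empty partition for unknown keys.
            candidates = self._partitions.get(user_id, [])

        # Euclidean distance is an assumption; any metric would do here.
        scored = ((math.dist(query, vec), doc_id) for doc_id, vec in candidates)
        return [(doc_id, dist) for dist, doc_id in heapq.nsmallest(k, scored)]
```

With a constraint such as `index.search(query, k=20, user_id=123)`, the scan cost scales with the number of vectors in that one partition rather than with the size of the whole index, which is the whole point of the trick.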


jeffchuber, about 2 months ago

This will work very poorly when your data is changing, because the centroids degrade and you'll have very poor recall but likely not know it unless you are also monitoring recall.

I didn't see this in the write-up, so adding it here as a common footgun.
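For anyone wondering what "monitoring recall" could look like in practice, here's a rough Python sketch: periodically run a sample of queries through both the approximate (sharded) search and an exact brute-force search, then compare the results. The function names, the `approx_search(query, k)` callable, and the in-memory `vectors` dict are all hypothetical stand-ins for whatever your system actually exposes.

```python
import heapq
import math


def exact_top_k(query, vectors, k):
    """Ground truth: brute-force nearest neighbours by Euclidean distance.

    `vectors` is assumed to be a dict of document_id -> vector.
    """
    scored = ((math.dist(query, vec), doc_id) for doc_id, vec in vectors.items())
    return [doc_id for _, doc_id in heapq.nsmallest(k, scored)]


def recall_at_k(approx_ids, exact_ids):
    """Fraction of the true top-k that the approximate search returned."""
    if not exact_ids:
        return 1.0
    return len(set(approx_ids) & set(exact_ids)) / len(exact_ids)


def mean_recall(sample_queries, vectors, approx_search, k):
    """Average recall@k over a sample of queries.

    `approx_search(query, k)` is a hypothetical callable returning the
    document ids produced by the sharded / centroid-based search.
    """
    recalls = [
        recall_at_k(approx_search(q, k), exact_top_k(q, vectors, k))
        for q in sample_queries
    ]
    return sum(recalls) / len(recalls)
```

Tracking this number over time (and alerting when it drops) is one way to notice that centroids have drifted away from the data and need rebuilding.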


redskyluan, about 2 months ago

Sharding is a bad solution for any database, especially a vector database. See <a href="https://milvus.io/blog/why-manual-sharding-is-a-bad-idea-for-vector-databases-and-how-to-fix-it.md" rel="nofollow">https://milvus.io/blog/why-manual-sharding-is-a-bad-idea-for...</a>

Sharding Pgvector
