I'd really dig more content about how vector databases/tools handle problems like this!<p>In sqlite-vec, there's only a flat brute-force index (though DiskANN/IVF will be coming soon). But we do have a concept of partition keys[0], which allow you to "internally shard" the vector index based on a user_id or any other value.<p><pre><code> create virtual table vec_documents using vec0(
   document_id integer primary key,
   user_id integer partition key,
   contents_embedding float[1024]
 );
</code></pre>
Then at query time, any WHERE constraints on a partition key are pushed down, so only matching vectors are searched instead of the entire index.<p><pre><code> select
   document_id,
   user_id,
   distance
 from vec_documents
 where contents_embedding match :query
   and k = 20
   and user_id = 123; -- only search documents from user 123
</code></pre>
Definitely not as performant as a proper vector index, but a lot of real-world applications have these natural groups anyway. Like "search only English documents" or "search entries in this workspace only", or even "search comments only from the past 30 days."<p>Internally sqlite-vec stores vectors in "chunks", so when partition keys are defined, every chunk is associated with a unique combination of all partition key values. Kinda hard to describe, but if you create a vec0 virtual table and insert some values, you can inspect the internal "shadow tables" in the SQLite database to see how it's all stored.<p>[0] <a href="https://alexgarcia.xyz/sqlite-vec/features/vec0.html#partition-keys" rel="nofollow">https://alexgarcia.xyz/sqlite-vec/features/vec0.html#partiti...</a>
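<p>If the chunk idea is hard to picture, here's a rough toy model of it in plain Python. To be clear, this is only a sketch of the concept and not how sqlite-vec is actually implemented: the class name, the method names, and the tiny CHUNK_SIZE are all made up for illustration, and it uses plain Euclidean distance over Python lists. The point is just that each chunk belongs to exactly one combination of partition key values, so a query constrained on the partition key only has to scan that partition's chunks.

```python
import math
from collections import defaultdict

CHUNK_SIZE = 4  # hypothetical value, kept tiny so the chunking is easy to see


class PartitionedFlatIndex:
    """Toy brute-force index whose chunks are grouped by partition key values."""

    def __init__(self):
        # Maps a tuple of partition key values to that partition's list of chunks.
        # Each chunk is a list of (rowid, vector) pairs.
        self.chunks = defaultdict(list)

    def insert(self, rowid, partition, vector):
        chunks = self.chunks[partition]
        # Open a new chunk if the partition has none yet or its last one is full.
        if not chunks or len(chunks[-1]) == CHUNK_SIZE:
            chunks.append([])
        chunks[-1].append((rowid, vector))

    def knn(self, query, k, partition=None):
        # With a partition constraint, only that partition's chunks are scanned.
        # Without one, every chunk in the index is scanned.
        if partition is None:
            selected = [c for cs in self.chunks.values() for c in cs]
        else:
            selected = self.chunks.get(partition, [])
        scored = [
            (math.dist(query, vec), rowid)
            for chunk in selected
            for rowid, vec in chunk
        ]
        return sorted(scored)[:k]


index = PartitionedFlatIndex()
index.insert(1, (123,), [0.0, 0.0])
index.insert(2, (123,), [1.0, 0.0])
index.insert(3, (456,), [0.1, 0.0])

# Roughly analogous to "and user_id = 123": rows from user 456 are never visited.
hits = index.knn([0.0, 0.0], k=2, partition=(123,))
```

The partition is a tuple so that a table with several partition keys (say user_id plus a language code) maps naturally onto one chunk list per combination of values.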