We're building LintDB to productionize late interaction retrieval.<p>If you're not familiar with late interaction, here's the ColBERTv2 paper: <a href="https://arxiv.org/pdf/2112.01488.pdf" rel="nofollow">https://arxiv.org/pdf/2112.01488.pdf</a><p>We built LintDB because we had a hard time getting the retrieval performance we wanted in our RAG application. ColBERT stores multiple token-level vectors per document and scores a document by summing, for each query token, its maximum similarity against the document's vectors. This worked noticeably better for our task. The only problem was that traditional vector databases, which store a single vector per document, didn't support it.<p>Vespa does support this type of indexing, but the learning curve was high and I was concerned about how we'd operationalize it.<p>Some of the features of LintDB:
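To make the scoring concrete, here's a minimal NumPy sketch of ColBERT-style MaxSim scoring. This is an illustration of the idea, not LintDB's actual implementation:

```python
import numpy as np

def maxsim_score(query_embs: np.ndarray, doc_embs: np.ndarray) -> float:
    """Late interaction (MaxSim): for each query token embedding, take the
    maximum cosine similarity over all of the document's token embeddings,
    then sum across query tokens to get the document score."""
    # Normalize rows so dot products are cosine similarities.
    q = query_embs / np.linalg.norm(query_embs, axis=1, keepdims=True)
    d = doc_embs / np.linalg.norm(doc_embs, axis=1, keepdims=True)
    sim = q @ d.T                 # shape: (num_query_tokens, num_doc_tokens)
    return float(sim.max(axis=1).sum())  # max over doc tokens, sum over query tokens
```

Because each query token independently picks its best-matching document token, the score captures fine-grained term-level matches that a single pooled vector washes out.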
- Embeddable. You can import LintDB as a Python package and get started immediately.<p>- Bit-level compression. We fully support PLAID and compress token embeddings down to individual bits, so 128-dimension vectors can shrink to 16 bytes.<p>- Built on top of RocksDB for fast on-disk access. This also lets us take advantage of merging RocksDB indices: index your data across multiple machines and merge the results for serving.<p>- Multi-tenant support.<p>We're seeing new research on bit-vector search and late interaction, and we want to be the first to support it. Our roadmap also includes adding explainability to results, which late interaction makes possible.<p>repo: <a href="https://github.com/DeployQL/LintDB">https://github.com/DeployQL/LintDB</a><p>documentation: <a href="https://deployql.github.io/LintDB/" rel="nofollow">https://deployql.github.io/LintDB/</a>
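On the compression arithmetic: keeping one bit per dimension means a 128-dimension vector occupies 128 bits, i.e. 16 bytes. A simple sign-based binarization illustrates the idea (this is a generic sketch, not LintDB's quantization scheme, which builds on PLAID's residual compression):

```python
import numpy as np

def binarize(embs: np.ndarray) -> np.ndarray:
    """One-bit-per-dimension quantization: keep only the sign of each
    component, then pack 8 bits into each byte.
    A (n, 128) float array becomes an (n, 16) uint8 array."""
    bits = (embs > 0).astype(np.uint8)   # 1 if positive, else 0
    return np.packbits(bits, axis=-1)    # 8 dimensions per byte
```

Similarity between binarized vectors can then be approximated with cheap Hamming-distance-style bit operations instead of float dot products.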