When building semantic search and LLM-powered apps, you often need to try several vector databases, because they differ widely in features, cost, and other characteristics.<p>The Vector-io library introduces a universal open format for storing vector datasets (vectors together with their metadata), plus import and export scripts for a wide range of vector databases:
- Pinecone
- Qdrant
- Milvus
- GCP Vertex AI Vector Search
- KDB.AI
- LanceDB
- DataStax Astra DB
- Chroma
- Turbopuffer<p>This makes it easier to back up, snapshot, and share vector datasets, and to move data between different vector DBs.<p>I'm also curating a list of publicly available datasets in this format, which can be loaded directly from Hugging Face into your favorite vector DB: <a href="https://huggingface.co/collections/aintech/vector-io-compatible-datasets-65d7256c837f8980e1ed13fc" rel="nofollow">https://huggingface.co/collections/aintech/vector-io-compati...</a><p>If you have data in a vector DB, please try it out and let me know if you have feedback. Thanks!
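<p>To make the idea of a database-neutral format concrete, here is a minimal sketch of the export/import round trip. It is an illustration only, not Vector-io's actual on-disk format or API: the `VectorRecord` class, the `export_records` and `import_records` helpers, and the JSON Lines layout are all assumptions made up for this example.

```python
import json
import tempfile
from dataclasses import asdict, dataclass, field
from pathlib import Path


@dataclass
class VectorRecord:
    """Hypothetical neutral row: an id, the embedding, and free-form metadata."""
    id: str
    vector: list[float]
    metadata: dict = field(default_factory=dict)


def export_records(records, path):
    """Write records as JSON Lines (one record per line); return the count."""
    count = 0
    with open(path, "w", encoding="utf-8") as f:
        for rec in records:
            f.write(json.dumps(asdict(rec)) + "\n")
            count += 1
    return count


def import_records(path):
    """Read records back from a JSON Lines file, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [VectorRecord(**json.loads(line)) for line in f if line.strip()]


# Round trip: "export" from one store, "import" into another.
records = [
    VectorRecord("doc-1", [0.1, 0.2, 0.3], {"title": "intro", "lang": "en"}),
    VectorRecord("doc-2", [0.4, 0.5, 0.6], {"title": "usage"}),
]

with tempfile.TemporaryDirectory() as tmp:
    dump_path = Path(tmp) / "dataset.jsonl"
    written = export_records(records, dump_path)
    restored = import_records(dump_path)
```

In this sketch, each database would only need two adapters (one that reads its native rows into `VectorRecord` objects and one that writes them back), instead of a separate converter for every pair of databases.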