This title was a little misleading to me IMO because (maybe my skill issue) I associated "inferencing" with "generation".

After reading the article, it seems Pinecone now supports in-DB vectorization, a feature shared by:

- DataStax Astra DB: https://www.datastax.com/blog/simplifying-vector-embedding-generation-with-astra-vectorize (since May 2024)

- Weaviate: https://weaviate.io/blog/introducing-weaviate-embeddings (as of yesterday)
This post has some more technical info: https://www.pinecone.io/blog/integrated-inference/

Makes a lot of sense to me to combine embedding, retrieval, and reranking. I can imagine this being a way for them to differentiate themselves from the popular databases that have added support for vector search.
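To make that concrete, here is a rough sketch of the combined flow using the Pinecone Python client's inference namespace. The model names, index name, and response shapes are written from memory of their docs, so treat the specifics as assumptions rather than verified code:

    from pinecone import Pinecone

    pc = Pinecone(api_key="YOUR_API_KEY")
    docs = ["Pinecone now hosts embedding models", "Vector DBs keep adding features"]

    # 1. Embed documents with a Pinecone-hosted model (no separate embedding service)
    embeddings = pc.inference.embed(
        model="multilingual-e5-large",           # assumed hosted embedding model
        inputs=docs,
        parameters={"input_type": "passage"},
    )

    # 2. Upsert the resulting vectors into an existing index ("my-index" is hypothetical)
    index = pc.Index("my-index")
    index.upsert(vectors=[
        {"id": str(i), "values": e["values"], "metadata": {"text": t}}
        for i, (e, t) in enumerate(zip(embeddings, docs))
    ])

    # 3. Embed the query, retrieve candidates, then rerank them via the same client
    query = "which databases host their own embedding models?"
    q_emb = pc.inference.embed(
        model="multilingual-e5-large",
        inputs=[query],
        parameters={"input_type": "query"},
    )
    candidates = index.query(vector=q_emb[0]["values"], top_k=10, include_metadata=True)
    reranked = pc.inference.rerank(
        model="bge-reranker-v2-m3",              # assumed hosted reranker
        query=query,
        documents=[m["metadata"]["text"] for m in candidates["matches"]],
        top_n=3,
    )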
Can someone please explain how this works?

I assumed that a specific flavour of LLM was needed, an "embedding model", to generate the vectors. Is this announcement that Pinecone is adding their own?

Is it better or worse than the models here, for example? https://ollama.com/search?c=embedding
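For what it's worth, an embedding model just maps text to fixed-size vectors whose distances reflect semantic similarity; retrieval is then nearest-neighbor search over those vectors. A minimal sketch with sentence-transformers (my own choice of library and model, nothing to do with the announcement):

    from sentence_transformers import SentenceTransformer, util

    # A small, common open embedding model; any other would work the same way
    model = SentenceTransformer("all-MiniLM-L6-v2")

    docs = ["Pinecone announces integrated inference", "Recipe for sourdough bread"]
    query = "vector database hosting its own embedding models"

    # encode() turns text into dense vectors (384 dimensions for this model)
    doc_vectors = model.encode(docs)
    query_vector = model.encode(query)

    # Cosine similarity scores; the first document should score noticeably higher
    print(util.cos_sim(query_vector, doc_vectors))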
Nothing new, Marqo has been doing this for a while now with their all-in-one platform to train, embed, retrieve, and evaluate.

I've played around with Weaviate & Astra DB, but Marqo is the best and easiest solution imo.
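The Marqo pattern, roughly as I remember it from their README (model name and fields here are just their quickstart example, so double-check the details), is to name a model at index creation and let the client embed documents as they are added:

    import marqo

    mq = marqo.Client(url="http://localhost:8882")

    # The index is created with an embedding model; documents are vectorized on ingest
    mq.create_index("my-first-index", model="hf/e5-base-v2")

    mq.index("my-first-index").add_documents([
        {"Title": "The Travels of Marco Polo",
         "Description": "A 13th-century travelogue describing Polo's journeys"},
        {"Title": "Extravehicular Mobility Unit (EMU)",
         "Description": "The EMU is a spacesuit that provides environmental protection and life support for astronauts"},
    ], tensor_fields=["Description"])

    # Search embeds the query with the same model and runs vector retrieval
    results = mq.index("my-first-index").search(q="What is the best outfit to wear on the moon?")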
txtai (https://github.com/neuml/txtai) has had inline vectorization since 2020. It supports Transformers, llama.cpp, and LLM API services. It also has inline integration with LLM models and a built-in RAG pipeline.
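For comparison, the txtai version of inline vectorization looks roughly like this, assuming a recent release (the import path and defaults have changed across versions):

    from txtai import Embeddings

    # content=True stores the original text alongside the vectors
    embeddings = Embeddings(content=True)

    # index() vectorizes the documents inline with the default transformer model
    embeddings.index([
        "US tops 5 million confirmed virus cases",
        "Canada's last fully intact ice shelf has suddenly collapsed",
        "Beijing mobilises invasion craft along coast as Taiwan tensions escalate",
    ])

    # Semantic search over the indexed documents
    print(embeddings.search("natural disasters", 1))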