The first unstated assumption is that similar vectors are relevant documents, and for many use cases that's just not true. Cosine similarity != relevance. So if your pipeline pulls 2 or 4 or 12 document chunks into the LLM's context, and half or more of them aren't relevant, does that make the LLM's response more or less relevant?

The second unstated assumption is that the vector index can accurately identify the top K vectors by cosine similarity, and that's not true either. Approximate nearest-neighbor indexes trade recall for speed: if you retrieve the top K vectors according to the index (instead of computing all the pairwise similarities exactly), that set of K vectors will be missing documents with a higher cosine similarity than that of the K'th vector retrieved.

All of this means you'll need to retrieve some multiple of K vectors, figure out a way to re-rank them to exclude the irrelevant ones, and maintain your own ground truth to measure the index's precision and recall.
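On the second point, recall is straightforward to measure if you can afford one brute-force pass over the corpus. A minimal sketch in Python, assuming hnswlib as the approximate index (any ANN index would do; the random data is purely illustrative):

```python
import numpy as np
import hnswlib  # pip install hnswlib

dim, n, k = 128, 10_000, 10
rng = np.random.default_rng(0)
docs = rng.standard_normal((n, dim)).astype(np.float32)
query = rng.standard_normal(dim).astype(np.float32)

# Ground truth: exact top-K by cosine similarity, brute force.
docs_n = docs / np.linalg.norm(docs, axis=1, keepdims=True)
exact = np.argsort(docs_n @ (query / np.linalg.norm(query)))[::-1][:k]

# Approximate top-K from an HNSW index.
index = hnswlib.Index(space="cosine", dim=dim)
index.init_index(max_elements=n, ef_construction=200, M=16)
index.add_items(docs, np.arange(n))
index.set_ef(50)  # search-time speed/recall trade-off
approx, _ = index.knn_query(query, k=k)

recall = len(set(exact) & set(approx[0])) / k
print(f"recall@{k} = {recall:.2f}")
```

Parameters like `ef` and `M` trade query speed against recall, which is exactly the gap described above: anything below 1.0 means documents with higher similarity than the K'th retrieved vector were silently dropped.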
As an architect working on LLM applications, I have these criteria for a database:

- Full SQL support
- Good tooling around migrations (e.g., dbmate)
- Good support for running in Kubernetes or in the cloud
- Well understood by operations, e.g., backups and scaling
- Supports vectors and similarity search
- Well-supported client libraries

So basically Postgres and pgvector.
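For the last two criteria, a minimal sketch of a pgvector similarity query from Python, assuming psycopg 3 and the pgvector helper package (the connection string, table, and embedding size are illustrative):

```python
# pip install psycopg pgvector
import numpy as np
import psycopg
from pgvector.psycopg import register_vector

conn = psycopg.connect("postgresql://localhost/mydb", autocommit=True)
conn.execute("CREATE EXTENSION IF NOT EXISTS vector")
register_vector(conn)  # adapts numpy arrays to the vector type

conn.execute("""
    CREATE TABLE IF NOT EXISTS documents (
        id bigserial PRIMARY KEY,
        body text,
        embedding vector(384)
    )
""")

query_vec = np.random.rand(384).astype(np.float32)
rows = conn.execute(
    # <=> is pgvector's cosine-distance operator
    "SELECT id, body FROM documents ORDER BY embedding <=> %s LIMIT 10",
    (query_vec,),
).fetchall()
```

Everything else on the list (migrations, backups, client libraries) comes along for free, because it's still just Postgres.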
I don’t fully understand the fascination with retrieval augmented generation. The retrieval part is already really good and computationally inexpensive — why not just pass the semantic search results to the user in a pleasant interface and allow them to synthesize their own response? Reading a generated paragraph that obscures the full sourcing seems like a practice that’s been popularized to justify using the shiny new tech, but is the generated part what users actually want? (Not to mention there is no bulletproof way to prevent hallucinations, lies, and prompt injection even with retrieval context.)
It's not clear to me that only a vector DB should be used for RAG.
Vector DBs give you stochastic responses.

For customer chatbots, it seems that structured data from an operational database or a feature store adds more value. If the user asks about an order they made or a product they have a question about, you use the user ID (when they're logged in) to retrieve all the info about what the user bought recently; the LLM will figure out what the prompt is referring to.

Reference: https://www.hopsworks.ai/dictionary/retrieval-augmented-llm
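A minimal sketch of that pattern, assuming a SQLite orders table and a generic `llm_complete` callable (both hypothetical stand-ins for your operational store and LLM client):

```python
import sqlite3

def build_context(conn: sqlite3.Connection, user_id: int) -> str:
    # Deterministic lookup keyed on the logged-in user, no vectors involved.
    rows = conn.execute(
        "SELECT order_id, product, status, ordered_at "
        "FROM orders WHERE user_id = ? ORDER BY ordered_at DESC LIMIT 5",
        (user_id,),
    ).fetchall()
    lines = [f"order {o}: {p} ({s}, {t})" for o, p, s, t in rows]
    return "Recent orders:\n" + "\n".join(lines)

def answer(conn, user_id, question, llm_complete):
    # The LLM resolves references like "my last order" against the context.
    prompt = f"{build_context(conn, user_id)}\n\nUser question: {question}"
    return llm_complete(prompt)
```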
A lot of the things mentioned are hand-waved and not explained well.

It's not explained how a vector DB is going to help when incumbents like GPT-4 can already call functions and make API calls.

It doesn't make AI less of a black box; that claim is irrelevant and not explained.

There are already ways to fine-tune models without expensive hardware, such as using LoRA to inject small trainable layers with customized training data, which trains in a fraction of the time and resources needed to retrain the whole model.
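For context, a minimal LoRA setup sketch using Hugging Face's PEFT library (the base model and hyperparameters are illustrative, not from the comment):

```python
# pip install peft transformers
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("gpt2")

config = LoraConfig(
    r=8,                        # rank of the injected low-rank matrices
    lora_alpha=16,              # scaling applied to the LoRA updates
    target_modules=["c_attn"],  # GPT-2's fused attention projection
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, config)
model.print_trainable_parameters()  # typically well under 1% of base weights
# Train `model` with your usual loop/Trainer; only the LoRA layers update.
```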
We use Lance extensively at my startup. This blog post (previously on HN) details nicely why: https://thedataquarry.com/posts/vector-db-4/ Essentially, it's because Lance is “just a file” in the same way SQLite is “just a file”, which makes it embedded, serverless, and straightforward to use locally or in a deployment.
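A minimal sketch of that “just a file” workflow via LanceDB, the embedded database built on the Lance format (data and vector sizes are illustrative):

```python
# pip install lancedb
import lancedb

db = lancedb.connect("./lance_data")  # just a directory on disk

table = db.create_table(
    "docs",
    data=[
        {"vector": [0.1, 0.2, 0.3, 0.4], "text": "first chunk"},
        {"vector": [0.9, 0.8, 0.7, 0.6], "text": "second chunk"},
    ],
)

# Nearest-neighbor search against the stored vectors.
hits = table.search([0.1, 0.2, 0.3, 0.35]).limit(1).to_list()
print(hits[0]["text"])
```

No server process, no connection pool; like SQLite, you can commit the whole store to a disk image or ship it alongside a deployment.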
I find it quite comical to speak of a "missing storage layer" in your own self-promotion, considering that the market for vector databases is literally overflowing right now.

Everything else may be missing, but not the storage layer.
Does ChatGPT always start articles with “in the rapidly evolving landscape of X”?

Surely if you're posting an article promoting miraculous AI tech, you should human-edit the article summary so that it's not *really obviously* drafted by AI.

Or just use the prompt “tone your writing down and please remember that you're not writing for a high school student who is impressed by nonsensical hyperbole”. I've started using this prompt and it works astonishingly well in the fast-evolving landscape of directionless content creation.
Unrelated question: is there a standard way of writing down neural network diagrams? I'm thinking of how it's done in electrical circuit schematics, which capture all relevant information in a single diagram, in a (mostly) standardized way.

I've seen the diagrams in DL papers etc., but I guess everyone invents their own conventions, and the diagrams often don't convey the complete flow of information.