TechEcho

A tech news platform built with Next.js, providing global tech news and discussions.
LLMs, RAG, and the missing storage layer for AI

151 points by yurisagalov over 1 year ago

11 comments

panarky over 1 year ago

The first unstated assumption is that similar vectors are relevant documents, and for many use cases that's just not true. Cosine similarity != relevance. So if your pipeline pulls 2 or 4 or 12 document chunks into the LLM's context, and half or more of them aren't relevant, does this make the LLM's response more or less relevant?

The second unstated assumption is that the vector index can accurately identify the top K vectors by cosine similarity, and that's not true either. If you retrieve the top K vectors according to an approximate vector index (instead of computing all the pairwise similarities in advance), that set of K vectors will be missing documents that have a higher cosine similarity than that of the K'th vector retrieved.

All of this means you'll need to retrieve a multiple of K vectors, figure out some way to re-rank them to exclude the irrelevant ones, and have your own ground truth to measure the index's precision and recall.
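The over-retrieve-then-re-rank approach panarky describes can be sketched in a few lines. This is a toy illustration, not any particular library's API; the `relevance_fn` is a stand-in for a real cross-encoder or other re-ranker:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def retrieve_and_rerank(query_vec, docs, k, oversample=4, relevance_fn=None):
    """Pull oversample*k candidates by cosine similarity, then re-rank
    with a separate relevance function before keeping only the top k."""
    candidates = sorted(docs, key=lambda d: cosine(query_vec, d["vec"]), reverse=True)
    candidates = candidates[: oversample * k]
    if relevance_fn is not None:
        candidates.sort(key=relevance_fn, reverse=True)
    return candidates[:k]
```

The oversampling factor trades recall against re-ranking cost, which is exactly where the comment's call for your own precision/recall ground truth comes in.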
ianpurton over 1 year ago

As an architect working on LLM applications I have these criteria for a database:

- Full SQL support
- Good tooling around migrations (i.e. dbmate)
- Good support for running in Kubernetes or in the cloud
- Well understood by operations, i.e. backups and scaling
- Supports vectors and similarity search
- Well supported client libraries

So basically Postgres and pgvector.
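A minimal pgvector setup along those lines might look like this (table and column names are illustrative; `<=>` is pgvector's cosine-distance operator):

```sql
-- Enable the extension and store 3-dimensional embeddings
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE documents (
    id        bigserial PRIMARY KEY,
    body      text,
    embedding vector(3)
);

-- Nearest neighbours by cosine distance
SELECT id, body
FROM documents
ORDER BY embedding <=> '[0.1, 0.2, 0.3]'
LIMIT 5;
```

Because this is plain SQL in Postgres, the migration tooling, backups, and client libraries in the checklist above come for free.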
freedmand over 1 year ago
I don’t fully understand the fascination with retrieval augmented generation. The retrieval part is already really good and computationally inexpensive — why not just pass the semantic search results to the user in a pleasant interface and allow them to synthesize their own response? Reading a generated paragraph that obscures the full sourcing seems like a practice that’s been popularized to justify using the shiny new tech, but is the generated part what users actually want? (Not to mention there is no bulletproof way to prevent hallucinations, lies, and prompt injection even with retrieval context.)
jamesblonde over 1 year ago

It's not clear to me that only a vector DB should be used for RAG. Vector DBs give you stochastic responses.

For customer chatbots, it seems that structured data - from an operational database or a feature store - adds more value. If the user asks about an order they made or a product they have a question about, you use the user-id (when logged in) to retrieve all the info about what the user bought recently; the LLM will figure out what the prompt is referring to.

Reference: https://www.hopsworks.ai/dictionary/retrieval-augmented-llm
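The structured-retrieval pattern in that comment amounts to a key lookup followed by prompt assembly. A hedged sketch (the order fields and prompt wording are made up for illustration; the orders would come from an operational DB or feature store keyed by user-id):

```python
def build_prompt(question, orders):
    """Assemble an LLM prompt from structured lookup results instead of
    vector-similarity chunks: deterministic retrieval, no nearest-neighbour step."""
    lines = [f"- order {o['order_id']}: {o['item']} ({o['status']})" for o in orders]
    return (
        "The customer's recent orders:\n"
        + "\n".join(lines)
        + f"\n\nCustomer question: {question}\n"
        + "Answer using only the order data above."
    )
```

Unlike approximate vector search, the same user-id always retrieves the same rows, which is the "less stochastic" property the comment is after.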
Charon77 over 1 year ago

A lot of the things mentioned are too handwaved and not explained well.

It's not explained how a vector DB is going to help when incumbents like GPT-4 can already call functions and make API calls.

It doesn't make AI less of a black box; that claim is irrelevant and not explained.

There are already existing ways to fine-tune models without expensive hardware, such as using LoRA to inject small layers with customized training data, which trains in a fraction of the time and resources needed to retrain the model.
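The LoRA idea the comment refers to can be shown with toy matrices: the pretrained weight W stays frozen, and only two small low-rank factors A and B are trained, so the effective weight is W + (alpha/r) * B @ A. A schematic sketch in plain Python, not the actual PEFT implementation:

```python
def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]

def lora_forward(W, A, B, x, alpha=1.0, r=1):
    """y = W x + (alpha/r) * B (A x).
    W (d_out x d_in) is frozen; only A (r x d_in) and B (d_out x r) train,
    so trainable params drop from d_out*d_in to r*(d_in + d_out)."""
    base = matvec(W, x)
    low_rank = matvec(B, matvec(A, x))
    scale = alpha / r
    return [b + scale * l for b, l in zip(base, low_rank)]
```

With rank r much smaller than the layer dimensions, this is why LoRA fine-tunes in a fraction of the memory and time of full retraining.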
juxtaposicion over 1 year ago

We use Lance extensively at my startup. This blog post (previously on HN) details nicely why: https://thedataquarry.com/posts/vector-db-4/ but essentially it's because Lance is "just a file" in the same way SQLite is "just a file", which makes it embedded and serverless and straightforward to use locally or in a deployment.
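The "just a file" property can be illustrated with a toy embedded store built on the standard library. This is not Lance's actual format or API, only a sketch of the embedded/serverless idea: the entire store is one file, so any process that can open the file can query it, with no server to run:

```python
import json
import math
import os

class FileVectorStore:
    """Toy embedded vector store: the whole index lives in one JSON file,
    loosely analogous to the SQLite/Lance single-file model."""

    def __init__(self, path):
        self.path = path
        self.rows = []
        if os.path.exists(path):
            with open(path) as f:
                self.rows = json.load(f)

    def add(self, doc_id, vec):
        self.rows.append({"id": doc_id, "vec": vec})
        with open(self.path, "w") as f:
            json.dump(self.rows, f)

    def search(self, query, k=1):
        def cos(a, b):
            dot = sum(x * y for x, y in zip(a, b))
            na = math.sqrt(sum(x * x for x in a))
            nb = math.sqrt(sum(x * x for x in b))
            return dot / (na * nb)
        return sorted(self.rows, key=lambda r: cos(query, r["vec"]), reverse=True)[:k]
```

Reopening the same path in another process picks up the full index, which is what makes the single-file model convenient both locally and in a deployment.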
zwaps over 1 year ago

I find it quite comical to speak of a "missing storage layer" in your own self-promotion, considering that the market for vector databases is literally overflowing right now.

Everything else may be missing, but not the storage layer.
saaaaaam over 1 year ago

Does ChatGPT always start articles with "in the rapidly evolving landscape of X"?

Surely if you're posting an article promoting miraculous AI tech you should human-edit the article summary so that it's not *really obviously* drafted by AI.

Or just use the prompt "tone your writing down and please remember that you're not writing for a high school student who is impressed by nonsensical hyperbole". I've started using this prompt and it works astonishingly well in the fast-evolving landscape of directionless content creation.
amelius over 1 year ago

Unrelated question: is there a standard way of writing down neural network diagrams? I'm thinking of how it's done in electrical circuit schematics, which capture all relevant information in a single diagram in a (mostly) standardized way.

I've seen the diagrams in DL papers etc., but I guess everyone invents their own conventions, and the diagrams often don't convey the complete flow of information.
eth0pal over 1 year ago

Shameless self-promotion
dr_dshiv over 1 year ago
404