Ask HN: Why Try RAG on the Client Side? Is Local-First RAG Practical?

4 points by MutedEstate45, 4 months ago
Came across MeMemo, which runs Retrieval-Augmented Generation (RAG) directly on the client side using WebGPU. No server calls, full privacy, and faster responses. Sounds great, but I'm wondering:

What's the catch? How do you scale vector search or manage embeddings locally? Can this handle complex use cases, or is it mostly for lightweight tasks? How do we navigate browser limitations, UX challenges, or security concerns in a setup like this?

Curious if anyone here has tried client-side RAG or sees a compelling use case for it. Is this approach worth exploring for privacy-focused apps, or are we not there yet?

1 comment

softwaredoug, 4 months ago
You can do pretty fast (single-digit ms) cosine similarity on 1M vectors with numpy on the CPU. And there are small embedded databases like RocksDB with HNSW indices that work well beyond that, especially if you stay at around 256 dims. Quantization can shrink the footprint even further.

Also, locally your QPS is very low, as you're the only searcher.

So with enough RAM and a small enough dataset, it should be fine.
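
For anyone who wants to try the brute-force numpy approach mentioned above, here is a minimal sketch. It assumes embeddings are already computed and stored as a float32 matrix; the function names (`normalize`, `top_k_cosine`, `footprint_mb`) and the random data are made up for illustration, and the corpus is kept at 10,000 rows so it runs quickly. Actual latency at 1M vectors depends on your hardware and BLAS build, so measure it rather than trusting any number.

```python
import numpy as np


def normalize(x):
    """Scale each row to unit length so a dot product equals cosine similarity."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.maximum(norms, 1e-12)


def top_k_cosine(corpus_unit, query, k=5):
    """Return (indices, scores) of the k rows most similar to `query`.

    `corpus_unit` must already be row-normalized (see `normalize`).
    """
    q = query / max(np.linalg.norm(query), 1e-12)
    scores = corpus_unit @ q                    # one matrix-vector product
    k = min(k, scores.shape[0])
    idx = np.argpartition(-scores, k - 1)[:k]   # unordered top k
    idx = idx[np.argsort(-scores[idx])]         # order the k hits by score
    return idx, scores[idx]


def footprint_mb(n_vectors, dims, bytes_per_value):
    """Raw size of the vector matrix in megabytes (10^6 bytes), no index overhead."""
    return n_vectors * dims * bytes_per_value / 1e6


# Illustrative data: random vectors standing in for real embeddings.
rng = np.random.default_rng(0)
corpus = normalize(rng.standard_normal((10_000, 256)).astype(np.float32))

# A query that is a slightly perturbed copy of row 42.
query = corpus[42] + 0.01 * rng.standard_normal(256).astype(np.float32)

idx, scores = top_k_cosine(corpus, query, k=3)
print(idx, scores)

# Rough RAM needed for 1M vectors at 256 dims.
print(footprint_mb(1_000_000, 256, 4))  # float32
print(footprint_mb(1_000_000, 256, 1))  # int8-quantized
```

The footprint helper only counts the raw matrix: 1M vectors at 256 dims is about 1 GB in float32 and about a quarter of that with one byte per value, which is the "enough RAM" trade-off the comment is pointing at.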