
Ask HN: How to manage docs for LLM RAG app?

6 points by Jianghong94, 9 months ago
I'm building an LLM RAG QA bot for my company, a financial institution. Right now I know the 'basic' building blocks, e.g. prompt engineering, RAG, vector DBs, eval, etc. Funnily enough, the first challenge I've run into is curating and managing all the different types of docs, e.g.:

* email chains
* Teams recording transcripts
* Confluence pages
* PDF manuscripts

These are ever-evolving and may need periodic delta updates, manual syncs, adds/removes, etc., and I'm trying to figure out whether there's a way to manage these docs/texts properly. Basically, I think I need a system that stores the files and their metadata and provides a web UI for people to manage them. These blobs of text would then go through frameworks like langchain/LlamaIndex to be cleaned and chunked into a vector DB, so that different chunking strategies can be A/B tested while other people maintain the ever-growing docs system.

Any suggestions are welcome. I've tried some all-in-one frameworks, but so far my experience has been lackluster. Also, due to compliance constraints my company cannot use cloud-based solutions, so it has to be either open source and deployed locally, or developed in-house.
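To make the shape of such a system a bit more concrete, here is a minimal sketch in Python, using only the standard library so it can run fully on-prem. It is not a recommendation of any particular framework: the names `DocRegistry`, `upsert`, `paragraph_chunker`, and `fixed_size_chunker` are hypothetical, and the sketch assumes each document has already been extracted to plain text. A content hash decides whether a document changed since the last sync (the delta update), and chunking strategies are plain functions so they can be swapped for A/B tests.

```python
import hashlib
import sqlite3
from typing import Callable, List

# A chunking strategy is just a function from text to a list of chunks,
# which keeps strategies easy to swap when comparing them.
Chunker = Callable[[str], List[str]]


def paragraph_chunker(text: str) -> List[str]:
    """Split on blank lines and drop empty paragraphs."""
    return [p.strip() for p in text.split("\n\n") if p.strip()]


def fixed_size_chunker(size: int, overlap: int = 0) -> Chunker:
    """Build a chunker that cuts text into windows of `size` characters."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap

    def chunk(text: str) -> List[str]:
        return [text[i:i + size] for i in range(0, len(text), step)]

    return chunk


class DocRegistry:
    """Hypothetical local store for document text plus minimal metadata."""

    def __init__(self, path: str = ":memory:") -> None:
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS docs ("
            "doc_id TEXT PRIMARY KEY, source TEXT NOT NULL, "
            "sha256 TEXT NOT NULL, body TEXT NOT NULL)"
        )

    def upsert(self, doc_id: str, source: str, body: str) -> bool:
        """Store a document; return True if it is new or its text changed."""
        digest = hashlib.sha256(body.encode("utf-8")).hexdigest()
        row = self.db.execute(
            "SELECT sha256 FROM docs WHERE doc_id = ?", (doc_id,)
        ).fetchone()
        if row is not None and row[0] == digest:
            return False  # unchanged: no need to re-chunk or re-embed
        self.db.execute(
            "INSERT OR REPLACE INTO docs VALUES (?, ?, ?, ?)",
            (doc_id, source, digest, body),
        )
        self.db.commit()
        return True

    def remove(self, doc_id: str) -> None:
        """Delete a document from the registry."""
        self.db.execute("DELETE FROM docs WHERE doc_id = ?", (doc_id,))
        self.db.commit()

    def chunks(self, doc_id: str, chunker: Chunker) -> List[str]:
        """Chunk the stored text of one document with the given strategy."""
        row = self.db.execute(
            "SELECT body FROM docs WHERE doc_id = ?", (doc_id,)
        ).fetchone()
        if row is None:
            raise KeyError(doc_id)
        return chunker(row[0])
```

In a sync job, only documents for which `upsert` returns `True` would need to be re-chunked and re-embedded; the web UI and the vector DB write are deliberately left out of the sketch.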

1 comment

omidh, 9 months ago
Did you try dify? I found it was a good starting point for me.

https://dify.ai/