TechEcho

A tech news platform built with Next.js, providing global tech news and discussion.

Show HN: Open-Source Colab Notebooks to Implement Advanced RAG Techniques

98 points by hbamoria, 5 months ago

Hey HN fam,

We’ve seen developers spend a lot of time implementing advanced RAG techniques from scratch.

While these techniques are essential for improving performance, their implementation requires a lot of effort and testing!

To help with this process, our team (Athina AI) has released Open-Source Advanced RAG Cookbooks.

This is a collection of ready-to-run Google Colab notebooks featuring the most commonly implemented techniques.

Please show us some love by starring the repo if you find this useful!

5 comments

Oras, 5 months ago

One of the challenges I have with RAG is excluding the table of contents, headers/footers, and appendices from PDFs.

Is there a tool/technique to achieve this? I’m aware that I can use LLMs to do so, or read all pages and find identical text (header/footer), but I want to keep the page number as part of the metadata to ensure better citation on retrieval.
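
The "read all pages and find identical text" idea from the comment above can be sketched without an LLM. This is a minimal illustration, not a complete solution: it assumes the PDF text has already been extracted into one string per page (the extraction step is out of scope), and the function name `strip_headers_footers` and its parameters are hypothetical. Digits are masked before comparison so that footers like "Page 1" and "Page 2" count as the same line, and the page number is kept as metadata on every chunk. Tables of contents and appendices are not handled here.

```python
import re
from collections import Counter


def _key(line):
    # Mask digits so "Page 1" and "Page 2" compare as the same line.
    return re.sub(r"\d+", "#", line.strip())


def strip_headers_footers(pages, edge_lines=2, min_fraction=0.6):
    """Remove lines that recur at the top or bottom of most pages.

    pages: list of page texts, where index 0 is page 1.
    Returns a list of {"page": n, "text": cleaned_text} dicts.
    """
    # Non-empty lines for each page, in order.
    split = [[ln for ln in page.splitlines() if ln.strip()] for page in pages]

    # Count on how many pages each masked edge line appears.
    counts = Counter()
    for lines in split:
        edge = lines[:edge_lines] + lines[-edge_lines:]
        counts.update({_key(ln) for ln in edge})

    threshold = max(2, round(min_fraction * len(pages)))
    repeated = {key for key, n in counts.items() if n >= threshold}

    chunks = []
    for page_number, lines in enumerate(split, start=1):
        kept = []
        for i, ln in enumerate(lines):
            at_edge = i < edge_lines or i >= len(lines) - edge_lines
            if at_edge and _key(ln) in repeated:
                continue  # treated as a header or footer
            kept.append(ln.strip())
        chunks.append({"page": page_number, "text": "\n".join(kept)})
    return chunks
```

Only lines near the page edges are candidates for removal, so a sentence that happens to repeat in the body text is left alone. The `min_fraction` threshold is a tuning knob: documents with alternating left/right headers may need a lower value.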
jonathan-adly, 5 months ago

I would strongly advise against people learning based on LangChain.

It is abstraction hell, and will set you back thousands of engineering hours the moment you want to do something differently.

RAG is actually a very simple thing to do; there is just too much VC money in the space & too many complexity merchants.

The best way to learn is outside of notebooks (the hard parts of RAG are all around the actual product), using as few frameworks as possible.

My preferred stack is FastAPI/numpy/redis. Simple as pie. You can swap redis for pgVector/Postgres when ready for the next complexity step.
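
To illustrate how small the retrieval core of RAG can be without a framework, here is a minimal sketch using only the Python standard library. It is not the commenter's code: the stack above uses numpy for the vector math and redis for storage, while this version keeps everything in memory so it runs anywhere. The `embed` function is a toy bag-of-words stand-in for a real embedding model, and `retrieve` and `build_prompt` are hypothetical names. The LLM call itself is left out.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy bag-of-words vector; a stand-in for a real embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, documents, k=2):
    """Return up to k documents most similar to the query, best first."""
    q = embed(query)
    scored = [(cosine(q, embed(doc)), doc) for doc in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for score, doc in scored[:k] if score > 0]


def build_prompt(query, documents, k=2):
    """Assemble the retrieved context and the question into one prompt."""
    context = "\n".join(f"- {doc}" for doc in retrieve(query, documents, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"
```

With a real embedding model, `embed` would return a dense vector and the similarity loop would become a single matrix-vector product; the overall shape (embed, score, take the top k, build a prompt) stays the same.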
Jet_Xu, 5 months ago

Interesting discussion! While RAG is powerful for document retrieval, applying it to code repositories presents unique challenges that go beyond traditional RAG implementations. I've been working on a universal repository knowledge graph system, and found that the real complexity lies in handling cross-language semantic understanding and maintaining relationship context across different repo structures (mono/poly).

Has anyone successfully implemented a language-agnostic approach that can:
1. Capture implicit code relationships without heavy LLM dependency?
2. Scale efficiently for large monorepos while preserving fine-grained semantic links?
3. Handle cross-module dependencies and version evolution?

Current solutions like AST-based analysis + traditional embeddings seem to miss crucial semantic contexts. Curious about others' experiences with hybrid approaches combining static analysis and lightweight ML models.
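
For readers unfamiliar with the AST-based analysis mentioned above, the following is a minimal sketch of extracting explicit call relationships with Python's built-in `ast` module. It is an illustration only, not the commenter's system: it handles Python source alone (so it is not language-agnostic), and the function name `call_graph` is hypothetical. It records only direct calls by plain name; method calls, imports, and dynamic dispatch are skipped, which is the kind of missing context the comment describes.

```python
import ast


def call_graph(source):
    """Map each top-level function name to the set of plain names it calls."""
    tree = ast.parse(source)
    graph = {}
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            calls = set()
            for child in ast.walk(node):
                # Keep only calls like f(x); obj.method(x) is an
                # ast.Attribute and is deliberately ignored here.
                if isinstance(child, ast.Call) and isinstance(child.func, ast.Name):
                    calls.add(child.func.id)
            graph[node.name] = calls
    return graph
```

The resulting edges could be stored as graph relationships alongside embedded code chunks, so retrieval can follow a call edge from one function to another even when their text is not similar.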
krawczstef, 5 months ago

+1 for vanilla code without LangChain.
dmezzetti, 5 months ago

Thanks for sharing.

If you want notebooks that do some of this with local open models: https://github.com/neuml/txtai/tree/master/examples and here: https://gist.github.com/davidmezzetti