Show HN: Open-Source Colab Notebooks to Implement Advanced RAG Techniques

98 points by hbamoria 6 months ago

Hey HN fam,

We’ve seen developers spend a lot of time implementing advanced RAG techniques from scratch.

While these techniques are essential for improving performance, implementing them takes a lot of effort and testing!

To help with this process, our team (Athina AI) has released Open-Source Advanced RAG Cookbooks.

This is a collection of ready-to-run Google Colab notebooks featuring the most commonly implemented techniques.

Please show us some love by starring the repo if you find this useful!

5 comments

Oras 6 months ago
One of the challenges I have with RAG is excluding tables of contents, headers/footers and appendices from PDFs.

Is there a tool/technique to achieve this? I’m aware that I can use LLMs to do so, or read all pages and find identical text (header/footer), but I want to keep the page number as part of the metadata to ensure better citation on retrieval.
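
The "read all pages and find identical text" idea can be combined with page-number metadata. The following is a minimal sketch, not a complete solution: it assumes the PDF text has already been extracted into one list of lines per page (the extraction library is left open), and the names `normalize`, `strip_repeated_lines`, `min_fraction` and `edge` are hypothetical. It only handles repeated headers and footers; tables of contents and appendices would need separate rules.

```python
import re
from collections import Counter


def normalize(line: str) -> str:
    """Collapse whitespace and digits so 'Page 3' and 'Page 4' compare equal."""
    return re.sub(r"\d+", "#", " ".join(line.split())).lower()


def strip_repeated_lines(
    pages: list[list[str]], min_fraction: float = 0.6, edge: int = 2
) -> list[dict]:
    """Remove lines that repeat at the top or bottom of most pages.

    pages[i] holds the text lines of page i + 1 (an assumption about the
    upstream extractor). Only the first and last `edge` lines of each page
    are treated as header/footer candidates.
    """
    counts: Counter = Counter()
    for lines in pages:
        # A set, so a line is counted at most once per page.
        candidates = {normalize(l) for l in lines[:edge] + lines[-edge:] if l.strip()}
        counts.update(candidates)

    # A line must appear on at least two pages to count as repeated.
    threshold = max(2, int(min_fraction * len(pages)))
    repeated = {text for text, n in counts.items() if n >= threshold}

    chunks = []
    for page_number, lines in enumerate(pages, start=1):
        kept = []
        for i, line in enumerate(lines):
            at_edge = i < edge or i >= len(lines) - edge
            if at_edge and normalize(line) in repeated:
                continue  # drop the header/footer line
            kept.append(line)
        # The page number travels with the cleaned text as metadata.
        chunks.append({"page": page_number, "text": "\n".join(kept).strip()})
    return chunks
```

Each returned dict keeps its 1-based `page` value, so a retrieved chunk can still be cited by page after the repeated lines are gone.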
jonathan-adly 6 months ago
I would strongly advise against people learning based on LangChain.

It is abstraction hell, and will set you back thousands of engineering hours the moment you want to do something differently.

RAG is actually a very simple thing to do; there is just too much VC money in the space & too many complexity merchants.

The best way to learn is outside of notebooks (the hard parts of RAG are all around the actual product), using as few frameworks as possible.

My preferred stack is FastAPI/numpy/redis. Simple as pie. You can swap redis for pgVector/Postgres when ready for the next complexity step.
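
To illustrate how small the retrieval core of such a stack can be, here is a minimal sketch using only numpy. It is not the commenter's code. The `toy_embed` function is a hypothetical stand-in (a hashed bag of words) for a real embedding model, `InMemoryIndex` and `DIM` are made-up names, and the FastAPI endpoint and Redis persistence are left out.

```python
import zlib

import numpy as np

DIM = 1024  # assumed vector width for the toy embedder


def toy_embed(text: str) -> np.ndarray:
    """Stand-in for a real embedding model: hashed bag of words, L2-normalized."""
    vec = np.zeros(DIM, dtype=np.float32)
    for token in text.lower().split():
        # crc32 is stable across runs, unlike Python's built-in hash().
        vec[zlib.crc32(token.encode("utf-8")) % DIM] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec


class InMemoryIndex:
    """Brute-force cosine-similarity search over a matrix of unit vectors."""

    def __init__(self) -> None:
        self.texts: list[str] = []
        self.matrix = np.zeros((0, DIM), dtype=np.float32)

    def add(self, text: str) -> None:
        self.texts.append(text)
        self.matrix = np.vstack([self.matrix, toy_embed(text)])

    def search(self, query: str, k: int = 3) -> list[tuple[str, float]]:
        if not self.texts:
            return []
        # Rows are unit length, so a dot product equals cosine similarity.
        scores = self.matrix @ toy_embed(query)
        top = np.argsort(-scores)[:k]
        return [(self.texts[i], float(scores[i])) for i in top]
```

The retrieved texts would then be pasted into the prompt of whichever LLM is used. Swapping the in-memory matrix for Redis or pgVector changes where the vectors are stored, not the shape of `search`.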
Jet_Xu 6 months ago
Interesting discussion! While RAG is powerful for document retrieval, applying it to code repositories presents unique challenges that go beyond traditional RAG implementations. I've been working on a universal repository knowledge graph system, and found that the real complexity lies in handling cross-language semantic understanding and maintaining relationship context across different repo structures (mono/poly).

Has anyone successfully implemented a language-agnostic approach that can:
1. Capture implicit code relationships without heavy LLM dependency?
2. Scale efficiently for large monorepos while preserving fine-grained semantic links?
3. Handle cross-module dependencies and version evolution?

Current solutions like AST-based analysis + traditional embeddings seem to miss crucial semantic contexts. Curious about others' experiences with hybrid approaches combining static analysis and lightweight ML models.
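
For context on the AST-based baseline mentioned above, here is a minimal sketch of extracting explicit call relationships with Python's standard `ast` module. It is Python-only rather than language-agnostic, the function name `call_edges` is hypothetical, and it captures only the syntactic edges. The implicit relationships the comment asks about, such as dynamic dispatch or cross-module conventions, are exactly what it misses.

```python
import ast
from collections import defaultdict


def call_edges(source: str) -> dict[str, set[str]]:
    """Map each function name to the names it calls, based on syntax alone.

    Calls made inside a nested function are also attributed to the
    enclosing function. Functions that make no calls are omitted.
    """
    tree = ast.parse(source)
    edges: dict[str, set[str]] = defaultdict(set)
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            for inner in ast.walk(node):
                if isinstance(inner, ast.Call):
                    func = inner.func
                    if isinstance(func, ast.Name):
                        # Plain call such as load(x)
                        edges[node.name].add(func.id)
                    elif isinstance(func, ast.Attribute):
                        # Method or attribute call such as obj.read()
                        edges[node.name].add(func.attr)
    return dict(edges)
```

Edges like these could serve as the static-analysis half of a hybrid approach, with embeddings or a lightweight model layered on top.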
krawczstef 6 months ago
+1 for vanilla code without LangChain.
dmezzetti 6 months ago
Thanks for sharing.

If you want notebooks that do some of this with local open models: https://github.com/neuml/txtai/tree/master/examples and here: https://gist.github.com/davidmezzetti