
Show HN: How we leapfrogged traditional vector based RAG with a 'language map'

162 points · by oshams · 10 months ago
TL;DR: Vector-based RAG performs poorly for many real-world applications like codebase chats, and you should consider 'language maps'.

Part of our mission at Mutable.ai is to make it much easier for developers to build and understand software. One of the natural ways to do this is to create a codebase chat that answers questions about your repo and helps you build features.

It might seem simple to plug your codebase into a state-of-the-art LLM, but LLMs have two limitations that make human-level assistance with code difficult:

1. They currently have context windows that are too small to accommodate most codebases, let alone your entire organization's codebases.

2. They need to reason immediately to answer any question, without thinking through the answer "step by step."

We built a chat about a year ago based on keyword retrieval and vector embeddings. No matter how hard we tried, including training our own dedicated embedding model, we could not get good performance out of it.

Here is a typical example: https://x.com/mutableai/article/1813815706783490055/media/1813813912472870913

If you asked how to do quantization in llama.cpp, the answers were oddly specific and consistently pulled in the wrong context, especially from tests. We could, of course, take countermeasures, but it felt like a losing battle.

So we went back to step 1: let's understand the code, let's do our homework. For us, that meant actually putting an understanding of the codebase down in a document, a Wikipedia-style article called Auto Wiki. The wiki features diagrams and citations to your codebase. Example: https://wiki.mutable.ai/ggerganov/llama.cpp

This wiki is useful in and of itself for onboarding and understanding the business logic of a codebase, but one of the hopes for constructing such a document was that we'd be able to circumvent traditional keyword- and vector-based RAG approaches.

It turns out that using a wiki to find context for an LLM overcomes many of the weaknesses of our previous approach, while still scaling to arbitrarily large codebases:

1. Instead of context retrieval through vectors or keywords, the context is retrieved by looking at the sources that the wiki cites.

2. The answers are based both on the relevant section(s) of the wiki AND the content of the actual code that we put into memory: this functions as a "language map" of the codebase.

See it in action below for the same query as our old codebase chat:

https://x.com/mutableai/article/1813815706783490055/media/1813814321144844288

https://x.com/mutableai/article/1813815706783490055/media/1813814363939315712

The answer cites its sources in both the wiki and the actual code, and gives a step-by-step guide to doing quantization with example code.

The quality of the answer is dramatically improved: it is more accurate, relevant, and comprehensive.

It turns out language models love being given language, not a bunch of text snippets that happen to be nearby in vector space or share certain keywords! We see strong performance consistently across codebases of all sizes. The results from the chat are so good they even surprised us a little bit. You should check it out on a codebase of your own at https://wiki.mutable.ai, which we are happy to do for free for open-source code; private repos start at just $2/mo/repo.

We are introducing evals demonstrating how much better our chat is with this approach, but we were so happy with the results that we wanted to share with the whole community.

Thank you!
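The post does not publish the implementation, but a minimal sketch of the retrieval flow it describes might look like the following, under assumptions about the data model: the `WikiSection` type, the toy lexical ranker, and the `read_file` callback are all illustrative, not Mutable.ai's actual code.

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class WikiSection:
    heading: str
    body: str  # prose describing one part of the codebase
    citations: list[str] = field(default_factory=list)  # file paths the section cites

def build_context(query: str, sections: list[WikiSection],
                  read_file: Callable[[str], str], top_k: int = 3) -> str:
    """Rank wiki sections against the query, then assemble prompt context
    from the winning prose plus the code files those sections cite."""
    def score(s: WikiSection) -> int:
        q = set(query.lower().split())
        return len(q & set(s.body.lower().split()))  # toy lexical ranker
    best = sorted(sections, key=score, reverse=True)[:top_k]
    parts = []
    for s in best:
        parts.append(f"## {s.heading}\n{s.body}")
        for path in s.citations:
            parts.append(f"### Cited source: {path}\n{read_file(path)}")
    return "\n\n".join(parts)
```

The key difference from vector RAG is that code enters the prompt because a human-readable article cites it, so the surrounding prose explains why each file matters.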

16 comments

anotherpaulg · 10 months ago
I agree that many AI coding tools have rushed to adopt naive RAG on code.

Have you done any quantitative evaluation of your wiki-style code summaries? My first impression is that they might be too wordy and not deliver valuable context in a token-efficient way.

Aider uses a repository map [0] to deliver code context. Relevant code is identified using a graph optimization on the repository's AST & call graph, not vector similarity as is typical with RAG. The repo map shows the selected code within its AST context.

Aider currently holds the 2nd-highest score on the main SWE Bench [1], without doing any code RAG. So there is some evidence that the repo map is effective at helping the LLM understand large code bases.

[0] https://aider.chat/docs/repomap.html

[1] https://aider.chat/2024/06/02/main-swe-bench.html
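For readers unfamiliar with the repo-map idea, here is a minimal sketch in Python. It is not Aider's implementation (which uses tree-sitter and a graph-ranking pass over the call graph); it only shows the basic "signatures only" rendering that lets a large codebase's shape fit in a context window.

```python
import ast
from pathlib import Path

def repo_map(root: str) -> str:
    """Render only the top-level signatures of every Python file under root,
    so an LLM sees the shape of the codebase at a fraction of the token cost."""
    out = []
    for path in sorted(Path(root).rglob("*.py")):
        try:
            tree = ast.parse(path.read_text(encoding="utf-8"))
        except (SyntaxError, UnicodeDecodeError):
            continue  # skip files that don't parse
        out.append(f"{path}:")
        for node in tree.body:
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
                out.append(f"  def {node.name}({ast.unparse(node.args)}): ...")
            elif isinstance(node, ast.ClassDef):
                out.append(f"  class {node.name}:")
                for item in node.body:
                    if isinstance(item, (ast.FunctionDef, ast.AsyncFunctionDef)):
                        out.append(f"    def {item.name}({ast.unparse(item.args)}): ...")
    return "\n".join(out)

print(repo_map("."))
```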
lmeyerov · 10 months ago
I've been curious about this use case, so it's cool to see it, and more so to know it worked!

This is essentially a realization of how graph-RAG-flavor systems work under the hood. Basically you create hierarchical summary indexes, such as topical cross-document ones, and tune the summaries to your domain. At retrieval time, one question can leverage richer multi-hop concepts that span ideas that are lexically distinct but used together. Smarter retrievers can choose to expand on this dynamically (agentic: 'follow links') or work more in bulk on the digests ('map/reduce over summaries') without having to run every chunk through the LLM.

Once you understand what is going on in core graph RAG, you can even add non-vector relationships to the indexing and retrieval steps, such as from static code analysis, which afaict is the idea here. For a given domain, you can likewise use custom templates to tune what goes in each summary, like different wiki page styles for different topics. (Note: despite the name and the vendor advertising, no graph DB nor knowledge graph is needed for graph RAG, which makes its relationship to Auto Wiki-like concepts less obvious.)

We are building out some tech here to deal with core production issues, like updating/adding items without reindexing everything and making larger ingests faster and cheaper. E.g., imagine monitoring a heavy feed or a quickly changing repo. If this is of interest to anyone, please ping us; we are putting together design-partner cohorts for the RAG phase of louie.ai.
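A minimal sketch of the agentic 'follow links' retrieval described above, with an assumed node type and a stand-in lexical scorer (a real system would use an LLM or embeddings to pick which link to follow):

```python
from dataclasses import dataclass, field

@dataclass
class SummaryNode:
    """One node in a hierarchical summary index (e.g., a wiki section)."""
    title: str
    summary: str
    cited_sources: list[str] = field(default_factory=list)  # file paths, chunk ids
    children: list["SummaryNode"] = field(default_factory=list)

def lexical_overlap(query: str, text: str) -> float:
    """Stand-in relevance score; a real retriever would use an LLM or embeddings."""
    q, t = set(query.lower().split()), set(text.lower().split())
    return len(q & t) / (len(q) or 1)

def follow_links(node: SummaryNode, query: str, depth: int = 2) -> list[str]:
    """Agentic retrieval: descend into the most relevant child at each level,
    then return the sources cited by the summary we land on."""
    if depth == 0 or not node.children:
        return node.cited_sources
    best = max(node.children, key=lambda c: lexical_overlap(query, c.summary))
    return follow_links(best, query, depth - 1)
```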
kordlessagain · 10 months ago
I've been working on Webwright [1] for a month, after having prototyped a few different terminal solutions for coding agents. Webwright manifests as a pseudo-terminal in PowerShell, or in Terminal on macOS.

Using Claude.ai, I determined the `ast` package would be suitable (for Python scanning), so I had Webwright author a new function module to scan the project and assemble a list of functions, function calls, imports, and decorators. I then installed the function module and relaunched the app.

It produced the following as a request to explain how the system works for y'all:

WebWright uses modular function calls to handle various tasks such as user input processing, file operations, git management, and AI interactions. The main application routes commands to specific functions, which in turn perform discrete tasks like reading files, managing containers, and querying AI APIs. This architecture ensures clear separation of concerns, maintainability, and scalability while efficiently handling complex operations asynchronously.

The heavy lifting in WebWright is primarily done in the AI handlers and utility modules:

1. AI handlers (`lib.aifunc`):
   - The core AI functionality, like processing user queries and interacting with AI APIs (OpenAI, Anthropic), performs the most computationally intensive tasks.
   - Manages asynchronous API calls, error handling, and processing of large datasets.

2. Utility modules:
   - Modules like `lib.functions.filesystem`, `lib.functions.git_*`, and `lib.functions.manage_app_container` handle substantial operations such as file I/O, version control, and Docker container management.

These components shoulder the bulk of the computational and I/O work, ensuring efficient task execution and resource management.

[1] https://github.com/MittaAI/webwright
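For reference, the kind of scan described here, collecting functions, calls, imports, and decorators with the standard-library `ast` module, can be sketched in a few lines (an illustration, not Webwright's actual module):

```python
import ast

def scan_source(source: str) -> dict[str, list[str]]:
    """Collect functions, function calls, imports, and decorators from one file."""
    tree = ast.parse(source)
    found = {"functions": [], "calls": [], "imports": [], "decorators": []}
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            found["functions"].append(node.name)
            found["decorators"].extend(ast.unparse(d) for d in node.decorator_list)
        elif isinstance(node, ast.Call):
            found["calls"].append(ast.unparse(node.func))
        elif isinstance(node, ast.Import):
            found["imports"].extend(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom):
            found["imports"].append(node.module or "")
    return found
```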
tlarkworthy · 10 months ago
This is literate programming! Why not just put the codebase in the wiki, rather than have two representations that diverge? Why can't we have diagrams and links in code? We can, as https://observablehq.com notebooks show; it's a better representation for understanding.
maxsieg · 10 months ago
Hey, cofounder at Mutable.ai here.

I want to encourage you all to ask the chat some tough questions. You can ask very complex and general questions. Some examples:

- Ask ollama (https://wiki.mutable.ai/ollama/ollama) how to add a new model
- Ask langchain (https://wiki.mutable.ai/langchain-ai/langchain) "How can I build a simple personal assistant using this repo?"
- Ask flash attention (https://wiki.mutable.ai/Dao-AILab/flash-attention) "What are the main benefits of using this code?"

It is also useful for search: for example, if you ask langchain "Where is the code that connects to vector databases?" it will surface all the relevant information.

Very curious to hear what you ask (and whether you find the response helpful)!
senko · 10 months ago
Looks similar to what we're doing in Pythagora with the CodeMonkey agent (prompt: https://github.com/Pythagora-io/gpt-pilot/blob/main/core/prompts/code-monkey/describe_file.prompt, code: https://github.com/Pythagora-io/gpt-pilot/blob/main/core/agents/code_monkey.py#L91).

I think everyone who's seriously tackled the "code RAG" problem is aware that a naive vector approach doesn't work and some hybrid approach is needed (see also Paul's comments on Aider).

Intuitively, I expect a combo of LSP/tree-sitter directed by an LLM, plus vector RAG over the "wiki"/metadata, would be a viable approach.

Very exciting to see all the research into this!
langcss · 10 months ago
This sort of approach always made more sense to me than RAG. I am less likely to try RAG than something that feeds the LLM what it actually needs. RAG risks providing piecemeal information that confuses the LLM.

The way I thought this would work, and would like to try out, is to ask the LLM what info it wants next from an index of contents, like a book's table of contents. That index can be LLM-generated or not. Then you backtrack, since you don't need that lookup in your dialogue any more, and insert the result.

It won't work for everything, but it should work for many "small expert" cases, and then you don't need a vector DB; you just do prompts!

Cheap LLMs make this more viable than it used to be. Use a small open-source LLM for the decision making, then a quality open-source or proprietary LLM for the chat or code gen.
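A minimal sketch of that lookup-then-backtrack loop, assuming a hypothetical `complete()` wrapper around whatever LLM API is in use; the prompt shapes are illustrative only:

```python
def complete(prompt: str) -> str:
    """Hypothetical wrapper around your LLM API of choice."""
    raise NotImplementedError

def answer_with_index(question: str, index: dict[str, str]) -> str:
    """Let the LLM pick sections from a table of contents, then answer.

    `index` maps section titles to section bodies. The lookup turn is
    discarded ('backtracked'): only the fetched text enters the final prompt.
    """
    toc = "\n".join(index)
    picked = complete(
        f"Table of contents:\n{toc}\n\n"
        f"Question: {question}\n"
        "Reply with the titles of the sections you need, one per line."
    )
    context = "\n\n".join(index[t] for t in picked.splitlines() if t in index)
    # Fresh prompt: the section-picking exchange is not carried forward.
    return complete(f"Context:\n{context}\n\nQuestion: {question}")
```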
maxsieg · 10 months ago
Hey! I'm a cofounder at Mutable.ai. FYI, we have run this on the following codebases (plus hundreds more):

https://wiki.mutable.ai/hashicorp/terraform
https://wiki.mutable.ai/ggerganov/llama.cpp
https://wiki.mutable.ai/ethereum/go-ethereum
https://wiki.mutable.ai/NVIDIA/TensorRT
https://wiki.mutable.ai/langchain-ai/langchain
https://wiki.mutable.ai/ollama/ollama
https://wiki.mutable.ai/tensorflow/models
https://wiki.mutable.ai/grafana/grafana
https://wiki.mutable.ai/OpenAutoCoder/Agentless
https://wiki.mutable.ai/unslothai/unsloth
https://wiki.mutable.ai/Dao-AILab/flash-attention
https://wiki.mutable.ai/vercel/next.js
https://wiki.mutable.ai/microsoft/vscode
https://wiki.mutable.ai/wasm3/wasm3
https://wiki.mutable.ai/deepfakes/faceswap
https://wiki.mutable.ai/huggingface/transformers
https://wiki.mutable.ai/vllm-project/vllm
kleneway1 · 10 months ago
Nice job on this; it's a really interesting approach. I've been developing an open-source coding agent over the past year, and RAG just wasn't working at all. I switched to a repo-map approach (which sounds similar to what Aider is doing) and that helped a bit, but still wasn't great.

However, a few weeks ago I built an agent that takes in a new GitHub issue and is given a variety of tools to research the background information needed to complete the issue. The tools include internet searches and clarifying questions to ask the person who wrote the ticket. But the most useful tool is the ability to look at the codebase and create a detailed markdown file covering various files: explanations of what each file does, relevant code samples or snippets from the files, and so on.

It's still early, but anecdotally I've seen a huge increase in the quality of the code that uses this research as part of the context (along with the repo map and other details). It's also able to tackle much more complex issues than it could before.

I definitely think you're on to something here with this wiki approach. I'll be curious to dig in and see the details of how you are creating these. Here is my research code if you're interested: https://github.com/jacob-ai-bot/jacob/blob/feature/agent/src/server/agent/research.ts

And here's an example of the research output (everything past the exit criteria section): https://github.com/kleneway/jacob/issues/62
oshams · 10 months ago
Wow, I didn't notice this hit the front page until now; I will be answering questions momentarily!
zellyn · 10 months ago
These wikis are really interesting. I'm itching to try it on the common framework parts of our work monorepo.

[Update after looking at the Django wiki]

The wiki's structure appears to be very (entirely?) based on the directory structure. Is that right?

It would be interesting to give it the Django documentation in addition to the codebase, or possibly even just the heading structure of the documentation, and let it use that as a scaffold to generate the structure/organization of the wiki.

For a specific example, the Model is one of the most important concepts in Django, but in the auto-generated wiki it shows up underneath databases.
zby · 10 months ago
A prompt here is a textual presentation of a data structure; this is an area I am exploring right now. I wonder whether it would be equally effective if the LLM were given a JSON or Python representation instead of the presumed Markdown. But my intuition is that because LLMs are trained on texts meant for human consumption, they will best follow exactly the texts humans find easier, which means they 'want' the same presentation humans do: nicely formatted text or nicely formatted programs, not mixes.
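As a concrete illustration of the two presentations being compared (the example data is assumed, not from the thread):

```python
import json

section = {
    "title": "Quantization",
    "summary": "llama.cpp converts model weights to lower-precision formats.",
    "citations": ["examples/quantize/quantize.cpp"],
}

# Structured presentation: the data structure serialized as JSON.
as_json = json.dumps(section, indent=2)

# Prose presentation: the same facts rendered as human-readable Markdown.
as_markdown = (
    f"## {section['title']}\n\n"
    f"{section['summary']}\n\n"
    f"Sources: {', '.join(section['citations'])}"
)
print(as_json, as_markdown, sep="\n\n")
```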
spirobelv2 · 10 months ago
It wants me to log in to ask a question, so I will just keep using Phind.

You have VC dollars: sponsor a free public search over open-source repos.

Also, think about what happens when your question touches multiple repos.

I tried a similar "search a GitHub repo with AI" product before, but it led me right back to Phind when it couldn't answer a question that required specific information from the repository as well as a Google search.
magicalhippo · 10 months ago
So, no free lunch? Making a detailed wiki of all the code would take several developer-years for us, and we're just a handful of developers.

Or is the wiki generated somehow?
oshams · 10 months ago
BTW, if you ask in the comments for an open-source repo to be processed by our system, we'll do it for free!
J_Shelby_J · 10 months ago
This is amazing. Do you have any information on the implementation?