Show HN: Repo2vec – an open-source library for chatting with any codebase

93 points by nutellalover 9 months ago
Hi HN, we're excited to share repo2vec: a simple-to-use, modular library enabling you to chat with any public or private codebase. It's like GitHub Copilot, but with the most up-to-date information about your repo.

We made this because sometimes you just want to learn how a codebase works and how to integrate it, without spending hours sifting through the code itself.

We tried to make it dead simple to use. With two scripts, you can index a repo and get a functional chat interface for it. Every generated response shows where in the code the context for the answer was pulled from.

We also made it plug-and-play: every component, from the embeddings to the vector store to the LLM, is completely customizable.

If you want to see a hosted version of the chat interface with its features, here's a link: https://www.youtube.com/watch?v=CNVzmqRXUCA

We would love your feedback!

- Mihail and Julia
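
What the post describes is essentially a retrieval-augmented-generation loop over source files: embed chunks of the repo, retrieve the most relevant ones for a question, and let an LLM answer with those chunks as context. A minimal sketch of that idea in Python follows; it is not repo2vec's actual API, and the whole-file chunking, model names, and in-memory store are illustrative assumptions:

    # Minimal index-then-chat sketch (illustrative; not repo2vec's real interface).
    from pathlib import Path
    import numpy as np
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    def index_repo(root: str, exts=(".py", ".md")) -> list[dict]:
        """Embed each source file and remember which file each vector came from."""
        chunks = []
        for path in Path(root).rglob("*"):
            if path.suffix not in exts:
                continue
            text = path.read_text(encoding="utf-8", errors="replace")
            emb = client.embeddings.create(model="text-embedding-3-small",
                                           input=text[:8000]).data[0].embedding
            chunks.append({"path": str(path), "text": text, "emb": np.array(emb)})
        return chunks

    def ask(chunks: list[dict], question: str, k: int = 3) -> str:
        """Retrieve the k most similar files and answer with them as context."""
        q = np.array(client.embeddings.create(model="text-embedding-3-small",
                                              input=question).data[0].embedding)
        sim = lambda c: float(np.dot(q, c["emb"]) /
                              (np.linalg.norm(q) * np.linalg.norm(c["emb"])))
        top = sorted(chunks, key=sim, reverse=True)[:k]
        context = "\n\n".join(f"# {c['path']}\n{c['text'][:2000]}" for c in top)
        reply = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[
                {"role": "system",
                 "content": "Answer using the provided files and cite their paths."},
                {"role": "user", "content": f"{context}\n\nQuestion: {question}"}])
        return reply.choices[0].message.content

Because each retrieved chunk carries its file path, the answer can cite exactly where its context came from, which is the behaviour the post describes.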

12 comments

resters 9 months ago
Very useful! I was just thinking this kind of thing should exist!

I would also like the LLM to know all of the documentation for any dependencies in the same way.
cool-RR 9 months ago
I want to feed it not only the code but also a corpus of questions and answers, e.g. from the discussions page on GitHub. Is that possible?
peterldowns 9 months ago
Very cool project, I'm definitely going to try this out. One question: why use the OpenAI embeddings API instead of BGE (BERT) or another embedding model that can be run efficiently client-side? Was there a quality difference, or did you just default to using OpenAI embeddings?
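
For reference, a BGE-style model can be run locally with the sentence-transformers library, so no embedding API call is needed. A small sketch under that assumption (the model choice here is illustrative, not what repo2vec defaults to):

    # Local embeddings with a BGE model via sentence-transformers (illustrative).
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer("BAAI/bge-small-en-v1.5")  # downloads once, then runs locally
    snippets = ["def connect(db_url): ...", "class RetryPolicy: ..."]
    vectors = model.encode(snippets, normalize_embeddings=True)
    print(vectors.shape)  # (2, 384) for this model
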
zaptrem 9 months ago
We have LLMs with context windows of hundreds of thousands of tokens, and prompt caching makes using them affordable. Why don't we just stuff the whole codebase into the context window?
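
Whether that is feasible mostly comes down to how many tokens the repo actually contains. A quick, rough way to check with tiktoken (the encoding choice and the *.py filter are assumptions):

    # Rough token count for a repo, to compare against a model's context window.
    from pathlib import Path
    import tiktoken

    enc = tiktoken.get_encoding("cl100k_base")  # tokenizer used by many recent OpenAI models
    total = sum(len(enc.encode(p.read_text(encoding="utf-8", errors="replace")))
                for p in Path(".").rglob("*.py"))
    print(f"~{total:,} tokens of Python source")  # e.g. compare against a 200k-token window
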
erichi 9 months ago
Is it somehow different from Cursor codebase indexing/chat? I'm using that setup to analyse repos currently.
adamtaylor_13 9 months ago
Sorry for the dumb question but can I use this on private repositories or is it sending my code to OpenAI?
kevshor 9 months ago
This looks super cool! Is there currently a limit to how big a repo can be for this to work efficiently?
wiradikusuma 9 months ago
Is this for a specific language? Does it support polyglot projects (multiple languages in one project)?
interestingsoup 9 months ago
Any plans on allowing the use of a local LLM like Ollama or LM Studio?
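
One general way to do this today, since Ollama exposes an OpenAI-compatible endpoint, is to point an OpenAI-style client at the local server. This is a generic sketch, not a confirmed repo2vec option; the base URL and model name are assumptions:

    # Pointing an OpenAI-style client at a local Ollama server (illustrative).
    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:11434/v1",  # Ollama's OpenAI-compatible API
                    api_key="ollama")                      # any non-empty string works locally
    reply = client.chat.completions.create(
        model="llama3",  # whichever model you have pulled with `ollama pull`
        messages=[{"role": "user", "content": "Summarize what this repo does."}])
    print(reply.choices[0].message.content)
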
ccgongie 9 months ago
Super easy to use! Thanks! What's powering this under the hood?
RicoElectrico 9 months ago
I wonder if it will work on https://github.com/organicmaps/organicmaps

So far, two similar solutions I tested crapped out on non-ASCII characters, because Python's UTF-8 decoder is quite strict by default.
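
Python's default strict decoding does raise UnicodeDecodeError on bytes that aren't valid UTF-8; a tolerant read avoids that. A minimal illustration (a general workaround, not something repo2vec is confirmed to do):

    # Strict vs. tolerant decoding of bytes that are not valid UTF-8.
    data = b"caf\xe9"  # Latin-1 encoded "cafe" with an accented e; invalid as UTF-8

    try:
        data.decode("utf-8")  # strict (the default): raises UnicodeDecodeError
    except UnicodeDecodeError as err:
        print("strict decode failed:", err)

    print(data.decode("utf-8", errors="replace"))  # keeps going, inserts U+FFFD
    # For files: open(path, encoding="utf-8", errors="replace")
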
ranger_danger 9 months ago
Is there a Docker image?