Hi HN, we're excited to share repo2vec: a simple-to-use, modular library that lets you chat with any public or private codebase. It's like GitHub Copilot, but with the most up-to-date information about your repo.

We made this because sometimes you just want to learn how a codebase works and how to integrate it, without spending hours sifting through the code itself.

We tried to make it dead simple to use: with two scripts, you can index a repo and get a functional chat interface for it. Every generated response shows where in the code the context for the answer was pulled from.

We also made it plug-and-play: every component, from the embeddings to the vector store to the LLM, is completely customizable.

If you want to see the hosted version of the chat interface and its features, here's a demo: https://www.youtube.com/watch?v=CNVzmqRXUCA

We would love your feedback!

- Mihail and Julia
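To give a flavor of what we mean by plug-and-play, here is a rough sketch of the pipeline shape (the class and method names are illustrative only, not the library's exact API):

    # Illustrative sketch of a retrieval-augmented chat pipeline over a repo.
    # The names below are hypothetical, not repo2vec's actual interface.
    from dataclasses import dataclass

    @dataclass
    class Chunk:
        path: str   # file the snippet came from (used for citations)
        text: str   # the code snippet itself

    class RepoChat:
        def __init__(self, embedder, vector_store, llm):
            # Each component is swappable: any embedder, store, or LLM
            # that satisfies these small interfaces will do.
            self.embedder = embedder
            self.store = vector_store
            self.llm = llm

        def index(self, chunks):
            # Script 1: embed every chunk and persist it.
            vectors = self.embedder.embed([c.text for c in chunks])
            self.store.add(vectors, chunks)

        def ask(self, question, k=5):
            # Script 2: retrieve the k nearest chunks, answer, cite sources.
            hits = self.store.search(self.embedder.embed([question])[0], k=k)
            context = "\n\n".join(f"# {c.path}\n{c.text}" for c in hits)
            answer = self.llm.complete(f"Context:\n{context}\n\nQ: {question}")
            return answer, [c.path for c in hits]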
Very useful! I was just thinking this kind of thing should exist!

I'd also like the LLM to know all of the documentation for any dependencies in the same way.
Very cool project, I'm definitely going to try this out. One question: why use the OpenAI embeddings API instead of BGE (BERT) or another embedding model that can run efficiently client-side? Was there a quality difference, or did you just default to OpenAI embeddings?
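For reference, here's roughly what I had in mind: a BGE model run locally via sentence-transformers (the specific model name below is just an example):

    # Local embeddings with a BGE model via sentence-transformers,
    # as an alternative to calling the OpenAI embeddings API.
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer("BAAI/bge-small-en-v1.5")  # runs on CPU or GPU
    vectors = model.encode(
        ["def add(a, b):\n    return a + b"],
        normalize_embeddings=True,  # cosine similarity becomes a dot product
    )
    print(vectors.shape)  # (1, 384) for bge-small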
We have LLMs with context windows of hundreds of thousands of tokens, and prompt caching makes using them affordable. Why not just stuff the whole codebase into the context window?
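A quick back-of-envelope check makes the tradeoff concrete. Here's a sketch using tiktoken; the 200k-token threshold is just an example of a current context window, not a claim about any specific model:

    # Rough check: does a whole repo fit in one context window?
    # Counts only .py files here for illustration; 200_000 is an example limit.
    from pathlib import Path
    import tiktoken

    enc = tiktoken.get_encoding("cl100k_base")
    total = 0
    for path in Path(".").rglob("*.py"):
        total += len(enc.encode(path.read_text(encoding="utf-8", errors="ignore")))
    print(f"{total} tokens; fits in a 200k window: {total <= 200_000}")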
I wonder if it will work on https://github.com/organicmaps/organicmaps

So far, two similar solutions I tested crapped out on non-ASCII characters, because Python's UTF-8 decoder is strict by default.
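For what it's worth, the fix on the indexing side is usually one line: read files with a lenient error handler instead of the strict default (the filename here is just an example):

    # Python's default is errors="strict", which raises UnicodeDecodeError
    # on any byte sequence that isn't valid UTF-8. A lenient handler
    # substitutes U+FFFD instead of crashing the indexer.
    with open("some_file.cpp", encoding="utf-8", errors="replace") as f:
        text = f.read()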