We're excited to share a tool we've been working on called gpt-code-search. It lets you search any codebase using natural language, locally on your machine. We leverage OpenAI's GPT-4 and function calling to retrieve, search, and answer queries about your code.<p>All you need to do is install the package with `pip install gpt-code-search`, set your `OPENAI_API_KEY` environment variable, and start asking questions with `gpt-code-search query <your question>`.<p>E.g. you can ask questions like "How do I use the analytics module?" or "Document all the API routes related to authentication."<p>This is still early and was hacked together over the past week, but we wanted to get it out there and get feedback.<p>We use OpenAI's function calling to let GPT-4 call certain predefined functions in our library. You do not need to implement any of these functions yourself. They are designed to interact with your codebase and return enough context for the LLM to perform code searches without pre-indexing your repo or uploading it to any third party other than OpenAI. So you only need to run the tool from the directory you want to search.<p>The functions currently available for the LLM to call are:<p>`search_codebase` - searches the codebase using a TF-IDF vectorizer<p>`get_file_tree` - provides the file tree of the codebase<p>`get_file_contents` - provides the contents of a file<p>These functions are implemented in `gpt-code-search` and are triggered by chat completions. The LLM is prompted to use the `search_codebase` and `get_file_tree` functions as needed to find context for your query, then loops, collecting more context with `get_file_contents`, until it can respond.<p>A couple of limitations of this approach: GPT cannot load context across multiple files in a single prompt, since each function call passes in the contents of only one file.
So, GPT repeatedly calls the get_file_contents function to load context from multiple files, which increases the tool's latency and cost.<p>Another thing we realized while building is that search and retrieval are constrained by the context window: we can only search five levels deep in the file system and can only pass in the contents of one file at a time. So it's best to run the tool from the package/directory closest to the code you want to search.<p>We plan to add support for local vector embeddings to improve search and retrieval. Combining vector embeddings with function calling should yield much faster, higher-quality results.<p>Support for other models, chat interactions in the command line, and code generation are also on our backlog!<p>Please check out gpt-code-search and let me know your thoughts, feedback, or suggestions.
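Since the search_codebase function above is described as using a TF-IDF vectorizer, here's a minimal stdlib-only sketch of TF-IDF file ranking to illustrate the idea — the function names and scoring details are ours, not the package's:

```python
# Minimal TF-IDF file-ranking sketch (illustrative, not gpt-code-search's code).
import math
import re
from collections import Counter

def tokenize(text):
    return re.findall(r"[a-zA-Z_]+", text.lower())

def rank_files(query, files):
    """files: dict mapping path -> contents. Returns paths, best match first."""
    docs = {path: Counter(tokenize(text)) for path, text in files.items()}
    n = len(docs)
    # Document frequency: how many files contain each term.
    df = Counter()
    for counts in docs.values():
        df.update(set(counts))

    def score(counts):
        total = sum(counts.values()) or 1
        s = 0.0
        for term in tokenize(query):
            tf = counts[term] / total                       # term frequency
            idf = math.log((n + 1) / (df[term] + 1)) + 1    # smoothed IDF
            s += tf * idf
        return s

    return sorted(files, key=lambda p: score(docs[p]), reverse=True)
```

The intuition: terms that are frequent in one file but rare across the codebase score highest, so a query like "authentication routes" surfaces auth-related files first.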
This looks super interesting, thanks for sharing. I like that you're exploiting the new functions API to give GPT agent-style access to explore a codebase. I have played with that previously with gpt-3.5 and plan to do some more experiments with gpt-4 someday soon.<p>I am also working on an open source CLI tool in this space [0]. I've taken a different approach, more focused on chatting with GPT to have it <i>edit</i> the code in your local git repo.<p>But my tool also provides GPT with a semantic map of your repo and the ability to ask to see particular files, etc. I use it to answer questions about unknown codebases all the time, and then start asking it to make changes. I have a chat transcript that illustrates that here [1]. As another example I needed a new feature in the glow tool and was able to make a PR [2] for it, even though I don't know anything about that codebase or even how to write golang.<p>Also, there's a small discord [3] where a few of us working on "AI coding tools" have been sharing ideas. You might be interested in joining the conversation over there.<p>[0] <a href="https://github.com/paul-gauthier/aider">https://github.com/paul-gauthier/aider</a><p>[1] <a href="https://aider.chat/examples/2048-game.html" rel="nofollow noreferrer">https://aider.chat/examples/2048-game.html</a><p>[2] <a href="https://github.com/charmbracelet/glow/pull/502">https://github.com/charmbracelet/glow/pull/502</a><p>[3] <a href="https://discord.gg/fHcgCRGu" rel="nofollow noreferrer">https://discord.gg/fHcgCRGu</a>
You're saying on the documentation:<p>> nor send your code to another third-party service.<p>But aren't you actually doing that? Sending things to OpenAI to use as context?
There's a demo at <a href="https://wolfia.com/">https://wolfia.com/</a> that lets you try out their code search on some popular repos and see other people's questions and answers.