Launch HN: Greptile (YC W24) - RAG on codebases that actually works

253 points by dakshgupta, about 1 year ago
Hi HN, we're the co-founders of Greptile, a tool that can accurately answer questions about complex codebases. Developers use us to spend less time wrestling with codebases and more time actually writing code. Here's a demo: https://youtu.be/qI24eKO1YX0. You can try it on 100 popular repos here: https://app.greptile.com/repo, and on your own repo (if you give permission; more on that below) here: https://app.greptile.com.

We are far from the first people to try "RAG on your codebase". We focus on full codebase comprehension: using LLMs to accurately answer difficult questions with full context of large, complex, and even multi-repo codebases.

Simple RAG alone is not sufficient for this task. Codebases aren’t like most PDFs, docs, or other similar data types. They are graphs: complex puzzles where each piece is interlinked. So Greptile does a few things past simple RAG:

(1) Instead of directly embedding code, we parse the AST of the codebase, recursively generate docstrings for each node in the tree, and then embed the docstrings.

(2) Alongside vector similarity search and keyword search, we do “agentic search” where an agent reviews the relevance of the search results, and scans the source code to follow references that might lead to something important. Then it returns the relevant sources.

For example, here are a couple of questions that this system is able to answer in our test repo that simple RAG couldn’t (in our experience):

“Where are the auth providers configured?” (They are in an array inside an options.ts file, and from looking at the file it’s not obvious that it’s auth-related. However, because that array is imported into the auth/route.ts file, Greptile’s agent traces the import and finds it.)

“How would I add a postgres connector?” (The best way to answer this is to see how the Redis connector is set up and mirror it. Simple RAG sometimes retrieves some of the code for the Redis connector, but Greptile’s agent follows the connections to retrieve all the code that the Redis connector touches, and uses that to write instructions.)

Developers (including at Stripe and Microsoft) are using Greptile for things like:

Debugging: you can paste in an error message and it does a pretty good job of diagnosing the root cause and suggesting fixes.

Grokking OSS repos: for example, if you're forking a repo, modifying it for your use case, or just integrating it, Greptile lets you add multiple repos and dependencies in the same chat session so it has full context.

Parsing legacy code at work, especially if the original engineers have left the company.

Since we're accessing your private code, we're very careful with security. We don't store any code on our servers after initial processing, and just pull snippets as needed from the GitHub API.

Quick note: when you sign in with GH, it might ask for permission to "act on your behalf". This is a quirk of GitHub's wording: our permissions are read-only and the only thing we do "on your behalf" is read code, so we can index the repo.

We came up with this idea while working at AWS: the codebase was super complicated, the docs were sparse and out of date, and our team was remote so it was slow to get answers to questions. We picked "greptile" because of "grep" and also we just wanted a somewhat silly name.

Try it out! It's a work in progress, so any feedback is appreciated. Here are the links again: for popular open source repos see https://app.greptile.com/repo, and to get it working on your own repo, start at https://app.greptile.com.

If you have experience working with a complex codebase at work or for a project, I’d love to hear about it. It really helps us inform our product direction. Looking forward to comments!

Edit: for those who want to try this on large or private repos, here is a promo code for a free month: HACKERNEWS100
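Editor's illustration (not from the post): step (1) above can be sketched in a few lines of Python using the standard `ast` module. This is a minimal sketch under stated assumptions, not Greptile's implementation: `summarize` is a hypothetical stand-in for the LLM call that would write each docstring, `SOURCE` is a made-up sample file, and the final embedding step is omitted entirely.

```python
import ast

# Node types that can carry a docstring in Python.
DOCUMENTED = (ast.Module, ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef)


def summarize(node, child_summaries):
    """Hypothetical stand-in for an LLM call that writes a docstring."""
    kind = type(node).__name__
    name = getattr(node, "name", "<module>")
    existing = ast.get_docstring(node) or "no docstring"
    text = f"{kind} {name}: {existing}"
    if child_summaries:
        text += " | contains: " + "; ".join(child_summaries)
    return text


def describe(node, out):
    """Post-order walk: summarize the children first, then the node itself."""
    child_summaries = [
        describe(child, out)
        for child in ast.iter_child_nodes(node)
        if isinstance(child, DOCUMENTED)  # only direct definitions, for brevity
    ]
    summary = summarize(node, child_summaries)
    out.append(summary)
    return summary


# Made-up sample source, used only for this illustration.
SOURCE = '''
"""Auth helpers."""

class Session:
    """Tracks a logged-in user."""

    def refresh(self):
        """Renew the session token."""
        return True

def login(user):
    return Session()
'''


def build_summaries(source):
    """Return one summary per definition, innermost definitions first."""
    out = []
    describe(ast.parse(source), out)
    return out


summaries = build_summaries(SOURCE)
for line in summaries:
    print(line)
```

Because the walk is bottom-up, each parent's summary can draw on its children's summaries, which is what lets a description of a class or file mention what it contains. In a real pipeline these strings, rather than the raw code, would be passed to an embedding model.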

45 comments

moritonal, about 1 year ago
Works well. Today I was looking at how Rails works with BigDecimals, so (knowing the answer) I asked:

"When using "as_json" in a controller to return the JSON of a model, how are BigDecimals encoded?"

Answer: "When using as_json in a controller to return the JSON of a model, BigDecimal values are encoded as strings. This behavior is defined in the active_support/core_ext/object/json.rb file, specifically in the BigDecimal class extension for JSON encoding. The rationale behind this approach is that most..." which is exactly the case, as I learnt through various PRs, issues and code review.

This would have saved me about 30 minutes of work. I wonder if it takes into account the metadata, such as authors, related comments, issues and PRs?

dvt, about 1 year ago
Ran it on a "real" OSS project of mine (https://github.com/dvx/lofi/), and it was stuck at 99% loading for about 30 minutes. Then, when it finally parsed the codebase, it always returned "Error: Internal error while locating sources." no matter what I asked. Specifically, I wanted to see if it can context switch between TypeScript (used for the front-end), Objective-C (used for a few Mac features), C++ (used for Windows volume features), and GLSL (used for visualizations). But alas.

At one point, this random prompt popped up: https://imgur.com/a/mYeluaU. What's "Onboard"? Is this some kind of weird LLM leakage/hallucination?

With all respect, this is like a pre-MVP quality product. The codebase isn't even particularly large and the experience is extremely sub-par. Charging for something like this is honestly highway robbery.

jasonkester, about 1 year ago
You’re going to want to define the acronym RAG before you use it a dozen times in your marketing copy.

Presumably it’s great news that I can RAG on my codebase. But I’m not sure whether I’ve ever ragged anything in my career or whether I’ll want to now.

If you told us what it meant, we could probably understand what your thing does.

alalani1, about 1 year ago
I like clever project names :)

This looks great. I just tried to generate sample code in the react repo and was pleasantly surprised. Do you have a sense of whether this works well for generating code in general, e.g. generating an API route that returns X data and works similarly to the other API routes?

simonw, about 1 year ago
"We don't store any code on our servers after initial processing"

Are you storing the embedding vectors you've calculated from the code? If so, those are likely quite easily reversible, so I would still consider that source code stored on your servers from the point of view of a security audit.

As a result, I might actually prefer to have copies of my code stored on your servers if it resulted in faster performance.

koeng, about 1 year ago
I'd love to try it, but pretty much all my repos are >10 MB. It's not because there is that much code, but because I am doing bioinformatics and the test files (for the unit tests) inflate the repo size. It would be great if there was a way to test it on just one large repo for perhaps a week or something, because I balk at the idea of spending $20 a month on something that I don't even know works well.

This is important because I'm not deeply familiar with public projects, so I can't accurately assess if the tool is worthwhile. Whereas with one of my repos, I'd be able to tell quality pretty quickly.

Tsarp, about 1 year ago
How does it compare with something like Bloop, which also uses a combination of a syntax tree, embeddings, FTS and LLMs?

gdcbe, about 1 year ago
“Where we going we don’t need docs”. That scares me… docs are there, among other things, to provide context and to explain why certain choices were or were not made… no way your AI is going to guess that I put that restriction in because of an explicit request from product, despite it looking wrong…

Conscat, about 1 year ago
I've tried it on my own C++ codebase. It's fun, and I'm impressed that it could tell me which C++ standard is used (a question which is often difficult to find an answer to on random codebases), but it's really bad at analyzing templates. The answers it gives me are always incomplete and usually at least partly or mostly incorrect. I'm surprised by this in some cases, because my questions are answered by comments in the source code.

https://app.greptile.com/share/4953cbff-13ec-4427-b0af-02889a52096b

luke-stanley, about 1 year ago
It's cool to see tools like this. I ran into some issues though:

1. "We will email you"... "once the repositories have finished processing". Not sure you're supposed to do that without consent, when the intent was just to connect GitHub! Email use is supposed to be opt-in.

2. My tiny repo (https://github.com/lukestanley/ChillTranslator) won't load.

3. The UI for selecting a GitHub repo is hard to find and fiddly to use.

4. I couldn't see where to put the promo code.

IceDane, about 1 year ago
Not a single repo I've tried works. A lot of them seem not to have finished processing, but even the ones that have finished don't work.

ram417, about 1 year ago
Love this idea and just signed up. Thanks for the promo code! Also, I really like your blog post about shipping faster: https://greptile.com/blog/ship-faster. Shipping code is so fun that we should all be looking for ways to do more of it.

nomoreipg, about 1 year ago
How's this different from Adrenaline or Cursor or Bloop?

cdtwigg, about 1 year ago
Apparently I’m the only one here who doesn’t know this, but: what is RAG?

pivic, about 1 year ago
I only get 'Error: Internal error while processing request.' when I try to run queries. I tested three different repos, and the same error message appeared for each one.

nico, about 1 year ago
Cool, will check it out.

Does it integrate with Visual Studio? Does it provide code suggestions?

I've been doing a lot of back-and-forth iteration with ChatGPT to build a Python project from scratch.

It’s been a really good experience, although frustratingly slow at times (from going back and forth between the browser and the code, and having to wait for GPT’s answers).

Can more documentation be added automatically? For example, in a Rails project it might be useful to be able to get answers about the Ruby and Rails documentation.

fuzzythinker, about 1 year ago
After giving permission, it asked to:

"Link Your Code Hosting Providers. Connect your accounts for seamless integration, and to access private repositories."

What does this mean?

DavidFerris, about 1 year ago
Super cool! btw I love the name "Greptile" :)

theckel, about 1 year ago
I just keep getting "Error: Internal error while locating sources." when trying to talk to a repo that is green and "up to date".

drcongo, about 1 year ago
I've been looking for something like this, but local-only. Any plans to let people self-host and point at local repositories?

alchemist1e9, about 1 year ago
Does it use tree-sitter for all the AST parsing?

anton-107, about 1 year ago
Getting "Error: Internal error while processing request." while trying it on my personal public GitHub repo. HN effect?

jbellis, about 1 year ago
Looks like some kind of bug on repos w/ many branches. Loading https://github.com/datastax/cassandra/, I search for `vsearch` and it presents me with CNDB-8708-vsearch and DSP-23946-vsearch, but not vsearch itself.

peter_d_sherman, about 1 year ago
Related: https://greptile.com/pricing

sidcool, about 1 year ago
Congrats on launching. However, I don't like the 'Act on your behalf' permission this needs.

mcfig, about 1 year ago
Asking questions of any repo on “repo” fails with “Error: Internal error while processing request.” This is probably because I unlinked my GitHub connection after trying it out, but it shouldn’t be trying to use that in this case.

intalentive, about 1 year ago
The AST approach should be integrated into code generation. Instead of generating text, generate AST nodes. Something like “Copilot with Intellisense” could be a game changer.

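Editor's illustration (not part of the thread): the idea of generating AST nodes instead of text can be shown with Python's standard `ast` module. This is a minimal sketch of the general technique, not any existing product's behavior; `make_assignment` and the variable names are hypothetical, and `ast.unparse` requires Python 3.9 or later.

```python
import ast


def make_assignment(target, left, op, right):
    """Build the tree for `target = left <op> right` node by node (hypothetical helper)."""
    expr = ast.BinOp(
        left=ast.Name(id=left, ctx=ast.Load()),
        op=op,
        right=ast.Name(id=right, ctx=ast.Load()),
    )
    assign = ast.Assign(targets=[ast.Name(id=target, ctx=ast.Store())], value=expr)
    module = ast.Module(body=[assign], type_ignores=[])
    # compile() needs line/column info on every node; fill in defaults.
    return ast.fix_missing_locations(module)


module = make_assignment("total", "price", ast.Mult(), "quantity")

# The tree can be rendered back to source text...
source = ast.unparse(module)
print(source)

# ...or compiled and run directly, with no text parsing step in between.
namespace = {"price": 3, "quantity": 4}
exec(compile(module, "<generated>", "exec"), namespace)
print(namespace["total"])
```

A generator that emits nodes like these cannot produce unbalanced brackets or other syntax errors, because the tree is well-formed by construction; whether the generated code is semantically right is a separate question.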
tom_, about 1 year ago
What does RAG stand for?

stuaxo, about 1 year ago
Nice reading the steps you take to analyse the code.

I had scrolled past this article without clicking and had the same thoughts about how I'd approach this.

ankit84, about 1 year ago
Can it answer customer support questions about an API's cryptic error messages, e.g. give hints on the changes needed in the request payload?

obiefernandez, about 1 year ago
Will it work with a large Ruby on Rails codebase?

doctorpangloss, about 1 year ago
I tried asking a question about Porter and I see the error:

> Oops

> We couldn't access this repo.

> You may need to log in to view this repository, or it might not exist.

setgree, about 1 year ago
Very nice! FYI, the 'free coffee' link (https://calendly.com/dakshgupta/free-coffee) identifies you as "Daksh, co-founder/CEO at Onboard."

Also, I am getting the 'Error: Internal error while processing request'.

sourabh03agr, about 1 year ago
Congrats on the launch! Do you need GitHub permissions to answer questions on open-source repos as well?

iknownthing, about 1 year ago
Looks good, but there are many competitors that do exactly the same thing (even open-source ones).

minhoryang, about 1 year ago
I like your 100 repo selections :)

sanity, about 1 year ago
I linked my GitHub but can't find where to use the promo code :-/

nahimn, about 1 year ago
10/10 on the name

hazelnutcloud, about 1 year ago
> This repo failed to process

nice

shw1n, about 1 year ago
This is super cool. My co-founder and I were brainstorming last night about how to essentially expand the context window via first-order concepts for this exact purpose.

Excited to try it out.

pgt, about 1 year ago
Can Greptile read Clojure codebases?

cbsks, about 1 year ago
Can you add the Linux kernel?

stevemadere, about 1 year ago
Can you guys add Hugging Face transformers as one of the public demo repos? I have some very specific use cases where I've seen ChatGPT with GPT-4 totally fall on its face, Dunning-Kruger style.

I'd like to see if your tech solves those issues.

hellospike, about 1 year ago
I constantly got "Error: Internal error while processing request."

geospatialover, about 1 year ago
Awesome! Congrats on the launch.