Show HN: Repogather – copy relevant files to clipboard for LLM coding workflows

65 points by grbsh, 8 months ago
Hey HN, I wanted to share a simple command line tool I made that has sped up and simplified my LLM-assisted coding workflow. Whenever possible, I've been trying to use Claude as a first pass when implementing new features / changes. But I found that depending on the type of change I was making, I was spending a lot of thought finding and deciding which source files should be included in the prompt. The need to copy/paste each file individually also becomes a mild annoyance.

First, I implemented `repogather --all`, which unintelligently copies *all* source files in your repository to the clipboard (delimited by their relative filepaths). To my surprise, for less complex repositories, this alone is often completely workable for Claude, and much better than pasting in just the few files you are looking to update. But I never would have done it if I had to copy/paste everything individually. 200k is quite a lot of tokens!

But as soon as the repository grows to a certain complexity level (even if it is under the input token limit), I've found that Claude can get confused by different unrelated parts / concepts across the code. It performs much better if you make an attempt to exclude logic that is irrelevant to your current change. So I implemented `repogather "<query here>"`, e.g. `repogather "only files related to authentication"`. This uses gpt-4o-mini with structured outputs to provide a relevance score for each source file (with automatic exclusions for .gitignore patterns, tests, configuration, and other manual exclusions with `--exclude <pattern>`).

gpt-4o-mini is so cheap and fast that for my ~8 dev startup's repo, it takes under 5 seconds and costs 3-4 cents (with appropriate exclusions). Plus, you get to watch the output stream while you wait, which always feels fun.

The retrieval isn't always perfect the first time, but it is fast, which allows you to see what files it returned and iterate quickly on your command. I've found this to be much more satisfying than embedding-search based solutions I've used, which seem to fail in pretty opaque ways.

https://github.com/gr-b/repogather

Let me know if it is useful to you! Always love to talk about how to better integrate LLMs into coding workflows.
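
As a rough illustration of what gathering files "delimited by their relative filepaths" can look like, here is a minimal Python sketch. It is not repogather's actual code: the `gather` function, the `--- path ---` delimiter, and the suffix filter are all assumptions made for illustration, and copying the result to the clipboard is left out.

```python
from pathlib import Path


def gather(root, suffixes=(".py",)):
    """Concatenate matching files under root, each preceded by its relative path.

    Hypothetical helper: the delimiter format and suffix filter are assumptions.
    """
    root = Path(root)
    parts = []
    for path in sorted(root.rglob("*")):
        # Skip directories and anything that is not a source file we care about.
        if path.is_file() and path.suffix in suffixes:
            rel = path.relative_to(root).as_posix()
            text = path.read_text(encoding="utf-8", errors="replace")
            parts.append(f"--- {rel} ---\n{text}")
    return "\n\n".join(parts)
```

Calling `gather(".")` returns one string that could then be piped to a clipboard tool or pasted into a prompt.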

9 comments

faangguyindia, 8 months ago
I usually only edit one function at a time using an LLM on an old code base.

On greenfield projects, I ask Claude Sonnet to write all the functions and their signatures, with return values etc.

Then I have a script which sends these signatures to Google Flash, which writes all the functions for me.

All this happens in parallel.

I've found that if you limit the scope, Google Flash writes the best code, and it's ultra fast and cheap.
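
The fan-out step described above could be sketched roughly like this. `generate_body` is a hypothetical placeholder for the model call (the comment does not say which client or script is used), so only the parallel structure is illustrated.

```python
from concurrent.futures import ThreadPoolExecutor


def generate_body(signature):
    """Hypothetical stand-in for a request to a fast, cheap model.

    A real version would send the signature to the model and return its code.
    """
    return f"{signature}\n    raise NotImplementedError"


def implement_all(signatures, max_workers=8):
    """Generate one implementation per signature, running the calls in parallel."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in the same order as the input signatures.
        return list(pool.map(generate_body, signatures))
```

Threads suit this case because each call would mostly wait on network I/O; keeping each request limited to one signature matches the "limit the scope" advice.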
mrtesthah, 8 months ago
This symbolic link broke it:

    srtp -> .

      File "repogather/file_filter.py", line 170, in process_directory
        if item.is_file():
           ^^^^^^^^^^^^^^
    OSError: [Errno 62] Too many levels of symbolic links: 'submodules/externals/srtp/include/srtp/srtp/srtp/srtp/srtp/srtp/srtp/srtp/srtp/srtp/srtp/srtp/srtp/srtp/srtp/srtp/srtp/srtp/srtp/srtp/srtp/srtp/srtp/srtp/srtp/srtp/srtp/srtp/srtp/srtp/srtp/srtp/srtp'
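
One common way to avoid this kind of loop is to not descend into symlinked directories while walking the tree. The sketch below is an assumption about how that could look, not a patch to repogather's `file_filter.py`; `iter_files` is a hypothetical helper built on the standard library's `os.walk`.

```python
import os


def iter_files(root):
    """Yield regular files under root without descending into symlinked directories.

    Hypothetical helper for illustration only.
    """
    # followlinks=False means a link such as `srtp -> .` is listed but never entered.
    for dirpath, dirnames, filenames in os.walk(root, followlinks=False):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # os.path.isfile returns False (rather than raising) for broken links.
            if os.path.isfile(path):
                yield path
```

The trade-off is that source files reachable only through a symlinked directory are skipped; tracking resolved paths that have already been visited would be an alternative if those files matter.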
reacharavindh, 8 months ago
Do you literally paste a wall of text (source code of the filtered whole repo) into the prompt and ask the LLM to give you a diff patch as an answer to your question?

Example:

Here is my whole project, now implement user authentication with plain username/password?
reidbarber, 8 months ago
Nice! I built something similar, but in the browser with drag-and-drop at https://files2prompt.com

It doesn't have all the fancy LLM integration though.
fellowniusmonk, 8 months ago
This looks very cool for complex queries!

If your codebase is structured in a very modular way, then this one-liner mostly just works:

    find . -type f -exec echo {} \; -exec cat {} \; | pbcopy
smcleod, 8 months ago
There are so many of these popping up! Here's mine: https://github.com/sammcj/ingest
jondwillis, 8 months ago
In this thread: nobody using Cursor, embedding documentation, using various RAG techniques…
ukuina, 8 months ago
It's fascinating to see how different frameworks are dealing with the problem of populating context correctly. Aider, for example, asks users to manually add files to context. Claude Dev attempts to grep files based on LLM intent. And Continue.dev uses vector embeddings to find relevant chunks and files.

I wonder if an increase in usable (not advertised) context tokens may obviate many of these approaches.
faangguyindia, 8 months ago
LLMs for coding are a bit meh after the novelty wears off.

I've had problems where the LLM doesn't know which library version I am using. It keeps suggesting methods which do not exist, etc.

As if LLMs are unaware of the library version.

The place where I found LLMs to be most effective and effortless is the CLI.

My brother made this, but I use it every day: https://github.com/zerocorebeta/Option-K