TechEcho

Hey! I wanted to share a tool I've been working on. It's still very early and a work in progress, but I've found it incredibly helpful when working with Claude and OpenAI's models.

What it does: I created a Python script that dumps your entire Git repository into a single file. This makes it much easier to use with Large Language Models (LLMs) and Retrieval-Augmented Generation (RAG) systems.

Key Features:

- Respects .gitignore patterns
- Generates a tree-like directory structure
- Includes file contents for all non-excluded files
- Customizable file type filtering

Why I find it useful for LLM/RAG:

- Full Context: It gives LLMs a complete picture of my project structure and implementation details.
- RAG-Ready: The dumped content serves as a great knowledge base for retrieval-augmented generation.
- Better Code Suggestions: LLMs seem to understand my project better and provide more accurate suggestions.
- Debugging Aid: When I ask for help with bugs, I can easily provide the full context.

How to use it:

Example: python dump.py /path/to/your/repo output.txt .gitignore py js tsx

Again, it's still a work in progress, but I've found it really helpful in my workflow with AI coding assistants (Claude/OpenAI). I'd love to hear your thoughts and suggestions, or whether anyone else finds this useful!

https://github.com/artkulak/repo2file

P.S. If anyone wants to contribute or has ideas for improvement, I'm all ears!
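The core idea fits in a few lines of standard-library Python. This is not the code from repo2file. It is a minimal sketch that makes several assumptions:

- There is a single top-level .gitignore.
- Patterns are simple globs only: no `!` negation, no anchored `/path` rules, and no nested .gitignore files.
- The names `dump_repo`, `load_ignore_patterns` and `is_ignored` are hypothetical.

```python
# Hypothetical sketch of a repo-to-single-file dumper (not repo2file's actual code).
# Assumes one top-level .gitignore and simple glob patterns only.
import fnmatch
import os
from pathlib import Path


def load_ignore_patterns(gitignore: Path) -> list[str]:
    """Return the non-empty, non-comment lines of a .gitignore file."""
    if not gitignore.is_file():
        return []
    patterns = []
    for line in gitignore.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            # Simplification: "build/" is treated like "build" (dir or file).
            patterns.append(line.rstrip("/"))
    return patterns


def is_ignored(rel_path: str, patterns: list[str]) -> bool:
    """True if the relative path, or any single component of it, matches a pattern."""
    parts = rel_path.split("/")
    return any(
        fnmatch.fnmatch(rel_path, p) or any(fnmatch.fnmatch(part, p) for part in parts)
        for p in patterns
    )


def dump_repo(root: Path, extensions: set[str]) -> str:
    """Build a tree view plus the contents of every non-ignored file with a wanted extension."""
    root = root.resolve()
    patterns = [".git"] + load_ignore_patterns(root / ".gitignore")
    tree_lines: list[str] = []
    contents: list[str] = []
    for dirpath, dirnames, filenames in os.walk(root):
        rel_dir = Path(dirpath).relative_to(root).as_posix()  # "." for the root itself
        prefix = "" if rel_dir == "." else rel_dir + "/"
        # Prune ignored directories in place so os.walk never descends into them.
        dirnames[:] = sorted(d for d in dirnames if not is_ignored(prefix + d, patterns))
        depth = 0 if rel_dir == "." else rel_dir.count("/") + 1
        if depth:
            tree_lines.append("    " * (depth - 1) + Path(dirpath).name + "/")
        for name in sorted(filenames):
            rel = prefix + name
            if is_ignored(rel, patterns) or Path(name).suffix.lstrip(".") not in extensions:
                continue
            tree_lines.append("    " * depth + name)
            text = (root / rel).read_text(encoding="utf-8", errors="replace")
            contents.append(f"### {rel}\n{text}")
    return "Directory structure:\n" + "\n".join(tree_lines) + "\n\n" + "\n\n".join(contents)
```

For example, `dump_repo(Path("/path/to/your/repo"), {"py", "js", "tsx"})` returns the tree followed by each matching file's contents. Writing that string to output.txt gives roughly what the example command describes. Real .gitignore semantics are richer than `fnmatch`, so a dedicated matcher would be needed for full fidelity.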

16 comments

subeadia · 8 months ago

These are extremely common these days. Here are a few I've collected over the past few months:

- [files-to-prompt](https://github.com/simonw/files-to-prompt) (from the GOAT simonw)
- [code2prompt](https://github.com/mufeedvh/code2prompt)
- https://gh-repo-dl.cottonash.com/
- [1filellm](https://github.com/jimmc414/1filellm)
- [repopack](https://github.com/yamadashy/repopack)
- [ingest](https://github.com/sammcj/ingest)

What makes yours better?

trees101 · 8 months ago

Take a look at what aider does to create a repo map using tree-sitter: https://aider.chat/docs/repomap.html and https://aider.chat/2023/10/22/repomap.html

I guess the difference is that your script produces a complete copy, whereas aider uses a concise summary, which becomes necessary once the context window is full.
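To make the summary-versus-copy distinction concrete, here is a rough sketch of the summary idea. aider builds its map with tree-sitter across many languages. This sketch swaps in Python's built-in `ast` module instead, so it only understands Python source, and it only lists top-level functions and classes (with their methods). `summarize_python_source` is a hypothetical name, and the output format is an assumption, not aider's.

```python
# Hypothetical "repo map" style summary for one Python file, using the stdlib ast
# module as a stand-in for tree-sitter (Python files only; needs Python 3.9+).
import ast


def _signature(fn) -> str:
    """Render a function node as 'def name(args)' (or 'async def ...')."""
    keyword = "async def" if isinstance(fn, ast.AsyncFunctionDef) else "def"
    return f"{keyword} {fn.name}({ast.unparse(fn.args)})"


def summarize_python_source(source: str) -> list[str]:
    """List top-level functions and classes (with their methods), without bodies."""
    funcs = (ast.FunctionDef, ast.AsyncFunctionDef)
    lines = []
    for node in ast.parse(source).body:
        if isinstance(node, funcs):
            lines.append(_signature(node))
        elif isinstance(node, ast.ClassDef):
            lines.append(f"class {node.name}:")
            lines.extend("    " + _signature(item) for item in node.body if isinstance(item, funcs))
    return lines
```

Running this over every file yields a few lines per file instead of the full text. That is the trade-off described above: far fewer tokens, but the model only sees signatures, not implementations.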

smcleod · 8 months ago

This is a similar tool I wrote for myself called "ingest". It ingests files/directories into LLM-friendly markdown and estimates token usage. It can also estimate vRAM usage for different models and quantisations, showing a table that highlights which quantisation, context size and k/v cache quantisation will fit in a given (v)RAM size: https://github.com/sammcj/ingest
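Estimates like these can be approximated with two back-of-envelope formulas. This is not how ingest computes them. It is a rough sketch that assumes two things:

- Token counts follow the common rule of thumb of about 4 characters per token for English text. Real tokenizers vary, and code often tokenizes differently.
- KV-cache size follows the standard formula: 2 (keys and values) × layers × KV heads × head dimension × tokens × bytes per element.

The helper names and the model dimensions in the example are hypothetical.

```python
# Back-of-envelope estimates (hypothetical helpers, not ingest's implementation).
import math


def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token count from the ~4 characters/token rule of thumb."""
    return math.ceil(len(text) / chars_per_token)


def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   n_tokens: int, bytes_per_elem: float = 2.0) -> float:
    """Size of the K and V caches: 2 x layers x KV heads x head_dim x tokens x bytes.

    bytes_per_elem=2.0 corresponds to an fp16/bf16 cache.
    """
    return 2 * n_layers * n_kv_heads * head_dim * n_tokens * bytes_per_elem


# Hypothetical model: 32 layers, 8 KV heads of dimension 128, 8192-token context.
print(kv_cache_bytes(32, 8, 128, 8192) / 2**30, "GiB")  # 1.0 GiB
```

Passing `bytes_per_elem=1` roughly approximates an 8-bit quantised KV cache, ignoring per-block scale overhead. A full vRAM estimate would also add the quantised weight size and some runtime overhead on top of this.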

some_rand_guy0 · 8 months ago

That's cool. I've used it. I'd add:

- treat '-' as stdout
- named arguments
- don't identify ignore files by checking whether the argument starts with '.', because then a local .gitignore isn't found and gets treated as an extension instead :)
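The first two suggestions are cheap to implement with the standard library's `argparse`. Here is a sketch whose flag names (`-o/--output`, `--ignore-file`, `-e/--ext`) are hypothetical and not repo2file's current interface. Taking the ignore file as an explicit flag also sidesteps the leading-'.' heuristic from the third point.

```python
# Hypothetical CLI for a repo dumper: named arguments and '-' meaning stdout.
import argparse
import sys
from pathlib import Path


def parse_args(argv=None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Dump a repository into a single text file.")
    parser.add_argument("repo", type=Path, help="path to the repository root")
    parser.add_argument("-o", "--output", default="-",
                        help="output file, or '-' for stdout (the default)")
    parser.add_argument("--ignore-file", type=Path, default=None,
                        help="ignore file to apply (e.g. <repo>/.gitignore)")
    parser.add_argument("-e", "--ext", nargs="*", default=[],
                        help="file extensions to include, e.g. py js tsx")
    return parser.parse_args(argv)


def write_output(text: str, output: str) -> None:
    """Write to stdout when output is '-', following the usual Unix convention."""
    if output == "-":
        sys.stdout.write(text)
    else:
        Path(output).write_text(text, encoding="utf-8")
```

With this interface, the example from the post would become something like `python dump.py /path/to/your/repo -o output.txt --ignore-file /path/to/your/repo/.gitignore -e py js tsx`. Dropping `-o` would pipe the dump straight into another command.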

brumar · 8 months ago

I skimmed the README but didn't see support for prefixing each line with its line number. This is an absolute must-have for people like me whose workflow is centered on generating git patches. In my experience, without line numbers, generated patches are much more likely to be incorrect.
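Such an option might be a hypothetical `number_lines` helper applied to each file's text before it goes into the dump. The `N: ` prefix format is an assumption; any consistent, easily stripped prefix would work.

```python
# Hypothetical helper: prefix every line with its line number before dumping.
def number_lines(text: str, start: int = 1) -> str:
    """Return text with each line prefixed by its number, right-aligned to a common width."""
    lines = text.splitlines()
    if not lines:
        return ""
    width = len(str(start + len(lines) - 1))  # digits needed for the last line number
    return "\n".join(f"{n:>{width}}: {line}" for n, line in enumerate(lines, start))


print(number_lines("import os\nprint(os.getcwd())"))
# 1: import os
# 2: print(os.getcwd())
```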

llagerlof · 8 months ago

Nice. I have a few suggestions:

- Put each file inside a code block, with three backticks at the beginning and three at the end, since that's the default convention for code.
- Remove the dashes to save tokens.
- In the title of each code block, put the full relative path to the file, since some projects have many files with the same name.
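Combined, those three suggestions might look like the sketch below: one Markdown code block per file, headed by its full relative path, with no separator dashes. `format_file`, `fence_for` and the `### path` heading are hypothetical choices. The fence is made one backtick longer than any backtick run inside the file, which CommonMark permits, so files that themselves contain triple backticks (READMEs, for instance) don't close the block early.

```python
# Hypothetical per-file formatter: relative-path heading + fenced code block.
import re
from pathlib import Path

TICK = "`"


def fence_for(body: str) -> str:
    """A fence at least 3 backticks long and longer than any backtick run in body."""
    longest = max((len(run) for run in re.findall(r"`+", body)), default=0)
    return TICK * max(3, longest + 1)


def format_file(root: Path, path: Path) -> str:
    rel = path.relative_to(root).as_posix()  # full relative path disambiguates same-named files
    lang = path.suffix.lstrip(".")  # e.g. "py", used as the code block's info string
    body = path.read_text(encoding="utf-8", errors="replace")
    if not body.endswith("\n"):
        body += "\n"
    fence = fence_for(body)
    return f"### {rel}\n{fence}{lang}\n{body}{fence}\n"
```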

vvoruganti · 8 months ago

Made a similar one that's not super polished: https://github.com/VVoruganti/repo-to-prompt

breck · 8 months ago

Interesting! There was another Show HN that did this same thing earlier in the day! https://news.ycombinator.com/item?id=41480373

mistermann · 8 months ago

Something like this that could automatically scrape a set of URLs into a file would also be useful for learning how to use various terrible enterprise software applications (SAP).

_andrei_ · 8 months ago

Made one as well, with interactive selection and token counting: https://github.com/3rd/promptpack

vnjxk · 8 months ago

There is an API for this at https://txtrepo.com. I used it with n8n to create PRs on issues.

johnisgood · 8 months ago

How does this (or similar tools) differ from just a simple `cat foo bar > out`?

rnapoles · 8 months ago

Great, I didn't know about this type of tool, thanks!

ndr_ · 8 months ago

Another approach is to just tar up the files, without compression. Works well with Claude via API.
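For the tar approach, Python's `tarfile` module can build an uncompressed archive in memory. This sketch assumes `git` is on the PATH so it can list tracked files with `git ls-files -z`; `tracked_files` and `tar_files` are hypothetical names. Note that tar wraps each file in a 512-byte header and pads it with NUL bytes, so the result is mostly, but not purely, text.

```python
# Hypothetical sketch: pack a repo's git-tracked files into an uncompressed tar.
import io
import subprocess
import tarfile
from pathlib import Path


def tracked_files(root: Path) -> list[str]:
    """Paths tracked by git, relative to root (requires git on PATH)."""
    out = subprocess.run(
        ["git", "-C", str(root), "ls-files", "-z"],
        capture_output=True, check=True,
    ).stdout
    return [p for p in out.decode("utf-8").split("\0") if p]


def tar_files(root: Path, rel_paths: list[str]) -> bytes:
    """Return an uncompressed tar archive (built in memory) of the given files."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:  # "w" = plain tar, no compression
        for rel in rel_paths:
            tar.add(root / rel, arcname=rel)
    return buf.getvalue()
```

Typical use would be `Path("repo.tar").write_bytes(tar_files(root, tracked_files(root)))`. Because only tracked files are included, .gitignore is respected implicitly.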

atxtechbro · 8 months ago

Seems like a common itch to scratch, and a good tool to scratch it with. I created 'linusfiles' and 'grabout' as tools for this. Grabout copies the last input and its error message (or other output) to the clipboard, and linusfiles copies the tracked files to the clipboard.

But I like the idea of tarballing it, as ndr_ suggested. I'm thinking that could be the move here.

In case anyone wants to see my workflows: https://github.com/atxtechbro/shell-tooling

AyushK1 · 8 months ago

That's a cool project.

Show HN: Dump entire Git repos into a single file for LLM prompts
