Show HN: I built an AI that turns GitHub codebases into easy tutorials

923 points by zh2408 about 1 month ago

<a href="https://the-pocket.github.io/Tutorial-Codebase-Knowledge/" rel="nofollow">https://the-pocket.github.io/Tutorial-Codebase-Knowledge/</a>

60 comments

bilalq about 1 month ago

This is actually really cool. I just tried it out using an AI Studio API key and was pretty impressed. One issue I noticed was that the output was a little too much "for dummies". Spending paragraphs explaining what an API is through restaurant analogies is a little unnecessary, and that is then followed by more paragraphs on what GraphQL is. Every chapter seems to suffer from this. The generated documentation seems better suited to a slightly technical PM than to a software engineer. This can probably be mitigated by refining the prompt.

The prompt would also maybe be better if it encouraged variety in diagrams. For some things, a flow chart would fit better than a sequence diagram (e.g., a durable state machine workflow written using AWS Step Functions).

swashbuck1r about 1 month ago

While the doc generator is a useful example app, the really interesting part is how you used Cursor to start a PocketFlow design doc for you, then fine-tuned the details of the design doc to describe the PocketFlow execution graph and utilities you wanted the design of the doc generator to follow… and then used Cursor to generate all the code for the doc-generator application.

This really shows off that the simple node graph, shared storage and utilities patterns you have defined in your PocketFlow framework are useful for helping the AI translate your documented design into (mostly) working code.

Impressive project!

See design doc <a href="https://github.com/The-Pocket/Tutorial-Codebase-Knowledge/blob/main/docs/design.md">https://github.com/The-Pocket/Tutorial-Codebase-Knowledge/bl...</a>

And video <a href="https://m.youtube.com/watch?v=AFY67zOpbSo" rel="nofollow">https://m.youtube.com/watch?v=AFY67zOpbSo</a>

moored about 1 month ago

I had not used Gemini before, so I spent a fair bit of time yak shaving to get access to the right APIs and set up my Google project. (I have an OpenAI key but it wasn't clear how to use that service.)

I changed it to use this line: <pre><code> api_key=os.getenv("GEMINI_API_KEY", "your-api_key") </code></pre> instead of the default project/location option, and I changed it to use a different model: <pre><code> model = os.getenv("GEMINI_MODEL", "gemini-2.5-pro-preview-03-25") </code></pre> I used the preview model because I got rate limited and the error message suggested it.

I used this on a few projects from my employer:
- <a href="https://github.com/prime-framework/prime-mvc">https://github.com/prime-framework/prime-mvc</a> a largish open source MVC Java framework my company uses. I'm not overly familiar with this, though I've read a lot of code written in this framework.
- <a href="https://github.com/FusionAuth/fusionauth-quickstart-ruby-on-rails-web/">https://github.com/FusionAuth/fusionauth-quickstart-ruby-on-...</a> a smaller example application I reviewed and am quite familiar with.
- <a href="https://github.com/fusionauth/fusionauth-jwt">https://github.com/fusionauth/fusionauth-jwt</a> a JWT Java library that I've used but not contributed to.

Overall thoughts:
- Lots of exclamation points.
- Thorough overview, including of some things that were not application specific (Rails routing).
- Great analogies. Seems to lean on them pretty heavily.
- Didn't see any inaccuracies in the tutorials I reviewed.

Pretty amazing overall!
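For readers who want to reproduce that setup, here is a minimal sketch of reading both settings from the environment in one place. The variable names and fallback values come from the two lines quoted above; the helper name `load_gemini_config` and the decision to fail early on the placeholder key are assumptions for illustration, not something taken from the project.

```python
import os


def load_gemini_config(env=None):
    """Read the API key and model name from environment variables.

    The variable names and defaults mirror the lines quoted in the comment;
    this helper itself is hypothetical.
    """
    env = os.environ if env is None else env
    api_key = env.get("GEMINI_API_KEY", "your-api_key")
    model = env.get("GEMINI_MODEL", "gemini-2.5-pro-preview-03-25")
    # Fail early instead of sending the placeholder key to the API.
    if api_key == "your-api_key":
        raise RuntimeError("Set GEMINI_API_KEY before calling the model")
    return {"api_key": api_key, "model": model}
```

Passing a plain dict as `env` makes the helper easy to try without touching the real environment.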

manofmanysmiles about 1 month ago

I love it! I effectively achieve similar results by asking Cursor lots of questions!

Like at least one other person in the comments mentioned, I would like a slightly different tone.

Perhaps a good feature would be a "style template" that can be chosen to match your preferred writing style.

I may submit a PR, though not if it takes a lot of time.
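One way the "style template" idea could look, as a rough sketch: a small table of tone instructions that gets prepended to whatever prompt the generator already builds. The template names, their wording, and the `apply_style` helper are all hypothetical and do not come from the project.

```python
# Hypothetical tone presets; the wording is illustrative only.
STYLE_TEMPLATES = {
    "beginner": "Explain every concept from scratch and use analogies.",
    "engineer": (
        "Assume an experienced software engineer. Skip analogies and "
        "definitions of basic terms. Avoid exclamation marks."
    ),
}


def apply_style(prompt: str, style: str = "engineer") -> str:
    """Prepend the chosen tone instruction to an existing prompt."""
    if style not in STYLE_TEMPLATES:
        raise ValueError(f"unknown style: {style!r}")
    return f"{STYLE_TEMPLATES[style]}\n\n{prompt}"
```

Keeping the tone in a separate table means the chapter prompts themselves would not need to change when a new style is added.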

TheTaytay about 1 month ago

Woah, this is really neat. My first step for many new libraries is to clone the repo, launch Claude code, and ask it to write good documentation for me. This would save a lot of steps for me!

fforflo about 1 month ago

If you want to use Ollama to run local models, here’s a simple example:

<pre><code>from ollama import chat, ChatResponse


def call_llm(prompt, use_cache: bool = True, model="phi4") -> str:
    # use_cache is accepted but not used in this snippet.
    response: ChatResponse = chat(
        model=model,
        messages=[{
            'role': 'user',
            'content': prompt,
        }]
    )
    # Return only the text of the model's reply.
    return response.message.content
</code></pre>
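The snippet above accepts a `use_cache` flag but never consults it. As a rough sketch of what a cache could look like, the wrapper below stores responses in a JSON file keyed by a hash of the prompt. The `cached` helper and the `llm_cache.json` file name are assumptions for illustration; they are not taken from the project or from the Ollama library.

```python
import hashlib
import json
from pathlib import Path
from typing import Callable


def cached(call: Callable[[str], str],
           cache_file: str = "llm_cache.json") -> Callable[[str], str]:
    """Wrap a prompt -> text function with a simple on-disk cache."""
    path = Path(cache_file)
    # Load earlier responses if the cache file already exists.
    cache = json.loads(path.read_text()) if path.exists() else {}

    def wrapper(prompt: str) -> str:
        key = hashlib.sha256(prompt.encode("utf-8")).hexdigest()
        if key not in cache:
            cache[key] = call(prompt)
            # Rewrite the whole file; fine for a sketch, slow for big caches.
            path.write_text(json.dumps(cache))
        return cache[key]

    return wrapper
```

Under these assumptions, `cached(call_llm)` would return a function that only reaches the model the first time it sees a given prompt.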

Too about 1 month ago

How well does this work on unknown code bases?

The tutorial on requests looks uncanny for being generated with no prior context. The use cases and examples it gives are too specific. It is making up terminology for concepts that are not mentioned once in the repository, like "functional api" and "hooks checkpoints". There must be thousands of tutorials on requests online that every AI was already trained on. How do we know that it is not using them?

chairhairair about 1 month ago

A company (mutable ai) was acquired by Google last year for essentially doing this but outputting a wiki instead of a tutorial.

gregpr07 about 1 month ago

I built browser use. Dayum, the results for our lib are really impressive. You didn’t touch the outputs at all? One problem we have is keeping the docs in sync with the current codebase (code examples break sometimes). I wonder if I could use parts of Pocket to help with that.

esjeon about 1 month ago

At the top is some neat high-level stuff, but below that it quickly turns into code-written-in-human-language.

I think it should be possible to extract some more useful usage patterns by poking into the related unit tests. How to use the code is what matters to most tutorial readers.

remoquete about 1 month ago

This is nice and fun for getting some fast indications on an unknown codebase, but, as others said here and elsewhere, it doesn't replace human-made documentation.

<a href="https://passo.uno/whats-wrong-ai-generated-docs/" rel="nofollow">https://passo.uno/whats-wrong-ai-generated-docs/</a>

axelr340 22 days ago

We are also building a tool to understand codebases. Our tool shows the features implemented in a codebase visually, along with their hierarchy, and with traceability to associated code.

Here is an example feature map for the Spot robot SDK from Boston Dynamics with 100k lines of code: <a href="https://product-map.ai/app/public?url=https://github.com/boston-dynamics/spot-sdk" rel="nofollow">https://product-map.ai/app/public?url=https://github.com/bos...</a>

mattfrommars about 1 month ago

WTF

You built it in one afternoon? I need to figure out these mythical abilities.

I thought about this idea a few weeks back but could not figure out how to implement it.

Amazing job, OP

fforflo about 1 month ago

With $GEMINI_MODEL=gemini-2.0-flash I also got some decent results for libraries like simonw/llm and pgcli.

You can tell that because simonw writes quite heavily documented code and the logic is pretty straightforward, it helps the model a lot!

<a href="https://github.com/Florents-Tselai/Tutorial-Codebase-Knowledge/tree/more-examples/docs/llm">https://github.com/Florents-Tselai/Tutorial-Codebase-Knowled...</a>

<a href="https://github.com/Florents-Tselai/Tutorial-Codebase-Knowledge/tree/more-examples/docs/pgcli">https://github.com/Florents-Tselai/Tutorial-Codebase-Knowled...</a>

amelius about 1 month ago

I've said this a few times on HN: why don't we use LLMs to generate documentation? But then came the naysayers ...

mvATM99 about 1 month ago

This is really cool and very practical. I will definitely try it out for some projects soon.

I can see some fine-tuning being required after generation, but assuming you know your own codebase that's not an issue anyway.

citizenpaul about 1 month ago

This is really cool. One of the best AI things I've seen in the last two years.

wg0 about 1 month ago

That's a game changer for onboarding new open source contributors.

Put in the Postgres or Redis codebase, get a good understanding, and get going on contributing.

Retr0id about 1 month ago

The overview diagrams it creates are pretty interesting, but the tone/style of the AI-generated text is insufferable to me - e.g. <a href="https://the-pocket.github.io/Tutorial-Codebase-Knowledge/Requests/01_functional_api.html#whats-the-functional-api" rel="nofollow">https://the-pocket.github.io/Tutorial-Codebase-Knowledge/Req...</a>

kaycebasques about 1 month ago

Very cool, thanks for sharing. I imagine that this will make a lot of my fellow technical writers (even more) nervous about the future of our industry. I think the reality is more along the lines of:

* Previously, it was simply infeasible for most codebases to get a decent tutorial for one reason or another. E.g. the codebase is someone's side project and they don't have the time or energy to maintain docs, let alone a tutorial, which is widely regarded as one of the most labor-intensive types of docs.
* It's always been hard to persuade businesses to hire more technical writers because it's perennially hard to connect our work to the bottom or top line.
* We may actually see more demand for technical writers because it's now more feasible (and expected) for software projects of all types to have decent docs. The key future skill would be knowing how to orchestrate ML tools to produce (and update) docs.

(But I'm also under no delusion: it's definitely possible for TWs to go the way of the dodo bird and animatronics professionals.)

I think I have a very good way to evaluate this "turn GitHub codebases into easy tutorials" tool but it'll take me a few days to write up. I'll post my first impressions to <a href="https://technicalwriting.dev" rel="nofollow">https://technicalwriting.dev</a>

P.S. There has been a flurry of recent YC startups focused on automating docs. I think it's a tough space. The market is very fragmented. Because docs are such a widespread and common need, I imagine that a lot of the best practices will get commoditized and open sourced (exactly like Pocket Flow is doing here).

potamic about 1 month ago

Did you measure how much it cost to run it against your examples? Trying to gauge how much it would cost to run this against my repos.

stephantul about 1 month ago

The dspy tutorial is amazing. I think dspy is super difficult to understand conceptually, but the tutorial explained it really well

theptip about 1 month ago

Yes! AI for docs is one of the use cases I’m bullish on. There is a nice feedback loop where these docs will help LLMs to understand your code too. You can write a GH action to check whether your code change / release changes the docs, so they stay fresh. And you can run your tutorials to ensure that they remain correct.
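The "run your tutorials" check could start as something very small. The sketch below pulls the fenced Python blocks out of a Markdown tutorial and reports the ones that no longer compile; a CI job could fail when the list is non-empty. The function names are hypothetical, and `compile` only catches syntax errors, so a block that references an undefined variable would still pass and would need to be executed to be caught.

```python
# The fence marker is built at runtime so that this snippet can itself
# live inside a fenced block.
FENCE = "`" * 3


def python_blocks(markdown: str):
    """Yield the contents of fenced blocks labelled 'python'."""
    block, inside = [], False
    for line in markdown.splitlines():
        stripped = line.strip()
        if not inside and stripped == FENCE + "python":
            inside, block = True, []
        elif inside and stripped == FENCE:
            inside = False
            yield "\n".join(block)
        elif inside:
            block.append(line)


def broken_blocks(markdown: str):
    """Return (block index, error message) for blocks that fail to compile."""
    failures = []
    for i, code in enumerate(python_blocks(markdown)):
        try:
            compile(code, f"<block {i}>", "exec")
        except SyntaxError as exc:
            failures.append((i, str(exc)))
    return failures
```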

badmonster about 1 month ago

Do you have plans to expand this to include more advanced topics like architecture-level reasoning, refactoring patterns, or onboarding workflows for large-scale repositories?

iamsaitam 27 days ago

BLOATED. This project is 100 lines of code, but everything that is non-code related is bloated like a gas giant. All the text and videos are written by an LLM. The author would benefit from understanding that QUANTITY isn't QUALITY; toning down the verbiage would greatly benefit what they are trying to communicate.

PS: The generated "design documents" are 2k+ lines long. This seems like a great way to exceed quotas.

1899-12-30 about 1 month ago

As an extension to this general idea: AI generated interactive tutorials for software usage might be a good product. Assuming it was trained on the defined usage paths present in the code, it would be able to guide the user through those usages.

ganessh about 1 month ago

Does it use the docs in the repository or only the code?

nitinram 30 days ago

This is super cool! I attempted to use this on a project and kept running into "This model's maximum context length is 200000 tokens. However, your messages resulted in 459974 tokens. Please reduce the length of the messages." I used OpenAI o4-mini. Is there an easy way to handle this gracefully? Basically, do you have thoughts on how to make tutorials for really large codebases or project directories?
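One generic way to stay under a context limit like the one in that error message is to split the input into batches before sending it. The sketch below groups files greedily under a token budget and cuts any single oversized file into pieces. The four-characters-per-token estimate, the function names, and the `#partN` key suffix are all assumptions for illustration; this is not how the project itself handles large repositories, and a real tokenizer would give more accurate counts.

```python
def estimate_tokens(text: str) -> int:
    """Rough heuristic (assumption): about four characters per token."""
    return max(1, len(text) // 4)


def batch_files(files: dict, budget: int) -> list:
    """Greedily group {path: text} so each batch fits the token budget.

    A file that exceeds the budget on its own is cut into character chunks.
    """
    batches, current, used = [], {}, 0
    for path, text in files.items():
        pieces = [text]
        if estimate_tokens(text) > budget:
            size = budget * 4  # characters that fit in one budget
            pieces = [text[i:i + size] for i in range(0, len(text), size)]
        for n, piece in enumerate(pieces):
            cost = estimate_tokens(piece)
            # Start a new batch when this piece would overflow the current one.
            if current and used + cost > budget:
                batches.append(current)
                current, used = {}, 0
            key = path if len(pieces) == 1 else f"{path}#part{n}"
            current[key] = piece
            used += cost
    if current:
        batches.append(current)
    return batches
```

Each batch could then be summarized separately and the summaries combined in a final call, at the cost of losing some cross-file context.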

lummm about 1 month ago

I actually have created something very similar here: <a href="https://github.com/Black-Tusk-Data/crushmycode">https://github.com/Black-Tusk-Data/crushmycode</a>, although with a greater focus on 'pulling apart' the codebase for onboarding. So many potential applications of the resultant knowledge graph.

bionhoward about 1 month ago

“I built an AI”

Looks inside

REST API calls

pknerd about 1 month ago

Interesting... Would you like to share some technical details? It doesn't seem that you used RAG here?

chbkall about 1 month ago

Love this. These are the kind of AI applications we need which aid our learning and discovery.

zarkenfrood about 1 month ago

Really nice work and thank you for sharing. These are great demonstrations of the value of LLMs which help to go against the negative view on the impacts to junior engineers. This helps bridge the gap of most projects lacking updated documentation.

android521 about 1 month ago

For anyone doubting AI as pure hype, this is the counter example of its usefulness

lastdong about 1 month ago

Great stuff, I may try it with a local model. I think the core logic for the final output is all in the nodes.py file, so I guess one can try and tweak the prompts, or create a template system.

touristtam about 1 month ago

You just need to find a way to integrate it into the deployment pipeline and output some Markdown (or another format) to send to whatever your company is using (or simply a live website), I'd say.

thom about 1 month ago

This is definitely a cromulent idea, although I’ve realised lately that ChatGPT with search turned on is a great balance of tailoring to my exact use case and avoiding hallucinations.

trash_cat about 1 month ago

This is literally what I use AI for. Excellent project.

orsenthil about 1 month ago

It would be good to integrate a local web server to fire up and read the docs. I use the VS Code Markdown preview, and that works too. Cool project.

polishdude20 about 1 month ago

Is there an easy way to have this visit a private repository? I've got a new codebase to learn and it's behind credentials.
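Independent of what the tool itself supports, reading files from a private GitHub repository generally comes down to sending a personal access token with each API request. The sketch below builds such a request with the standard library for GitHub's repository contents endpoint. The helper name and the `GITHUB_TOKEN` environment variable are assumptions for illustration; how the project actually accepts credentials is not covered here.

```python
import os
import urllib.request


def github_contents_request(owner, repo, path="", token=None):
    """Build (but do not send) a request for a file or directory listing."""
    # Fall back to an environment variable (assumed name) if no token is given.
    token = token or os.environ.get("GITHUB_TOKEN")
    url = f"https://api.github.com/repos/{owner}/{repo}/contents/{path}"
    headers = {"Accept": "application/vnd.github+json"}
    if token:
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(url, headers=headers)
```

Passing the returned object to `urllib.request.urlopen` would perform the call; the token needs read access to the repository in question.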

andrewrn about 1 month ago

This is brilliant. I would make great use of this.

gbraad about 1 month ago

Interesting, but gawd awful analogy: "like a takeout order app". It tries to be amicable, which feels uncanny.

throwaway314155 about 1 month ago

I suppose I'm just a little bit bothered by your saying you "built an AI" when all the heavy lifting is done by a pretrained LLM. Saying you made an AI-based program or hell, even saying you made an AI agent, would be more genuine than saying you "built an AI" which is such an all-encompassing thing that I don't even know what it means. At the very least it should imply use of some sort of training via gradient descent though.

bdg001 about 1 month ago

I was using gitdiagram, but LLMs are very bad at generating good, error-free Mermaid code!

Thanks buddy! This will be very helpful!!

dangoodmanUT about 1 month ago

It appears like it's leveraging the docs and learned tokens more than the actual code. For example, I don't believe it could achieve that understanding of LevelDB without the prior knowledge and extensive material it has probably been trained on already.

andybak about 1 month ago

Is there a way to limit the number of exclamation marks in the output?

It seems a trifle... overexcited at times.

las_nish about 1 month ago

Nice project. I need to try this

rtcoms about 1 month ago

I would be very interested in knowing how you built this.

anshulbhide about 1 month ago

Love this kind of stuff on HN

souhail_dev about 1 month ago

That's amazing, I was looking for that a while ago. Thanks!

lasarkolja about 1 month ago

Can anyone turn nextcloud/server into an easy tutorial?

CalChris about 1 month ago

Do one for LLVM and I'll definitely look at it.

throwaway290 about 1 month ago

You didn't "build an AI". It's more like you wrote a prompt.

I wonder why all the examples are from projects that already have great docs, so it doesn't even need to read the actual code.

firesteelrain about 1 month ago

Can this work with Codeium Enterprise?

saberience about 1 month ago

I hate this language: "built an AI". Did you train a new model to do this? Or are you in fact calling ChatGPT 4o, or Sonnet 3.7, with some specific prompts?

If you trained a model from scratch to do this I would say you "built an AI", but if you're just calling existing models in a loop then you didn't build an AI. You just wrote some prompts and loops and did some RAG, which isn't building an AI and isn't particularly novel.

mraza007 about 1 month ago

Impressive work.

With the rise of AI, understanding software will become relatively easy.

chyueli about 1 month ago

Great, I'll try it next time, thanks for sharing

lionturtle about 1 month ago

>:( :3

ryao about 1 month ago

I would find this more interesting if it made tutorials out of the Linux, LLVM, OpenZFS and FreeBSD codebases.

istjohn about 1 month ago

This is neat, but I did find an error in the output pretty quickly. (Disregard the mangled indentation)

<pre><code># Use the Session as a context manager
with requests.Session() as s:
    s.get('https://httpbin.org/cookies/set/contextcookie/abc')
    response = s.get(url)  # ???
    print("Cookies sent within 'with' block:", response.json())
</code></pre>

<a href="https://the-pocket.github.io/Tutorial-Codebase-Knowledge/Requests/03_session.html" rel="nofollow">https://the-pocket.github.io/Tutorial-Codebase-Knowledge/Req...</a>
