Hi HN! I’m Omar from Mutable.ai. We want to introduce Auto Wiki (<a href="https://wiki.mutable.ai/">https://wiki.mutable.ai/</a>), which lets you generate a Wiki-style website to document your codebase. Citations link to code, with clickable references to each line of code being discussed. Here are some examples of popular projects:<p>React: <a href="https://wiki.mutable.ai/facebook/react">https://wiki.mutable.ai/facebook/react</a><p>Ollama: <a href="https://wiki.mutable.ai/jmorganca/ollama">https://wiki.mutable.ai/jmorganca/ollama</a><p>D3: <a href="https://wiki.mutable.ai/d3/d3">https://wiki.mutable.ai/d3/d3</a><p>Terraform: <a href="https://wiki.mutable.ai/hashicorp/terraform">https://wiki.mutable.ai/hashicorp/terraform</a><p>Bitcoin: <a href="https://wiki.mutable.ai/bitcoin/bitcoin">https://wiki.mutable.ai/bitcoin/bitcoin</a><p>Mastodon: <a href="https://wiki.mutable.ai/mastodon/mastodon">https://wiki.mutable.ai/mastodon/mastodon</a><p>Auto Wiki makes it easy to see at a high level what a codebase is doing and how the work is divided. In some cases we’ve identified entire obsolete sections of codebases by seeing a section for code that was no longer important. Auto Wiki relies on our citations system, which cuts back on hallucinations. The citations link to a precise reference or definition, which means the wiki generation is grounded in the code being cited rather than free-form generation.<p>We’ve run Auto Wiki on the 1,000 most popular repos on GitHub. If you want us to generate a wiki of a public repo for you, just comment in this thread! The wikis take time to generate as we are still ramping up our capacity, but I’ll reply that we’ve launched the process and then come back with a link to your wiki when it’s ready.<p>For private repos, you can use our app (<a href="https://app.mutable.ai">https://app.mutable.ai</a>) to generate wikis.
We also offer private deployments with our own model for enterprise customers; you can ping us at info@mutable.ai. Anyone who already has access to a repo through GitHub will be able to view the wiki; only the person generating the wikis needs to pay to create them. Pricing starts at $4 and ramps up in $2 increments depending on how large your repo is.<p>In an upcoming version of Auto Wiki, we’ll include other sources of information relevant to your code and generate architectural diagrams.<p>Please check out Auto Wiki and let us know your thoughts! Thank you!
Cool concept. Right off the bat I see some big issues with the generated CPython documentation:<p>> This provides a register-based virtual machine that executes the bytecode through simple opcodes.<p>Python's VM is stack-based, not register-based.<p>> The tiered interpreter in …/ceval.c can compile bytecode sequences into "traces" of optimized microoperations.<p>No such functionality exists in CPython, as far as I know.<p>> The dispatch loop switches on opcodes, calling functions to manipulate the operand stack. It implements stack manipulation with macros.<p>No it doesn't. If you look at the bytecode interpreter, it's full of plain old statements like `stack_pointer += 1;`.<p>> The tiered interpreter is entered from a label. It compiles the bytecode sequence into a trace of "micro-operations" stored in the code object. These micro-ops are then executed in a tight loop in the trace for faster interpretation.<p>As mentioned above, this seems to be a complete hallucination.<p>> During initialization, …/pylifecycle.c performs several important steps: [...] It creates the main interpreter object and thread<p>No, the code in this file creates an internal thread <i>state</i> object, corresponding to the already-running thread that calls it.<p>> References: Python/clinic/import.c.h The module implements finding and loading modules from the file system and cached bytecode.<p>This is kinda sorta technically correct, but the description never mentions the crucial fact that most of this C code only exists to bootstrap and support the <i>real</i> import machinery, which is written in Python, not C. (Also, the listed source file is the wrong one: it just contains auto-generated function <i>wrappers</i>, not the actual implementations.)<p>> Core data structure modules like …/arraymodule.c provide efficient implementations of homogeneous multidimensional arrays<p>Python's built-in array module provides only one-dimensional arrays.<p>And so on.
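Two of the points above (the stack-based VM and the one-dimensional array module) can be checked from Python itself. This is a minimal sketch using only the standard library `dis` and `array` modules; the function `add` and the variable names are made up for illustration, and opcode names differ between CPython versions, so the listing is collected rather than hard-coded.

```python
import array
import dis


def add(a, b):
    # Hypothetical example function; any small function would do.
    return a + b


# Every CPython opcode has a net effect on the operand stack, which
# dis.stack_effect reports. POP_TOP removes one value from the stack.
pop_top_effect = dis.stack_effect(dis.opmap["POP_TOP"])

# Opcode names vary between CPython versions, so gather whatever this
# interpreter emits instead of assuming a particular listing.
opnames = [ins.opname for ins in dis.get_instructions(add)]

# The array module stores a flat sequence of a single C type.
flat = array.array("i", [1, 2, 3])

# Nested lists are rejected rather than being treated as rows.
try:
    array.array("i", [[1, 2], [3, 4]])
    nested_ok = True
except TypeError:
    nested_ok = False

if __name__ == "__main__":
    print("POP_TOP stack effect:", pop_top_effect)
    print("opcodes for add():", opnames)
    print("flat array:", flat, "| nested accepted:", nested_ok)
```

Running it prints the stack effect of `POP_TOP`, the opcodes your interpreter emits for `add`, and whether the nested-list constructor call was accepted.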
That’s nice but the name is confusing: it’s not generating a wiki at all, but a documentation website with a Wikipedia-like theme. Wikis are collaborative websites; Wikipedia is only one of them.
Reading these wikis makes me feel we need to invent some visual convention to indicate AI-generated text. Like a particular color or font. This would make it so people don't feel cheated after they realize they just spent several minutes trying to make sense of something churned out by an LLM. (I mean this as a voluntary design enhancement for sites that want to be nice, of course people can always cheat.)
I think this falls into a common mistake people make about documentation. Good documentation doesn't explain what the code does, it explains why the code is written the way it is, the constraints that caused this decision to be made and even alternatives not considered. You can't really guess those things by looking at code. I'm a fan of ADRs for that reason.<p>Honestly this looks overly verbose to me, a common LLM problem. The mistakes others cite are also pretty concerning.<p><a href="https://adr.github.io/" rel="nofollow">https://adr.github.io/</a>
I'd love to see the wiki generated for a less already-documented example. These high-profile projects are good demos and the results look compelling (I checked out AutoGPT's and NeoVim's), but these projects already have a ton of documentation that helps the model substantially. What are the smaller projects where it has to generate documentation from code (and not necessarily well-commented code) rather than existing documentation?
Super cool. When I think about accelerating teams while maintaining quality/culture, I think about the adage "if you want someone to do something, make it easy."<p>Maintaining great READMEs, documentation, onboarding docs, etc, is a lot of work. If Auto Wiki can make this substantially easier, then I think it could flip the calculus and make it much more common for teams to invest in these artifacts. Especially for the millions of internal, unloved repos that actually hold an org together.
The only thing I see that this adds over existing docs-to-HTML tooling is that it uses a Wikipedia-inspired theme.<p>Meanwhile on the negative side, it adds hallucinations. You say you "cut back" on them but as teraflop's comment shows, it still has plenty.<p>BTW: even the Mastodon link from your OP says "wiki not found" for me.
Would be great to see for <a href="https://github.com/symfony/symfony">https://github.com/symfony/symfony</a>, thanks! As that's a monorepo it may provide a challenge to the tool.
Does it parse Julia files? I am having trouble generating the wiki for a Julia repository. What surprised me was that it could parse and understand .tex files! Looks promising.
FYI, an update from us: we're moving our authentication system to wiki.mutable.ai so you can generate wikis for private repos without needing to go through app.mutable.ai.
[Edit: Apparently I’m reviewing the wrong product; see replies.]<p>I tried the app version on one of my old repos. It’s a somewhat challenging test case because there are few comments and parts of the code are incomplete, though I’d say the naming convention is pretty good. The app suggested the question “What is the purpose of the ‘safemode-ui-hook.m’ file?” I accepted the suggestion, and the output was… completely wrong.<p>I’m not surprised it guessed the purpose wrong; even a human would need some context to understand what’s going on in that particular file, though of course the AI did worse by being confidently wrong rather than saying it didn’t know. But the AI also made specific claims that could be seen as wrong just by reading the file. It claimed the file “defines a SUBSTITUTE_safemodeUIHook C struct” when neither that struct name nor anything like it appears anywhere in the file. The name seems to just be mashed together from the repo name and file name.<p>Which makes me wonder, did the AI even see the content of the file? Is it pre-summarized somehow in a way that makes it know very little about the file? Or did the AI see it in full, but hallucinate anyway?
The Bitcoin and Mastodon links don't seem to be working! (wiki not found)<p>Would love to see this for Godot (<a href="https://github.com/godotengine/godot">https://github.com/godotengine/godot</a>). Maybe Maplibre too (<a href="https://github.com/maplibre/maplibre-native">https://github.com/maplibre/maplibre-native</a>)!
I'll go ahead and put in a request for my own repo: Eiim/Chokistream<p>In the meantime, I have a different bit of feedback: the categories don't make much sense to me. I can't find a consistent theme in "Tooling", Bun isn't really a frontend library (although it has frontend components like a bundler), I don't know much about Urbit but it doesn't look like it belongs in "Crypto" (just a P2P network with a crypto-adjacent userbase), iptv-org/iptv doesn't seem to make sense in Education, etc.<p>Also, a number of the links in the Bun page (the ones not in monospace) are 404s. I don't see those types of links on other pages so maybe a bug that was fixed but not backported?<p>Edit: It'd also be nice if the search bar could just search for repo name instead of having to remember the associated GH user
For a wilder idea of what you can do with GitHub and wikis, see <a href="https://speedrun.cc" rel="nofollow">https://speedrun.cc</a>. No AI, so the wikis need to be created by hand, but being able to build tools and UIs right into your GitHub documentation is a powerful concept.
Let's see how it fares against a Nix flake: <a href="https://github.com/hyprland-community/hyprland-nix">https://github.com/hyprland-community/hyprland-nix</a>
How are you going to handle this scenario:<p>- Person reads your auto wiki explanation of some part of a codebase.<p>- The explanation is incorrect.<p>- The person, believing your explanation as authoritative, complains to the developers of the codebase. Maybe opens an issue on an open-source project, posts on a discord or the like.<p>- The maintainers now have to deal with this misinformation adding overhead to their workload.<p>As someone who has helped others who were led astray by ChatGPT, this setup adds a ton of mental baggage to the person’s ask for help. They now have “But ChatGPT said…” to contradict the actually correct thing that you are trying to teach them.
As long as this is happening, might as well try some of my favorites: <a href="https://github.com/wasm3/wasm3">https://github.com/wasm3/wasm3</a>, <a href="https://github.com/WebAssembly/wabt">https://github.com/WebAssembly/wabt</a>, <a href="https://github.com/bytecodealliance/wasmtime">https://github.com/bytecodealliance/wasmtime</a>
I'd quite like some more high level documentation for the Matrix JS SDK (<a href="https://github.com/matrix-org/matrix-js-sdk">https://github.com/matrix-org/matrix-js-sdk</a>). I've been looking at it for quite some time and still don't understand how timelines work.
Would be fascinating if your tool could bring something useful to light.
When I click the link <a href="https://wiki.mutable.ai/bitcoin/bitcoin">https://wiki.mutable.ai/bitcoin/bitcoin</a> from your post it says that the wiki doesn’t exist yet<p>Then when I clicked again it loaded.<p>Then I clicked the one for D3 and it said the same. And I clicked it again and it still said the same.<p>Is this some kind of weird manifestation of a DB conn error or something?
Love the idea! I will try it on my own repo!<p>As an aside, I've been thinking of creating an auto-wiki for game lore based on what AI NPCs say, i.e. convert their hallucinations into canon.<p>How was your experience of taking unstructured text (though code is more structured) and making it into wikis?<p>How difficult is it to have it do incremental updates vs re-create it all?
Nice! I’d be interested to see how it handles <a href="https://github.com/rosco-m68k/rosco_m68k">https://github.com/rosco-m68k/rosco_m68k</a> , it’s a mixed software / hardware repo, with a lot of code in assembler and C (for an old platform). Might be a challenge?
Not sure if too late, but I would love to see this applied to the docassemble repo- <a href="https://github.com/jhpyle/docassemble">https://github.com/jhpyle/docassemble</a>.<p>Looks fantastic!
I don't like this. What I would like is some kind of virtual layer on top of the codebase which describes the code just a little better, but only by changing variable names and code structure, not by adding comments, because good code doesn't need to have comments everywhere.
I'd quite like to see this applied to the FoundationDB repository - <a href="https://github.com/apple/foundationdb">https://github.com/apple/foundationdb</a>
Thanks!
Please do stevage/map-gl-utils<p>And turfjs/turf<p>Feedback: it's confusing that you're using the word wiki. I guess you mean, the style is similar to Wikipedia? But otherwise the concept of a wiki, an editable set of interconnected pages, seems irrelevant and just confusing here?
Wow, looks nice! I almost felt like I could understand Bitcoin's code xD<p>Could you do Appwrite? <a href="https://github.com/appwrite/appwrite">https://github.com/appwrite/appwrite</a><p>I'm not affiliated with them, just wanted to get started hacking on it.
I'd love to see what it can do with <a href="https://github.com/microsoft/debugpy">https://github.com/microsoft/debugpy</a> - and especially how it would handle vendored dependencies.
Trying to use <a href="https://app.mutable.ai/Issung/GChan">https://app.mutable.ai/Issung/GChan</a>
Error: "Encounter an error with retriving your quota".
Should all the repos in the “Explore” section already be generated? I clicked on apple/swift from the Languages tab and got “Wiki not found”, which isn’t what I expected to see
I would like to see how it performs on a lesser-known project with fewer stars. These are all well-known projects which would exist in the training data in blog posts etc.
At least it puts a big red flag on projects that use it, so you can keep your distance.<p>Why am I saying that? I use GitHub Copilot on a daily basis and it utterly fucks us over on a daily basis as well. You have to be really careful about trusting the generated code. Boilerplate code is especially hard to review. So I wouldn't trust anything auto-generated that wasn't manually verified (which I'd argue is more error-prone than writing it manually in the first place). And then there's always the argument of "document why you did it like that", not "document what the code does". Of the former, you'll have no record in your code.
Some fun hard cases<p><pre><code> rdacomp/CollapseOS A Forth operating system for 8-bit computers
Co-dfns/Co-dfns A compiler in APL
Xe/TempleOS A very special operating system</code></pre>
Requesting <a href="https://github.com/lobsters/lobsters">https://github.com/lobsters/lobsters</a> as I'm going through that codebase and would be able to provide feedback. Cheers<p>PS: just gonna second everyone else who's saying being able to edit out incorrect data is very important, otherwise people are gonna be wary of reading wikis for repos they aren't already familiar with.