I was inspired by a recent tweet by Andrej Karpathy, as well as my own experience yesterday of copying and pasting a bunch of HTML docs into Claude and bemoaning how long-winded and poorly formatted they were.<p>I’m trying to decide if I should make it into a full-fledged service and completely automate the process of generating the distilled documentation.<p>The problem is that it would cost a lot in API tokens and wouldn’t generate any revenue (plus it would have to be updated whenever the documentation changes significantly). Maybe Anthropic wants to fund it as a public good? Let me know!
Oh! Recently I had the experience of working with someone who was using LLMs to build something with my JS canvas library. The code the LLM was producing for this person was ... sub-optimal. Over-complicated. Not a surprise to me, as my library is very niche and the only documentation around it is the docs/lessons I've written myself. So now I'm in the middle of an exercise to write documentation[1] that tries to explain everything (that I can remember) that the library does.<p>The problem is, I've no idea how useful that documentation would be for LLM consumption - does anyone know of an "Idiot's Guide to writing documentation for LLM consumption" so I can review my work to date and improve the docs going forward?<p>[1] - In this branch. I'm writing the documentation in .md files which get converted into .html files (using Sundown) during a build step: <a href="https://github.com/KaliedaRik/Scrawl-canvas/pull/119/files" rel="nofollow">https://github.com/KaliedaRik/Scrawl-canvas/pull/119/files</a>
It’s a cool idea. I’ve wasted a lot of time over the past few months futzing around with BeautifulSoup, Playwright and others I forget, or cloning entire repos and trying to figure out exactly which incantations for which build tools will get me the built docs I need, all in service of setting them up for retrieval and use by LLMs. Some projects (e.g. Godot, Blender, Django) make it very easy. Others do not (Dagster is giving me headaches at the moment).<p>I would probably prefer to receive unmodified plain text/md versions (with the heavy lifting done by, e.g., docling or unstructured) rather than LLM summaries though, since I’d rather produce my own distillations.<p>I would pay for that kind of thing. I think the intersection between ethical scraping and making things machine-readable is fertile ground. For a lot of companies it’s something that can be of great value, but it is also non-trivial to do well and unlikely to be a core competency in-house.