I can't say I've ever wanted to transcribe code from an image. That seems super niche.<p>Perhaps the specific idea is to harvest coding textbooks as training data for LLMs?
Has anyone tried feeding the admittedly noisy OCR-ed text, at the document level, to an LLM to make sense of it? Presumably some of the less capable models should be quite affordable and accurate at scale as well.