I will have to investigate this. I am dreaming of a system that can take a PDF scan of a book as input and produce one or more properly formatted (headings, italic, bold, underline, etc.) Markdown files.
In my tests, LLMs have proved very good at cleaning up raw OCR output, but they need formatting information to get me all the way.
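
As a rough sketch of what that formatting information could look like once it is available, here is a minimal Python example that turns styled text spans into Markdown. The `Span` class, its fields, and both function names are hypothetical and not tied to any particular OCR tool; it assumes the OCR step can report bold, italic, and underline flags per span plus a heading level per line. Markdown has no native underline, so this sketch falls back to an inline `<u>` tag.

```python
from dataclasses import dataclass


@dataclass
class Span:
    """Hypothetical unit of OCR output: a run of text with style flags."""
    text: str
    bold: bool = False
    italic: bool = False
    underline: bool = False


def span_to_markdown(span: Span) -> str:
    """Wrap one span in Markdown markers, keeping edge whitespace outside them."""
    core = span.text.strip()
    if not core:
        return span.text  # whitespace-only spans carry no visible styling
    lead = span.text[: len(span.text) - len(span.text.lstrip())]
    trail = span.text[len(span.text.rstrip()):]
    if span.underline:
        core = f"<u>{core}</u>"  # assumption: inline HTML is acceptable
    if span.italic:
        core = f"*{core}*"
    if span.bold:
        core = f"**{core}**"
    return lead + core + trail


def line_to_markdown(spans: list[Span], heading_level: int = 0) -> str:
    """Join the spans of one line; heading_level 0 means ordinary text."""
    body = "".join(span_to_markdown(s) for s in spans)
    prefix = "#" * heading_level + " " if heading_level else ""
    return prefix + body
```

For example, `line_to_markdown([Span("A "), Span("bold", bold=True), Span(" word")])` gives `A **bold** word`. In a full pipeline the LLM would clean the text of each span while the style flags pass through untouched, which is one way to keep the formatting from being lost during cleanup.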