I'm using Tesseract 5 to do optical character recognition on scanned typewritten documents, and the output quality is mediocre even though the image quality is decent.

Could anyone point me to semi-automated tools for pre-processing scanned pages to improve OCR accuracy?

I have come across scantailor-advanced, unpaper, and textcleaner, but all of them have fairly involved settings, and I haven't found any beginner-friendly blog posts or scripts that suggest reasonable default settings as a starting point.
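As a reference point, one pre-processing step commonly applied before OCR is binarization: turning the grayscale scan into pure black ink on white paper. Below is a minimal sketch of global Otsu thresholding in plain Python. It assumes the page has already been loaded as a flat list of 8-bit grayscale values (0-255). The function names are hypothetical and not taken from any of the tools above. A single global threshold like this can struggle with uneven lighting or shadows, which is where adaptive (local) thresholding is usually used instead.

```python
def otsu_threshold(pixels):
    """Return the Otsu threshold for a sequence of 8-bit gray values (0-255).

    Hypothetical helper: picks the threshold t that maximizes the
    between-class variance of the "ink" (<= t) and "paper" (> t) classes.
    """
    if not pixels:
        raise ValueError("no pixels given")
    # Histogram of gray levels.
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    sum_bg = 0      # weighted sum of levels in the dark class
    weight_bg = 0   # number of pixels in the dark class
    best_t, best_var = 0, -1.0
    for t in range(256):
        weight_bg += hist[t]
        if weight_bg == 0:
            continue  # no dark pixels yet
        weight_fg = total - weight_bg
        if weight_fg == 0:
            break  # every pixel is in the dark class
        sum_bg += t * hist[t]
        mean_bg = sum_bg / weight_bg
        mean_fg = (sum_all - sum_bg) / weight_fg
        # Between-class variance (up to a constant factor).
        between = weight_bg * weight_fg * (mean_bg - mean_fg) ** 2
        if between > best_var:
            best_var = between
            best_t = t
    return best_t


def binarize(pixels, threshold):
    """Map pixels <= threshold to 0 (ink) and the rest to 255 (paper)."""
    return [0 if p <= threshold else 255 for p in pixels]


# Toy "page": half dark ink pixels, half light paper pixels.
page = [10] * 50 + [200] * 50
t = otsu_threshold(page)
print(t, binarize(page, t)[:3], binarize(page, t)[-3:])
```

In practice the pixel list would come from an image library, and the black-and-white result would be saved and passed to Tesseract.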