Ask HN: OCR image pre-processing resources for beginners?

2 points by Curiositry about 2 years ago
I'm using Tesseract 5 to do optical character recognition on scanned typewritten documents, and the output quality is mediocre despite decent image quality.

Could anyone point me to semi-automated tools for pre-processing scanned pages to improve OCR accuracy?

I have run across scantailor-advanced, unpaper, and textcleaner, but all of them have fairly involved settings, and I haven't found any beginner-friendly blog posts or scripts that suggest good, reasonable defaults.
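For orientation, the core step most of these tools perform is binarization: turning a gray scan into black ink on white paper. Below is a minimal, dependency-free sketch of Otsu's global thresholding, a common default for that step. It assumes the page is already loaded as rows of 8-bit grayscale pixels; the `Image` alias and the `otsu_threshold` and `binarize` names are hypothetical, chosen for illustration. On a real scan you would more likely use OpenCV's `cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)`. Tesseract also binarizes internally, so an explicit pass mostly helps when you want to inspect the result or combine it with despeckling and deskewing.

```python
from typing import List

# Hypothetical representation: rows of 8-bit grayscale pixels, 0 = black, 255 = white.
Image = List[List[int]]


def otsu_threshold(gray: Image) -> int:
    """Return the threshold t that maximizes Otsu's between-class variance.

    Pixels <= t form the dark class (ink); pixels > t form the light class (paper).
    Returns 0 if the image contains only one gray level.
    """
    hist = [0] * 256
    for row in gray:
        for p in row:
            hist[p] += 1
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    weight_lo = 0  # number of pixels with value <= t
    sum_lo = 0     # sum of the intensities of those pixels
    for t in range(256):
        weight_lo += hist[t]
        sum_lo += t * hist[t]
        weight_hi = total - weight_lo
        if weight_lo == 0:
            continue  # no dark pixels yet
        if weight_hi == 0:
            break     # no light pixels left
        mean_lo = sum_lo / weight_lo
        mean_hi = (sum_all - sum_lo) / weight_hi
        # Between-class variance, up to a constant factor of 1 / total**2.
        between = weight_lo * weight_hi * (mean_lo - mean_hi) ** 2
        if between > best_var:
            best_var, best_t = between, t
    return best_t


def binarize(gray: Image, threshold: int) -> Image:
    """Map pixels <= threshold to 0 (ink) and all others to 255 (paper)."""
    return [[0 if p <= threshold else 255 for p in row] for row in gray]


if __name__ == "__main__":
    page = [[20, 30, 200], [210, 25, 220]]  # toy "scan": dark ink on light paper
    t = otsu_threshold(page)
    print(t)                  # 30
    print(binarize(page, t))  # [[0, 0, 255], [255, 0, 255]]
```

A single global threshold works well on evenly lit typewritten pages. For shadows or uneven lighting, an adaptive (local) method such as Sauvola is usually a better choice.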

no comments