TechEcho
Rescribe: A high quality OCR tool for historic books

84 points by dbuxton over 3 years ago

3 comments

thaumasiotes over 3 years ago
I have the Kindle version of The Seleucid Royal Economy, which for obvious reasons includes Greek text.

It's been OCRed, and the Greek has been mangled beyond belief. Sometimes the OCR will split a single character.

No real point to the story, but it feels relevant here. I see Rescribe has already encountered the problem: "In the second step we run the OCR on the preprocessed files, using our specifically trained packages and adapting language and character settings to the document at hand."

(I'm only complaining to a very small degree. Having a low-quality OCRed ebook available is much better than having no ebook available. And what is normally displayed is the image of the text, not the OCRed nonsense, so it doesn't matter that the Greek has been transformed into gibberish until you encounter the odd mid-character word break.)
raybb over 3 years ago
I think the folks at OpenLibrary.org would benefit from something like this.
IshKebab over 3 years ago
Is Tesseract any good yet? Last I heard they were experimenting with deep-learning-based recognition, but I tried it before that and it didn't work at all. Kind of Pocketsphinx levels of rubbish.