Rescribe: A high quality OCR tool for historic books

84 points by dbuxton over 3 years ago

3 comments

thaumasiotes over 3 years ago
I have the Kindle version of The Seleucid Royal Economy which for obvious reasons includes Greek text.

It's been OCRed, and the Greek has been mangled beyond belief. Sometimes the OCR will split a single character.

No real point to the story, but it feels relevant here. I see Rescribe has already encountered the problem: "In the second step we run the OCR on the preprocessed files, using our specifically trained packages and adapting language and character settings to the document at hand."

(I'm only complaining to a very small degree. Having a low-quality OCRed ebook available is much better than having no ebook available. And what is normally displayed is the image of the text, not the OCRed nonsense, so it doesn't matter that the Greek has been transformed into gibberish until you encounter the odd mid-character word break.)
raybb over 3 years ago
I think the folks at OpenLibrary.org would benefit from something like this.
IshKebab over 3 years ago
Is Tesseract any good yet? Last I heard they were experimenting with deep-learning-based recognition, but when I tried it before that, it didn't work at all. Kind of Pocketsphinx levels of rubbish.