Poetry from dirty OCR

63 points by MaysonL about 2 years ago

5 comments

eliaspro about 2 years ago
This reminds me of the experiment to run paint splatters through OCR and check whether the result is valid Perl code (spoiler: 93% evaluated just fine).

https://www.mcmillen.dev/sigbovik/
vintermann about 2 years ago
OCR is hard, but maybe we can make some real progress on it now with modern AI. A context-smart church records handwriting transcriber would be pretty great.
version_five about 2 years ago
  I've pored over (ok, grepped) ~500GB of Chronicling America data to find lines that meet my low standard for nonsense, basically ones that match egrep "[^a-zA-Z0-9 ]{3,}"

I'm super curious to know how fast this was. grep is generally very fast and this should be doable on a normal computer, though it might take a little while.
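For anyone who wants to try the filter on a small sample, here is a minimal sketch of the same pattern in Python. It reuses the regular expression quoted above unchanged; the helper name `noisy_lines` and the sample lines are made up for illustration, and this is not the article author's actual pipeline. It also says nothing about how fast the original run over ~500GB was.

```python
import re

# Same pattern as the quoted egrep call: a run of three or more characters
# that are not ASCII letters, digits, or spaces.
NOISE = re.compile(r"[^a-zA-Z0-9 ]{3,}")


def noisy_lines(lines):
    """Yield lines that contain at least one noisy run (hypothetical helper)."""
    for line in lines:
        # Drop the trailing newline so it is not counted as a noise character,
        # matching grep's line-at-a-time behaviour.
        text = line.rstrip("\n")
        if NOISE.search(text):
            yield text


# Made-up sample input, not taken from Chronicling America.
sample = [
    "The quick brown fox\n",
    "tbe q~^*ck br@#%n f0x\n",
    "Yes, sir, we got a parrot.\n",
]
matches = list(noisy_lines(sample))
print(matches)
```

Because `noisy_lines` is a generator, it can be fed an open file object directly and will stream through it line by line instead of loading the whole file into memory.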
chaps about 2 years ago
Spent a load of time doing OCR and dealing with its failures... this is absolutely wonderful, thanks for sharing!
BubbleRings about 2 years ago
Yes, sir, we got a parrot.