科技回声

This reminds me of the experiment to run paint splatters through OCR and check whether the result is valid Perl code (spoiler: 93% evaluated just fine).<p><a href="https://www.mcmillen.dev/sigbovik/" rel="nofollow">https://www.mcmillen.dev/sigbovik/</a>

OCR is hard, but maybe we can make some real progress on it now with modern AI. A context-smart church records handwriting transcriber would be pretty great.

<p><pre><code> I've pored over (ok, grepped) ~500GB of Chronicling America data to find lines that meet my low standard for nonsense, basically ones that match egrep "[^a-zA-Z0-9 ]{3,}" </code></pre> I'm super curious to know how fast this was. grep is generally very fast and this should be doable on a normal computer, though it might take a little while.
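
For anyone who wants to try the same filter without grep, here is a rough Python sketch of what that pattern selects. The function name `nonsense_lines` and the sample strings are made up for illustration; only the regex itself comes from the quoted post, and this says nothing about how the author actually ran it or how fast it was.

```python
import re
from typing import Iterable, Iterator

# Same pattern as the quoted egrep: three or more consecutive characters
# that are not ASCII letters, digits, or spaces.
NONSENSE = re.compile(r"[^a-zA-Z0-9 ]{3,}")


def nonsense_lines(lines: Iterable[str]) -> Iterator[str]:
    """Yield lines containing a run of 3+ non-alphanumeric, non-space characters.

    Hypothetical helper, not from the original post.
    """
    for line in lines:
        # grep matches against the line without its newline; strip it here,
        # otherwise "\n" would count toward the run of "nonsense" characters.
        text = line.rstrip("\n")
        if NONSENSE.search(text):
            yield text


# Made-up sample input standing in for OCR output.
sample = [
    "The quick brown fox",
    "tl~e q%#!ck br()wn",
    "Price: $5.00",
    "a -- b",
]
matches = list(nonsense_lines(sample))
print(matches)
```

Since the function accepts any iterable of strings, an open file object can be passed in directly and lines are processed one at a time, so memory use stays flat even on very large inputs. It will almost certainly be slower than grep, though.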

Spent a load of time doing OCR and dealing with its failures... this is absolutely wonderful, thanks for sharing!

Yes, sir, we got a parrot.

Poetry from dirty OCR

5 comments
