TechEcho

A new project untangles the handwritten texts in the Vatican's collections

80 points by fraqed about 7 years ago

4 comments

WalterBright about 7 years ago
Sometimes I think archivists are so obsessed with getting perfect scans and every-pixel-is-precious that scanning books becomes too costly and so never happens.

A simple alternative is to just collect some volunteers with iPhones and have one person turn pages while the other just clicks the shutter. You could easily do 20 pages a minute, 1200/hr, 10000/day. I bet those acres of books could be ground through in reasonably good time.

Of course, the images would horrify an archivist. But try it yourself with a random book. They're quite serviceable. At the very least, one then has a backup in the case of a catastrophe at the library.

OCRing them is an entirely separate issue.
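The throughput figures in this comment can be checked with a few lines of arithmetic. This is only a rough sketch: the function names are made up for illustration, and the hours-per-day value is not given in the comment but inferred from its own numbers.

```python
def pages_per_hour(pages_per_minute: float) -> float:
    """Convert a per-minute scanning rate to a per-hour rate."""
    return pages_per_minute * 60


def hours_needed(pages_per_day: float, pages_per_minute: float) -> float:
    """Hours of continuous scanning needed to hit a daily page target."""
    return pages_per_day / pages_per_hour(pages_per_minute)


# Figures taken from the comment: 20 pages a minute, 10000 pages a day.
rate_per_hour = pages_per_hour(20)
implied_hours = hours_needed(10000, 20)

print(f"{rate_per_hour:.0f} pages/hr")
print(f"{implied_hours:.1f} hours of scanning per day")
```

So the 10000/day figure corresponds to a bit more than eight hours of uninterrupted page turning, i.e. roughly one full working day per two-person team.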
anon1253 about 7 years ago
Very interesting. Way back my old university was also involved in historical document processing (https://www.rug.nl/research/portal/files/40224455/Chapter_7.pdf). They also looked at things like writer identification and trying to automatically date the documents using a wide array of hand-crafted features. Curious what would happen with some of the newer deep learning models, but the project has been dead for a while (http://application02.target.rug.nl/cgi-bin/monkweb?db=All&cmd=scroogle), as these things go.
pimlottc about 7 years ago
I found this part interesting:

> In texts transcribed so far, a full one-third of the words contained one or more typos, places where the OCR guessed the wrong letter. [...] Still, the software got 96 percent of all handwritten letters correct.

96% correct sounds pretty good but that's still multiple errors per sentence! The threshold for truly "error-free" is quite high...
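The gap between per-letter accuracy and per-word accuracy can be made concrete with a small calculation. This is a simplified sketch, not the project's own evaluation: it assumes every letter is misread independently with the same probability, and the word and sentence lengths are illustrative values that do not come from the article.

```python
def word_error_rate(char_accuracy: float, word_length: int) -> float:
    """Probability that a word contains at least one wrong letter,
    assuming independent, identically likely per-letter errors."""
    return 1.0 - char_accuracy ** word_length


def expected_errors(char_accuracy: float, n_chars: int) -> float:
    """Expected number of wrong letters in a run of n_chars letters."""
    return (1.0 - char_accuracy) * n_chars


CHAR_ACCURACY = 0.96  # figure quoted in the comment

# Hypothetical word lengths, chosen only to show the trend.
for length in (5, 10):
    rate = word_error_rate(CHAR_ACCURACY, length)
    print(f"{length}-letter word: {rate:.1%} chance of at least one error")

# Hypothetical sentence of 100 letters.
print(f"100-letter sentence: {expected_errors(CHAR_ACCURACY, 100):.1f} expected errors")
```

Under these assumptions a 10-letter word has about a one-in-three chance of containing an error, which is in line with the quoted one-third figure, and a 100-letter sentence would carry about four wrong letters on average.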
SiempreViernes about 7 years ago
Sloppy summary: some researcher has trained some NN or whatever to segment and then OCR old handwritten text, and hopes to use it on the enormous archive the Vatican has. Apparently because if it's not scanned it's almost completely useless to "modern scholars", which I take to mean those historians that only read medieval Latin if it's printed on a screen...