科技回声 (Tech Echo)
A tech news platform built with Next.js, offering tech news and discussion from around the world.
© 2025 科技回声 (Tech Echo). All rights reserved.

Can you read this cursive handwriting? The National Archives wants your help

257 points by lemonberry, 4 months ago

37 comments

geuis, 4 months ago
It's a really interesting project. But boy, do they make it hard to participate.
* The article doesn't provide a direct link to the mission.
* Signup is pretty easy. It's well organized and even gently requires you to set up two forms of 2FA.
* Signup complete. Go back to the primary page and try to find the mission. It's a little buried, but not too deep.
* Notice I'm not signed in. OK, let's do that. Now I'm back on the main page, so I navigate back. Find the first document and open it. It's really interesting to scan through the document and read it. People back then generally had really nice handwriting.
* OK, what next? How do I transcribe? ... ? Oh, it says I'm not logged in again. Fine, click the link and...
* I'm logged in and directed back to the main page, again.
Look, this is an interesting project and I'd love to spend my spare cycles helping out. But they really need to clean up this process.
Volunteers shouldn't have to jump through kinda poorly designed interfaces to help out.
demosthanos, 4 months ago
Before commenting to ask why they don't just use LLMs, please note that the article specifically says they do, but that it's not always a viable solution:
> The agency uses artificial intelligence and a technology known as optical character recognition to extract text from historical documents. But these methods don’t always work, and they aren’t always accurate.
The document at the top is likely an especially *easy* one to read, precisely because it's meant to be the hook that gets people to sign up and get started. It isn't going to be representative of the full breadth of documents the National Archives wants people to go through.
Unearned5161, 4 months ago
OK, I did one letter, from a woman in 1814 writing to James Monroe (then Secretary of State) asking for a passport to go to Scotland to collect her late brother's property. What a trip! It's so enjoyable to get into the flow once you've "synchronized" with the person's handwriting. Furthermore, because you're reading and rewriting word for word whatever you're transcribing, the stories you end up reading stick in your memory remarkably well. This is not surprising, considering that you are dedicating an inordinate amount of time to each page, but it's a welcome side effect when you try to recollect them.
seletskiy, 4 months ago
To tptacek and the other folks who seem to have unwavering trust in OCR/LLMs, as well as to the opposite camp who think the technology is not there yet: you are all partially right, but somehow you fail to hear each other, spending time on baseless arguing instead of on factual examples and attempts to find common ground.
Can it be used to greatly simplify the effort by getting through the boilerplate? Yes.
Should the result be reviewed and proofread by a human? Also yes.
---
Here's a subtle one: https://catalog.archives.gov/id/34384201?objectPage=40
Here is one of the transcripts made by `o1-pro`:
    (2) …and I don’t know whether it can be reset for a date in December or not. Cornell seemed anxious that it should not come up too close to Christmas, and of course new suspicion [would be aroused?] [about?] him. I will take this up with the Judge as soon as I can get rid of the brief. Meanwhile I would like to know whether there is anything else in which I can be useful to you, since it behooves me in ways of uncomfortable relations with the present management. Are you going East in December? Has any word come from Hagerman? Were there any noteworthy developments at the hearings on the [Teapot?] trial? I have no inclination yet whether Wheeler will be wanted in Washington, but the chances are that he will not. With regards to all the brethren and [flock?], I am very sincerely yours, George A. H. Fraser
I'm not a native English speaker, but even I can see where it is wrong. I'll leave finding the mistakes as an exercise for the reader, but it is certainly not a Teapot trial.
Somehow GPT-4o performs better on this example and fails only on the "New Mexican practise" part.
Unearned5161, 4 months ago
Cheers! I was looking for something semi-productive to sink a Friday night into.
On a more serious note, working through a transcription project for letters and journals that nobody has touched since they were archived is such a wonderful feeling. Short of being in front of the physical document itself, your degree of separation from the writer and the point in time is vanishingly small!
I always like to notice when they cross something out or make a mistake, and think about what could have caused it. Did a friend pass by the door and startle them? Did they get distracted looking out the window? It's all so close and yet so far away :)
saagarjha, 4 months ago
Seems like something that some of those big AI companies that are desperately starved of training material could chip in on, no? Actually do something for the public good, spend a few cents of that VC money, and get some high-quality training data out of it?
ChrisMarshallNY, 4 months ago
They should ask a medical school for help ;)
My family is Ivy League all the way, and has the worst goddamn cursive writing I've ever seen. It can take me an hour to read a Christmas card from my sister.
Decabytes, 4 months ago
I’m interested in giving this a go because I want to practice reading cursive. I do a lot of longhand writing, including writing all my notes in cursive. It’s exciting to watch my binding fill up with all sorts of different subjects!
I like to write in cursive for a few reasons:
1. I find it makes my hand cramp less.
2. It offers some shallow privacy in public.
3. I don’t want to lose the skill.
4. It’s fun!
iambateman, 4 months ago
This is all very cool, so I’m not trying to be dismissive. In a lot of ways, offering a hobby as a way to participate in the National Archives is an end in itself.
But… computers can definitely do this way better, right?
zabzonk, 4 months ago
After using a keyboard for circa 50 years, I can't read my own handwriting. I can't even produce a reproducible signature.
tptacek, 4 months ago
Isn't this like a bread-and-butter AI task?
“The following is the declaration of James Lambert, a soldier of the Revolutionary War in North America.” “The said James Lambert, on this day personally appeared in the Probate Court of the County of Dearborn in the State of Indiana, at the November Term of said Court [1841], it being a court of record created by the laws of Indiana, and made oath that on the 25th day of March 1842 he will be eighty‐five years old; that he was born in the State of Maryland; that he is now a resident of [said] county and has been for the [27] years last past; that he has lived in Virginia, Maryland, [and Pennsylvania]; that…”
These kinds of problems, matching up cursive to actual text, would seem to play to the absolute best strengths of an LLM, given how much basic language structure the models encode.
jll29, 4 months ago
Prompt:
    You are a paleologist specializing in analysis of cursive handwriting; tell me what the following text says: (pasting the picture).
Output:
    The following is the declaration of James Lambert, a soldier of the Revolutionary War in North America. The said James Lambert this day personally appeared in the Probate Court of the County of Dearborn in the state of Indiana and at the November Term of said court (1841), it being a court for claims created by the laws of Indiana and makes oath that: "On the 25th day of March 1842, he will be eighty-five years old; that he was born in the state of Maryland; that he is now a resident of said county and has been for the 27 years last past; that he has lived in Virginia, Maryland, Pennsylvania ..."
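tptacek's and jll29's transcriptions of the same James Lambert declaration differ in small places: "court of record" vs "court for claims", and "made oath" vs "makes oath". Below is a minimal Python sketch of the human review step seletskiy argues for. It lines two versions up word by word with the standard library's `difflib.SequenceMatcher` and lists only the spans where they disagree, so a proofreader knows where to look at the scan. The sketch assumes whitespace tokenization and ignores case and punctuation when matching. The function names `_tokens` and `word_disagreements` are hypothetical, and the sample strings are short excerpts from the two transcriptions above.

```python
import difflib
import re


def _tokens(text: str) -> list[tuple[str, str]]:
    """Split on whitespace; pair each raw word with a normalized form
    (lowercased, punctuation and brackets stripped) used only for matching."""
    pairs = []
    for raw in text.split():
        norm = re.sub(r"[^\w]", "", raw.lower())
        if norm:  # drop tokens that are pure punctuation
            pairs.append((raw, norm))
    return pairs


def word_disagreements(a: str, b: str) -> list[tuple[str, str]]:
    """Return (words_in_a, words_in_b) for every span where two
    transcriptions of the same passage disagree."""
    ta, tb = _tokens(a), _tokens(b)
    # Match on normalized words so "Indiana," and "Indiana" count as equal.
    matcher = difflib.SequenceMatcher(
        a=[n for _, n in ta], b=[n for _, n in tb], autojunk=False
    )
    out = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":  # replace, insert, or delete
            out.append((" ".join(r for r, _ in ta[i1:i2]),
                        " ".join(r for r, _ in tb[j1:j2])))
    return out


if __name__ == "__main__":
    first = "it being a court of record created by the laws of Indiana, and made oath that"
    second = "it being a court for claims created by the laws of Indiana and makes oath that:"
    for x, y in word_disagreements(first, second):
        print(f"{x!r} vs {y!r}")
```

Running it prints `'of record' vs 'for claims'` and then `'made' vs 'makes'`. A disagreement between two transcriptions does not tell you which one is right. It only narrows down where a human should go back to the original image.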
MarkusWandel, 4 months ago
Curious: how hard is the sample in the article meant to be? I grew up (in the 1970s) in a world in which cursive still ruled. But the variant we were taught in school had already evolved considerably from the one used by my grandparents, and theirs was modern compared to the archaic German script (https://en.wikipedia.org/wiki/S%C3%BCtterlin), so I've never thought of myself as good at reading cursive. And of course I haven't written (or read) much of it in the decades since.
It took about one minute to decipher the first sentence in the sample. Is that considered good these days?
gdubs, 4 months ago
FWIW, since so many people here seem set on the idea that cursive is archaic/useless today: Montessori schools still teach cursive before print, because the flowing letters are easier for kids and more similar to drawing, and because of all the letter-tracing exercises they do.
The result is that kids in Montessori learn to read faster and earlier. (They're usually writing in cursive *first*, which gives them a foundation in the letters and their phonetic sounds, before they begin reading exercises in earnest.)
tkgally, 4 months ago
This reminded me of something the historian Megan Marshall wrote in the introduction to her book *The Peabody Sisters: Three Women Who Ignited American Romanticism* (2005):
“I became expert in deciphering the sisters’ handwriting, and that of their ancestors, parents, and friends. Each era and each correspondent presented different challenges. Some hands were sprawling, some spindly, some cramped; *t*’s went uncrossed at the ends of words, and *f*’s and *s*’s were interchanged; spelling, capitalization, and punctuation could be erratic or idiosyncratic. Often, to save paper and postage, the sisters turned a single sheet ninety degrees and wrote back across a page already covered with handwriting. I learned to be especially attentive to these cross-written lines, in which the sisters invariably confided their deepest feelings in the last hurried moments of closing a letter. Here I would find the urgent personal message that had been put off for the sake of dispensing news or settling business. In one such postscript, I discovered Elizabeth’s account of a conversation with Horace Mann in which the two spoke frankly of their love for each other and finally settled on what it meant.”
A photograph of a letter with cross-writing is here:
https://www.masshist.org/database/1774
Marshall wrote more in an article for *Slate*:
https://slate.com/news-and-politics/2005/05/reading-the-peabody-sisters-letters.html
jez, 4 months ago
The handwriting in some of these snippets, while sometimes difficult to read for one reason or another, is nonetheless beautiful. Did everyone who wrote have such great handwriting back then?
I'm looking at the piece in the Instagram post linked from the page, which begins "honor of holding in their service". The lines are so straight, and the letters are so uniform!
rasengan0, 4 months ago
Thanks for posting this collective effort.
It's pretty easy to get started after signing up at login.gov,
then https://www.archives.gov/citizen-archivist/get-started-transcribing, with a video: https://www.youtube.com/watch?v=dwQ5pEWWFY8
To ask questions, use the support forum: https://historyhub.history.gov/citizen_archivists/
electricant, 4 months ago
Today I learned that children in the US are not taught cursive handwriting. This seems rather absurd to me. How are they supposed to write?
Baeocystin, 4 months ago
Funnily enough, a few times over the past couple of years I've been asked by younger co-workers to read something for them that was written in cursive. I hadn't really realized it had become such a (comparatively) rare skill. This is making me feel older than my actual 50th birthday did!
brenainn, 4 months ago
The Australian War Memorial has a volunteer program for transcribing old letters, diaries and such: https://transcribe.awm.gov.au/
I gave it a go, but it was too hard for me! I write in cursive, but I found most of it illegible.
jb1991, 4 months ago
> particularly for Americans who never learned cursive in school.
American schools don’t teach it anymore?!
c0brac0bra, 4 months ago
I have a family-heirloom Civil War journal, and much of it is unfortunately written in nearly undecipherable cursive.
It would be great if this eventually developed into some kind of set of open models that would work on content like this.
kopirgan, 4 months ago
Is that true?! US kids don't learn cursive? How do they write?!
jedberg, 4 months ago
They should hire a bunch of teachers to do this over the summer! Every teacher I know is an expert at reading terrible handwriting.
euribates, 4 months ago
Americans... consider reading handwriting a superpower?
This explains a lot
poulpy123, 4 months ago
My brother in history, I can't even read mine
epgui, 4 months ago
An army of pharmacists ought to do the trick!
anonymous_379, 4 months ago
Why did people use to write like this?
peter_retief, 4 months ago
How does one actually sign up?
musicale, 4 months ago
It might be nice for people to be able to actually read the documents in the National Archives rather than relying on a transcription or a mobile app.
I wonder if they've considered making a simple tutorial on how to read cursive? It's not that hard if you can already read printed English. And of course you can practice on documents in the National Archives.
It's exciting and fun to learn to read an unfamiliar script, like the runes on the cover of The Hobbit... or the engraving-style cursive of the US Constitution.
madmask, 4 months ago
I still write like that
jncfhnb, 4 months ago
I don’t think I believe that OCR can’t do it but random humans can.
OCR is VERY good
SCPlayz7000, 4 months ago
This is cool.
riffic, 4 months ago
can this be mechanically turked?
paulg2222, 4 months ago
I can even write cursive. Take that, Zoomers.
Over2Chars, 4 months ago
It says "The following is the dedication of James Lambert a soldier of the Revolutionary wars with the Americas."
blah blah blah
myth_drannon, 4 months ago
Why do they need volunteers to do it manually? Open-source AI models like Microsoft's TrOCR are very effective for handwritten English.