How I OCR hundreds of hours of video (2011)

125 points by joe_bleau, almost 11 years ago

6 comments

samirmenon, almost 11 years ago
This is awesome. Who cares that it's a little cumbersome - it works!

On a more national scale, could something like this be done for Congress? C-SPAN already does most of the hard work of filming and uploading to the web, so perhaps it won't be too difficult. I think it would certainly attract a lot of interest... maybe I'll give it a go.
waldoj, almost 11 years ago
Here's the code on GitHub: https://github.com/openva/video-indexer

It's terrible (I wrote it for a very narrow use case, and only run it ~200 times each year), but it's enough to get the idea.
PeterisP, almost 11 years ago
His OCR errors (Del. Jennifer L. McClellan -> Del. Jennifer L i\1cCie1ian) look like something that would be easily fixable at the right spot: the dictionaries and language models used by Tesseract.

While a spellchecker might fix Jenn1fer -> Jennifer, at the OCR stage there is much more information to do it properly; but it obviously doesn't know that McClellan is a valid word and thus a much more likely alternative than i\1cCie1ian, and it needs to be told that. The list of speakers on those videos is limited, and their surnames can be added to the appropriate dictionaries to improve their recognition.
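
One way to sketch the suggestion above in Python: write the known speaker names to a plain word list and hand that file to Tesseract. This is only an illustration, not the author's pipeline. The word list, file names, and helper functions are hypothetical, and it assumes a Tesseract build whose command line accepts `--user-words` (older releases configure user word lists through config variables instead). How much the extra words help may depend on the Tesseract version and OCR engine in use.

```python
import shlex
import tempfile
from pathlib import Path

# Hypothetical speaker vocabulary; a real list would come from a legislative roster.
SPEAKER_WORDS = ["Del.", "Sen.", "Jennifer", "McClellan"]


def write_user_words(words, path):
    """Write a de-duplicated, sorted word list, one word per line, and return it."""
    unique = sorted(set(words))
    Path(path).write_text("\n".join(unique) + "\n", encoding="utf-8")
    return unique


def build_tesseract_command(image_path, output_base, user_words_path):
    """Return the argv list for a Tesseract run that loads the extra word list.

    Assumption: the installed Tesseract CLI supports the --user-words option.
    """
    return [
        "tesseract",
        str(image_path),
        str(output_base),
        "--user-words",
        str(user_words_path),
    ]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        words_file = Path(tmp) / "speakers.user-words"  # hypothetical file name
        write_user_words(SPEAKER_WORDS, words_file)
        cmd = build_tesseract_command("frame_0001.png", "frame_0001", words_file)
        # Only prints the command; pass cmd to subprocess.run to invoke Tesseract.
        print(shlex.join(cmd))
```

The command is built as a list rather than a shell string so that paths containing spaces need no quoting when it is passed to `subprocess.run`.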
robinhoodexe, almost 11 years ago
http://webcache.googleusercontent.com/search?q=cache:http://waldo.jaquith.org/blog/2011/02/ocr-video/

Google cache of the site if it's unavailable (I'm getting a database error).
burnte, almost 11 years ago
I would think the first few steps could be combined into one faster step by using Handbrake to rip DVDs directly to MP4. But I also don't see why that stage takes hours on his machine; even on my 2006 rig it took less than the playtime of the DVD.
MisterNegative, almost 11 years ago
The title is very misleading to me; I expected magic, but it was kind of disappointing. They don't even OCR actual video; instead they just take a few screenshots.
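
For readers wondering what "taking a few screenshots" can look like in practice, here is a minimal Python sketch that builds an ffmpeg command to save one still frame every N seconds, which can then be passed to an OCR tool. It is an assumption for illustration only: the thread does not say which tool the author used to capture frames, and the file names and helper function are hypothetical.

```python
def build_frame_sampling_command(video_path, out_pattern, every_seconds):
    """Return an ffmpeg argv list that writes one frame per `every_seconds` seconds.

    Uses ffmpeg's `fps` video filter; fps=1/10 means one output frame per 10 seconds.
    """
    if not isinstance(every_seconds, int) or every_seconds <= 0:
        raise ValueError("every_seconds must be a positive integer")
    return [
        "ffmpeg",
        "-i", str(video_path),
        "-vf", f"fps=1/{every_seconds}",
        str(out_pattern),  # e.g. frames/frame_%05d.png, numbered by ffmpeg
    ]


if __name__ == "__main__":
    # Hypothetical input and output names.
    cmd = build_frame_sampling_command("session.mp4", "frames/frame_%05d.png", 10)
    # Only prints the command; pass cmd to subprocess.run to invoke ffmpeg.
    print(" ".join(cmd))
```

Sampling stills trades completeness for speed: on-screen text that stays up for less than the sampling interval can be missed entirely.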