OCR by uploading images to Google Docs

66 points by gintas, over 14 years ago

10 comments

anty, over 14 years ago
In case anyone wonders: I tested whether Google could solve its own captchas. It can if each character is separated, but once they overlap, as they usually do, it doesn't work.
nodata, over 14 years ago
Does anyone know if this uses Google's open source tesseract-ocr software?
CWuestefeld, over 14 years ago
I find it tremendously frustrating that so many people are creating this problem for themselves.

Anything that needs to be data should be data, not images. Except for some very specific cases, you're not doing anybody any favors by outputting PDF. That format is a data black hole. It allows you to transmit very well-formatted output, but it absolutely *stops* you from reliably *using* anything in that content.

I beg you all: if it's anything that contains data, or really, if it's anything for which layout and formatting are not absolutely critical, please don't use PDF. Send data as data.
ylem, over 14 years ago
Has anyone checked to see if this works with Japanese, Korean, or Chinese? What about Arabic or Hindi? This would shed some light on whether it's likely to be tesseract or OCRopus...
joakin, over 14 years ago
Wow, I just tested with an image, and you get a GDoc with the image on top and the OCRed text at the bottom.

Pretty cool.

I wonder what they are using for Google Goggles and this.
Estragon, over 14 years ago
Incidentally, I noticed that if you try to use tesseract on an image taken from a Google Books page, you get terrible OCR accuracy. Anyone know why that is?
Tichy, over 14 years ago
Is there an API by any chance?
mikecane, over 14 years ago
G1ver whar OCR locks lice in g00gLe ePubs in g0og1e Buuks, th1s w111 du we11.
trezor, over 14 years ago
Trying to improve some scanned forms I have, I got an average of 5 characters per page recognized. Also, the form formatting was recognized as "1 1 1 1 1 1 1 1 1 1 1 1 1".

I may not rely entirely on Google Docs for my OCR needs in future ;)
rorrr, over 14 years ago
I wonder what's stronger - Google OCR or Google captcha?