Image text recognition APIs showdown

193 points by mohi13, over 7 years ago

22 comments

jedberg, over 7 years ago
I've said this many times: Google's AI services are superior in every way. They have better results and are easier to use.

Amazon is the master of "good enough". Their service works well enough that you can check the box that it exists and then point it at all that data you already have in AWS. And that's all that most everyone needs.

If you are using AI and your competitors aren't, it doesn't really matter all that much how good the AI is -- you're gonna do better and be more efficient.

It's only after *everyone* is using AI that it will start to matter how good your particular implementation is. Right now we're at the stage where any implementation is better than none.
yegle, over 7 years ago
There's a picture for which Google's Vision API returned Goo\u011fle.

I was wondering why there's a Unicode character in the middle:

    In [5]: c = '\u011f'
    In [6]: c
    Out[6]: 'ğ'

This does closely resemble the characters in the picture :-)
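One plausible reading of this (a point notlisted also raises in this thread) is that the response carried the character as the escape `\u011f` and it was displayed without being decoded. A minimal standard-library Python sketch, assuming the text arrived as a JSON string literal: decoding it yields ğ (LATIN SMALL LETTER G WITH BREVE), and stripping combining marks folds it back to a plain "g" for comparison. The `fold_to_ascii` helper is hypothetical, for illustration only.

```python
import json
import unicodedata

# Assumed input: the text as it might appear inside a JSON response body,
# with the non-ASCII character written as a \uXXXX escape.
raw_json = '"Goo\\u011fle"'

# Decoding the JSON string literal turns the escape into the real character.
decoded = json.loads(raw_json)
print(decoded)                        # Gooğle
print(unicodedata.name(decoded[3]))   # LATIN SMALL LETTER G WITH BREVE


def fold_to_ascii(text: str) -> str:
    """Hypothetical helper: drop combining marks so 'ğ' compares equal to 'g'."""
    # NFKD splits 'ğ' into 'g' followed by U+0306 COMBINING BREVE.
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(fold_to_ascii(decoded))         # Google
```

Folding like this is only appropriate for scoring; for display you would keep the decoded string as-is.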
ckuhl, over 7 years ago
I find it interesting that, in at least one instance, Microsoft CS apparently recognized the text upside down:

    Payloads -> speohed
eastendguy, over 7 years ago
You can compare Google vs Microsoft vs OCR.space without signing up here:

https://ocr.space/compare-ocr-software

OCR.space is a "good enough" option for many projects. It has a very generous free tier of 25,000 free conversions/month *per IP address* (Google: only 1,000/month per account). In my tests it did not perform as well as Google, but it was good enough for many applications (much better than Tesseract).
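For trying OCR.space from code rather than through the comparison page, here is a minimal standard-library Python sketch. The endpoint `https://api.ocr.space/parse/image`, the `apikey`/`url`/`language` form fields, and the `ParsedResults[].ParsedText`, `IsErroredOnProcessing` and `ErrorMessage` response fields reflect OCR.space's public REST API as I understand it; treat them as assumptions and check the current documentation before relying on them.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint and field names; verify against OCR.space's API docs.
OCR_SPACE_URL = "https://api.ocr.space/parse/image"


def build_form(api_key: str, image_url: str, language: str = "eng") -> bytes:
    """Encode a form body asking the service to OCR an image hosted on the web."""
    fields = {"apikey": api_key, "url": image_url, "language": language}
    return urllib.parse.urlencode(fields).encode("ascii")


def extract_text(response: dict) -> str:
    """Join the text of every parsed page; raise if the service reported an error."""
    if response.get("IsErroredOnProcessing"):
        raise RuntimeError(f"OCR failed: {response.get('ErrorMessage')}")
    pages = response.get("ParsedResults", [])
    return "\n".join(page.get("ParsedText", "") for page in pages)


def ocr_space(api_key: str, image_url: str) -> str:
    """POST the form (urllib sends it as x-www-form-urlencoded) and return the text."""
    request = urllib.request.Request(OCR_SPACE_URL, data=build_form(api_key, image_url))
    with urllib.request.urlopen(request, timeout=60) as resp:
        return extract_text(json.load(resp))
```

Usage would be something like `ocr_space("YOUR_API_KEY", "https://example.com/sign.jpg")`, where the key and image URL are placeholders. Keeping request building and response parsing separate from the network call makes both easy to test offline.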
Isamu, over 7 years ago
Don't know where the assessment "Msft, Google APIs accuracy still below par" came from. All this does is compare 3 APIs with each other.

In fact, the state of the art of OCR "in the wild" (using images taken off the street, for example) is far from 100%. Google Cloud Vision does pretty well.

The ICDAR 2017 challenges (especially the Robust Reading Challenges) should give you an idea of where we are now:

http://u-pat.org/ICDAR2017/program_competitions.php
notlisted, over 7 years ago
The analysis is weak. "There were 10 images where all three APIs got it wrong".

Guess what: zoom in on the "PRINCE" image and you'll see it says, top right: A MIKE NEWELL FILM. So... both Google and AWS did a nice job. It's not reasonable to expect PRINCE as the outcome.

The point another person makes in this thread about "payloads" and MSoft is valid too... as is the accented g (not recognized because the UTF escape codes weren't processed).

Makes ya wonder.
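The objection here is really about how a result gets scored. A minimal Python sketch, assuming one expected label and one OCR output string per image (the function names are hypothetical, and this is not the article's methodology, which the thread does not describe), contrasting strict exact match, a lenient "expected text appears as whole words in the output" check, and character error rate computed from Levenshtein edit distance.

```python
import re


def normalize(text: str) -> str:
    """Uppercase and collapse whitespace so line breaks don't count as errors."""
    return re.sub(r"\s+", " ", text).strip().upper()


def exact_match(expected: str, predicted: str) -> bool:
    """Strict scoring: the whole output must equal the expected label."""
    return normalize(expected) == normalize(predicted)


def contains_match(expected: str, predicted: str) -> bool:
    """Lenient scoring: the expected label must appear as a whole-word run."""
    return f" {normalize(expected)} " in f" {normalize(predicted)} "


def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance (insert, delete, substitute each cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,               # delete ca
                           cur[j - 1] + 1,            # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute if different
        prev = cur
    return prev[-1]


def char_error_rate(expected: str, predicted: str) -> float:
    """Edit distance divided by the length of the normalized expected text."""
    ref = normalize(expected)
    return levenshtein(ref, normalize(predicted)) / max(len(ref), 1)


# Hypothetical output for a poster titled PRINCE that also shows a credit line.
output = "A MIKE NEWELL FILM\nPRINCE"
print(exact_match("PRINCE", output))     # False
print(contains_match("PRINCE", output))  # True
```

Which rule is "right" depends on the ground truth: if every legible string on the image is part of the label, strict matching stops penalizing correct but extra text.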
danso, over 7 years ago
> The only drawback with AWS rekognition APIs is that it only takes an image stored as an AWS S3 object as input while the other API work with any image stored on the web.

This isn't quite true -- the rekognition API will also accept base64-encoded bytes (5MB max): http://boto3.readthedocs.io/en/latest/reference/services/rekognition.html#Rekognition.Client.detect_text
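A minimal Python sketch of that non-S3 path, assuming boto3: `detect_text` takes `Image={"Bytes": ...}`, and boto3 base64-encodes the bytes itself, so you pass the raw file contents. The client is passed in as a parameter so the function can be exercised without AWS credentials; the 5 MB guard mirrors the limit danso mentions, and the function name is hypothetical.

```python
from __future__ import annotations

# Inline-image size limit mentioned in the comment (raw bytes).
MAX_INLINE_BYTES = 5 * 1024 * 1024


def detect_text_from_bytes(client, image_bytes: bytes) -> list[str]:
    """Run Rekognition DetectText on in-memory image bytes and return LINE results.

    `client` is expected to be boto3.client("rekognition"); boto3 handles the
    base64 encoding of the Bytes field, so raw file contents are passed in.
    """
    if len(image_bytes) > MAX_INLINE_BYTES:
        raise ValueError("image too large to send inline; upload it to S3 instead")
    response = client.detect_text(Image={"Bytes": image_bytes})
    # Each detection is either a whole LINE or a single WORD; keep the lines.
    return [d["DetectedText"] for d in response["TextDetections"] if d["Type"] == "LINE"]


# Usage (needs the boto3 package and AWS credentials):
#   import boto3
#   from pathlib import Path
#   client = boto3.client("rekognition")
#   print(detect_text_from_bytes(client, Path("sign.jpg").read_bytes()))
```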
hcarvalhoalves, over 7 years ago
It seems his examples include logos and such. Some of those services could be tuned for OCRing books and documents, which should make up the bulk of OCR use cases in commercial applications. Since there's usually a trade-off between flexibility and accuracy, I wouldn't be surprised to see inverted results with a different dataset. Might be worth doing the test.
shoshin23, over 7 years ago
Image text recognition is a major problem we're trying to solve at our startup. I would love to be pointed to some SOTA (state-of-the-art) research in this space; it's hard to find anything by Googling.

As far as our experience goes, Cloud Vision API is a killer option compared to both AWS and MSFT. It's pricier and slower than AWS, though. MSFT is terrible in both price and speed.
gajju3588, over 7 years ago
I really hope some company is looking at this data and thinking, "Can we do something disruptive here?"
danso, over 7 years ago
This article reminded me to check whether any updates had been made to Google's OCR since Cloud Vision was in beta, sometime last year or earlier this year [0]. It looks like there's a new parameter/option for "Document Text Detection" -- i.e. something more akin to Tesseract, rather than just detecting words in images (such as road signs):

https://cloud.google.com/vision/docs/detecting-fulltext

[0] https://gist.github.com/dannguyen/a0b69c84ebc00c54c94d

edit: I would attempt my own test right now, but it's been a while since I've tried to use Google Cloud. Right now I'm getting constant "Server Error" popups until Chrome decides to crash and die, just from checking my account and billing page. The Cloud Console's wonkiness is probably one of the reasons why I stopped using GC in favor of AWS :/
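A minimal Python sketch of selecting that option through the Cloud Vision REST API's `images:annotate` method, assuming the v1 request shape (`requests[].image.content` as base64, `features[].type`) and the `fullTextAnnotation` / `textAnnotations` response fields. The helper names are hypothetical, and only request building and response parsing are shown, so the sketch runs without credentials.

```python
import base64

# REST endpoint for Cloud Vision; "{key}" is a placeholder for an API key.
ANNOTATE_URL = "https://vision.googleapis.com/v1/images:annotate?key={key}"


def build_request(image_bytes: bytes, feature: str = "DOCUMENT_TEXT_DETECTION") -> dict:
    """Build an images:annotate body; pass "TEXT_DETECTION" for signs and logos."""
    return {
        "requests": [{
            "image": {"content": base64.b64encode(image_bytes).decode("ascii")},
            "features": [{"type": feature}],
        }]
    }


def extract_full_text(body: dict) -> str:
    """Pull the detected text out of the first response, raising on API errors."""
    first = body["responses"][0]
    if "error" in first:
        raise RuntimeError(first["error"].get("message", "unknown error"))
    # Dense-text results come back in fullTextAnnotation; as a fallback, the
    # first textAnnotations entry holds the whole detected string.
    if "fullTextAnnotation" in first:
        return first["fullTextAnnotation"]["text"]
    annotations = first.get("textAnnotations", [])
    return annotations[0]["description"] if annotations else ""
```

To use it, POST `json.dumps(build_request(data))` to `ANNOTATE_URL.format(key=...)` with any HTTP client and pass the decoded JSON reply to `extract_full_text`.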
gajju3588, over 7 years ago
Comparison of face recognition APIs from MS/Amazon/Kairos.

Kairos is doing better here. ;-)

https://dataturks.com/blog/face-verification-api-comparison.php
ape4, over 7 years ago
A non-cloud API: https://docs.opencv.org/3.1.0/d4/d61/group__text.html
spunker540, over 7 years ago
I'm surprised these results are so poor, because many of these reading tasks are so easy for human readers. And it seems so much more straightforward than stuff like face recognition and self-driving cars!
BLanen, over 7 years ago
During a college course, my group also did a comparison like this, but specifically for handwriting, and we simulated low-quality scanned images.

Microsoft was by far the best at this. Google wasn't even close.
garysieling, over 7 years ago
I ran a bunch of stained glass images through several APIs. Google's did by far the best, although there are a lot of issues (curved text, handwritten text in odd alignments or on a curve).
juanmirocks, over 7 years ago
I would be curious to know whether anyone in this community is already using one of these APIs in a product, and what your real-life experience has been. Care to share your use cases?
correlation, over 7 years ago
It would be interesting to see how Tesseract holds up against this.
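For anyone wanting to run that comparison locally, a minimal Python sketch that shells out to the `tesseract` command-line tool, assuming it is installed and on PATH with English language data. Passing `stdout` as the output base makes tesseract print the recognized text instead of writing a `.txt` file, and `-l` selects the language; the function names are hypothetical.

```python
from __future__ import annotations

import shutil
import subprocess


def tesseract_command(image_path: str, lang: str = "eng") -> list[str]:
    """Build the CLI call; "stdout" as output base prints text instead of writing a file."""
    return ["tesseract", image_path, "stdout", "-l", lang]


def run_tesseract(image_path: str, lang: str = "eng") -> str:
    """Run Tesseract on one image and return its text (requires the binary)."""
    if shutil.which("tesseract") is None:
        raise FileNotFoundError("tesseract executable not found on PATH")
    result = subprocess.run(
        tesseract_command(image_path, lang),
        capture_output=True,  # progress messages go to stderr, text to stdout
        text=True,
        check=True,           # raise CalledProcessError on a non-zero exit
    )
    return result.stdout
```

Running the same image set through this and through the cloud APIs, then scoring all outputs with one metric, would give a like-for-like comparison.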
red0point, over 7 years ago
Is there any service that uses the Google Cloud Vision API to convert scanned PDFs into searchable PDFs? If so, that would be highly useful for me.
gricardo99, over 7 years ago
Click on "Get dataset and code":

"We will email you the dataset and code."

What's wrong with a GitHub link in the article?
Omnipresent, over 7 years ago
Does Google Cloud Vision use deep learning for its image text recognition?
xmly, over 7 years ago
AWS only has 20+% correctness?