Image text recognition APIs showdown

193 points by mohi13 over 7 years ago

22 comments

jedberg over 7 years ago

I've said this many times: Google's AI services are superior in every way. They have better results and are easier to use.

Amazon is the master of the "good enough". Their service works well enough that you can check the box that it exists and then point it at all that data you already have in AWS. And that's all that most everyone needs.

If you are using AI and your competitors aren't, it doesn't really matter all that much how good the AI is -- you're gonna do better and be more efficient.

It's only after everyone is using AI that it will start to matter how good your particular implementation is. Right now we're at the stage that any implementation is better than none.

yegle over 7 years ago

There's a picture for which Google's Vision API returned Goo\u011fle. I was wondering why there's a unicode char in the middle:
<pre><code>In [5]: c = '\u011f'
In [6]: c
Out[6]: 'ğ'
</code></pre>
This does closely resemble the characters in the picture :-)
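
A minimal sketch of the same check outside an interactive session, assuming the API result arrives as text containing JSON-style `\uXXXX` escapes. The helper names `decode_json_escapes` and `strip_diacritics` are made up for illustration, and folding accents away is only one possible post-processing choice, not something the article or the API does.

```python
import json
import unicodedata


def decode_json_escapes(raw: str) -> str:
    """Decode JSON-style \\uXXXX escapes by parsing the text as a JSON string.

    Assumes `raw` contains no unescaped double quotes or control characters.
    """
    return json.loads(f'"{raw}"')


def strip_diacritics(text: str) -> str:
    """Decompose accented letters (NFD) and drop the combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


raw = r"Goo\u011fle"  # the escaped string as quoted in the comment
decoded = decode_json_escapes(raw)

print(decoded)                        # Gooğle
print(unicodedata.name(decoded[3]))   # LATIN SMALL LETTER G WITH BREVE
print(strip_diacritics(decoded))      # Google
```

Whether to fold `ğ` back to `g` depends on the use case: it helps when matching against a known English word list, but it would corrupt text that is genuinely Turkish.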

ckuhl over 7 years ago

I find it interesting that it looks like in at least one instance, Microsoft CS recognized the text upside down.
<pre><code>Payloads -> speohed</code></pre>

eastendguy over 7 years ago

You can compare Google vs Microsoft vs OCR.space without signing up here: https://ocr.space/compare-ocr-software

OCR.space is a "good enough" option for many projects. It has a very generous free tier of 25,000 free conversions/month per IP address (Google offers only 1000/month per account). In my tests it did not perform as well as Google, but it was good enough for many applications (and much better than Tesseract).

Isamu over 7 years ago

Don't know where the assessment "Msft, Google APIs accuracy still below par" came from. In fact this is just comparing 3 APIs.

In fact the state of the art of OCR "in the wild" (using images off the street, for example) is far from 100%. Google Cloud Vision does pretty well.

The ICDAR 2017 challenges (especially the Robust Reading Challenges) should give you an idea of where we are now: http://u-pat.org/ICDAR2017/program_competitions.php

notlisted over 7 years ago

The analysis is weak. "There were 10 images where all three APIs got it wrong".

Guess what. Zoom in on the "PRINCE" image, and you'll see it says top-right: A MIKE NEWELL FILM. So... both Google and AWS did a nice job. It's not reasonable to expect PRINCE as the outcome.

The point another person makes below about "payloads" and MSoft is valid too... As is the g-accented (not recognized because UTF codes not processed).

Makes ya wonder.
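
How an answer gets counted as "wrong" matters a lot here. The article's scoring method is not described in this thread, so the following is only a sketch of two common options, strict exact match versus a character-level similarity based on edit distance. The function names are illustrative.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete from a
                            curr[j - 1] + 1,      # insert into a
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def exact_match(expected: str, predicted: str) -> bool:
    """Strict scoring: case-insensitive equality after trimming whitespace."""
    return expected.strip().lower() == predicted.strip().lower()


def char_similarity(expected: str, predicted: str) -> float:
    """Lenient scoring: 1.0 for identical strings, lower as edits accumulate."""
    e, p = expected.strip().lower(), predicted.strip().lower()
    if not e and not p:
        return 1.0
    return 1.0 - levenshtein(e, p) / max(len(e), len(p))


print(exact_match("Google", "Gooğle"))                # False
print(round(char_similarity("Google", "Gooğle"), 3))  # 0.833
print(exact_match("PRINCE", "A MIKE NEWELL FILM"))    # False
```

Under exact match, a one-character slip and a completely different (but legitimately present) line of text both score zero, which is the kind of distinction a single "got it wrong" count hides.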

danso over 7 years ago

> The only drawback with AWS rekognition APIs is that it only takes an image stored as an AWS S3 object as input while the other API work with any image stored on the web.

This isn't quite true -- the rekognition API will also accept base64 encoded bytes (5MB max): http://boto3.readthedocs.io/en/latest/reference/services/rekognition.html#Rekognition.Client.detect_text
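
A rough sketch of passing image bytes directly through boto3 instead of an S3 reference. The helper names, the default region, and the reading of "5MB" as 5 * 1024 * 1024 bytes are assumptions for illustration; with the SDK you hand over raw bytes and it takes care of the encoding on the wire. Credentials are expected to come from the usual AWS configuration.

```python
from pathlib import Path
from typing import List

# Assumption: the "5MB max" from the comment, interpreted as 5 MiB.
MAX_INLINE_BYTES = 5 * 1024 * 1024


def build_detect_text_request(image_bytes: bytes) -> dict:
    """Build keyword arguments for Rekognition's detect_text with inline bytes."""
    if not image_bytes:
        raise ValueError("image is empty")
    if len(image_bytes) > MAX_INLINE_BYTES:
        raise ValueError("image exceeds the inline size limit; use S3 instead")
    return {"Image": {"Bytes": image_bytes}}


def detect_text_in_file(path: str, region_name: str = "us-east-1") -> List[str]:
    """Send a local image to Rekognition and return the detected lines of text."""
    import boto3  # imported lazily so the request builder works without boto3

    client = boto3.client("rekognition", region_name=region_name)
    response = client.detect_text(**build_detect_text_request(Path(path).read_bytes()))
    return [
        detection["DetectedText"]
        for detection in response["TextDetections"]
        if detection["Type"] == "LINE"
    ]
```

Calling `detect_text_in_file("sign.jpg")` (a hypothetical local file) would make a billable API request, so it is left out of the sketch.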

hcarvalhoalves over 7 years ago

It seems his examples include logos and such. Some of those services could be tuned for OCRing books and documents, which should be the bulk of OCR use cases in commercial applications. Since there's usually a trade-off between flexibility vs. accuracy, I wouldn't be surprised to see inverted results w/ a different dataset. Might be worth doing the test.

shoshin23 over 7 years ago

Image text recognition is a major problem we're trying to solve in our startup. I would love to be pointed to some SOTA research in this space. It's hard to find anything by Googling.

As far as our experience goes, Cloud Vision API is a killer option compared to both AWS and MSFT. It's pricier than AWS, though, and slower. MSFT is terrible in both price and speed.

gajju3588 over 7 years ago

I really hope some company is looking at this data and thinking, "can we do something disruptive here?"

danso over 7 years ago

This article reminded me to check if any updates had been made to Google's OCR since Cloud Vision was in beta sometime last year or earlier this year [0]. It looks like there is a new parameter/option for "Document Text Detection" -- i.e. something more akin to Tesseract, rather than just detecting words in images (such as road signs): https://cloud.google.com/vision/docs/detecting-fulltext

[0] https://gist.github.com/dannguyen/a0b69c84ebc00c54c94d

edit: I would attempt my own test right now but it's been a while since I've tried to use Google Cloud. Right now I'm getting constant "Server Error" popups until Chrome decides to crash and die while simply checking my account and billing page. The Cloud Console's wonkiness is probably one of the reasons why I stopped using GC in favor of AWS :/
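
For reference, a sketch of how the two modes differ at the request level when calling the Cloud Vision REST endpoint: the only change is the feature type. The helper name `build_annotate_body` and its `dense_text` flag are made up for illustration, and sending the request (which needs an API key or OAuth token) is deliberately left out.

```python
import base64
import json

# REST endpoint for image annotation; authentication is not shown here.
ANNOTATE_URL = "https://vision.googleapis.com/v1/images:annotate"


def build_annotate_body(image_bytes: bytes, dense_text: bool = True) -> dict:
    """Build the JSON body for an images:annotate call.

    dense_text=True asks for DOCUMENT_TEXT_DETECTION (pages, paragraphs);
    dense_text=False asks for TEXT_DETECTION (sparse text such as signs).
    """
    feature = "DOCUMENT_TEXT_DETECTION" if dense_text else "TEXT_DETECTION"
    return {
        "requests": [
            {
                "image": {"content": base64.b64encode(image_bytes).decode("ascii")},
                "features": [{"type": feature}],
            }
        ]
    }


body = build_annotate_body(b"fake image bytes for illustration")
print(json.dumps(body, indent=2))
```

Running both feature types over the same set of images would be one way to test the suspicion, raised elsewhere in the thread, that results could flip on a document-heavy dataset.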

gajju3588 over 7 years ago

Comparison of face recognition APIs from MS/Amazon/Kairos. Kairos is doing better here. ;-)

https://dataturks.com/blog/face-verification-api-comparison.php

ape4 over 7 years ago

A non-cloud API: https://docs.opencv.org/3.1.0/d4/d61/group__text.html

spunker540 over 7 years ago

I'm surprised that these results are so poor, because many of these reading tasks are so easy for human readers. And it seems so much more straightforward than stuff like face recognition and self-driving cars!

BLanen over 7 years ago

During a college course my group also did a comparison like this, but specifically for handwriting, and we simulated low-quality scanned images.

Microsoft was by far the best at this. Google wasn't even close.

garysieling over 7 years ago

I ran a bunch of stained glass images through several APIs - Google's did by far the best, although there are a lot of issues (curved text, hand-written text in odd alignments or on a curve)

juanmirocks over 7 years ago

I would be curious to know if anyone in this community is indeed already using one of these APIs for a product and what your real-life experience is. Care to share your use cases?

correlation over 7 years ago

It would be interesting to see how Tesseract holds up against this.

red0point over 7 years ago

Is there any service that converts scanned PDFs to searchable PDFs using the Google Cloud Vision API? If so, that would be highly useful for me.

gricardo99 over 7 years ago

Click on "Get dataset and code":

"We will email you the dataset and code."

What's wrong with a GitHub link in the article?

Omnipresent over 7 years ago

Does Google Cloud Vision use deep learning for its image text recognition?

xmly over 7 years ago

AWS only has 20+% correctness?
