We had great results using tesseract-ocr [1] with SWT (Stroke Width Transform, a state-of-the-art text detection algorithm, via libccv [2]) on Linux.<p>You can use our Python bindings for both [3, 4], although they might be slightly outdated.<p>[1] <a href="https://code.google.com/p/tesseract-ocr/" rel="nofollow">https://code.google.com/p/tesseract-ocr/</a><p>[2] <a href="http://libccv.org/doc/doc-swt/" rel="nofollow">http://libccv.org/doc/doc-swt/</a><p>[3] <a href="https://github.com/veezio/pytesseract" rel="nofollow">https://github.com/veezio/pytesseract</a><p>[4] <a href="https://github.com/veezio/pyccv" rel="nofollow">https://github.com/veezio/pyccv</a>
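For anyone who wants to try the OCR half without the bindings, here is a minimal sketch that shells out to the stock `tesseract` command-line tool from Python. It does not use the pytesseract or pyccv bindings linked above (their APIs are not shown in this thread), and it skips the SWT detection step entirely. `build_tesseract_cmd` and `ocr_image` are hypothetical helper names, and passing `stdout` as the output base assumes a reasonably recent tesseract build.

```python
import shutil
import subprocess


def build_tesseract_cmd(image_path, lang="eng"):
    """Build the argument list for the stock `tesseract` CLI.

    Using "stdout" as the output base asks tesseract to print the
    recognized text instead of writing a .txt file (assumption: the
    installed version supports this).
    """
    return ["tesseract", str(image_path), "stdout", "-l", lang]


def ocr_image(image_path, lang="eng"):
    """Run tesseract on one image and return the recognized text."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract binary not found on PATH")
    result = subprocess.run(
        build_tesseract_cmd(image_path, lang),
        capture_output=True,  # collect the text tesseract prints
        text=True,            # decode bytes to str
        check=True,           # raise if tesseract exits non-zero
    )
    return result.stdout.strip()
```

With a text detector such as SWT in front, one would presumably crop each detected box and pass the crops to `ocr_image` one at a time, rather than feeding in the whole photo.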
This is very cool! I've been working on a receipt scanning tool in C# for keeping track of kitchen inventory (tired of calling my wife to ask whether we have sesame oil or some oddball thing).<p>I found a few libraries, but they only worked with relatively perfect scans (my goal is to be able to just use a phone). When I get home, I'm definitely going to give this a go.
Off topic, but this made me think that it would be neat if libraries on places like GitHub and NuGet could somehow include "cited by" data: something that referenced open source (maybe closed source too) projects that had a dependency on the library, similar to Google Scholar or CiteSeerX.
On <a href="http://msdn.microsoft.com/en-us/library/windows/apps/windowspreview.media.ocr.aspx" rel="nofollow">http://msdn.microsoft.com/en-us/library/windows/apps/windows...</a> they mention the supported languages and their statuses, but Korean is only "Good".<p>I freely admit that I do not speak Korean, but if one compares "Chinese Simplified" characters (listed as "Very good") with those in the Korean alphabet, I am surprised those two entries aren't transposed.<p>Is there something that makes recognizing Korean harder than Chinese Simplified, or was that just a product management decision?
"demonstrated in code snippets below". The code snippets are actually images and even worse, they're JPEGs which is the reason why the text looks horrible.
So from reading the list of reasons for inaccurate results, it sounds like this library is totally useless for images taken with mobile phones, yet it is only allowed to run on mobile ;)<p>Now I would be more interested in an image correction library.<p>"....
Blurry images
Handwritten or cursive text
Artistic font styles
Small text size (less than 15 pixels for Western languages, or less than 20 pixels for East Asian languages)
Complex backgrounds
Shadows or glare over text
Perspective distortion
Oversized or dropped capital letters at the beginnings of words
Subscript, superscript, or strikethrough text"
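Of the items in that list, the small-text one is at least easy to check for before calling the OCR. A rough sketch, assuming you already have an estimate of the text height in pixels (for example from a text detector): it computes the smallest integer upscale factor that lifts the text to the quoted minimums. `MIN_TEXT_HEIGHT_PX` and `upscale_factor` are hypothetical names, and upscaling will not recover detail that blur has already destroyed.

```python
import math

# Minimum text heights in pixels, taken from the list quoted above.
MIN_TEXT_HEIGHT_PX = {"western": 15, "east_asian": 20}


def upscale_factor(text_height_px, script="western"):
    """Return the smallest integer scale that reaches the minimum text height."""
    if text_height_px <= 0:
        raise ValueError("text_height_px must be positive")
    minimum = MIN_TEXT_HEIGHT_PX[script]
    # Text already at or above the minimum needs no scaling (factor 1).
    return max(1, math.ceil(minimum / text_height_px))
```

For example, 8-pixel Western text needs a factor of 2 (giving 16 pixels), while 7-pixel East Asian text needs a factor of 3 (giving 21 pixels).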