TechEcho

9 comments

steeve, over 10 years ago

We had great results using tesseract-ocr [1] with SWT (a state-of-the-art text detection algorithm, via libccv [2]) on Linux.

You can use our Python bindings for both [3, 4], although they might be slightly outdated:

[1] https://code.google.com/p/tesseract-ocr/
[2] http://libccv.org/doc/doc-swt/
[3] https://github.com/veezio/pytesseract
[4] https://github.com/veezio/pyccv
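For anyone who wants to try Tesseract without any bindings, here is a minimal sketch that shells out to the `tesseract` command-line tool from Python. It does not use the bindings linked above and does not include the SWT detection step; the function names `tesseract_command` and `ocr_image` are made up for illustration, and it assumes a `tesseract` binary is installed and on your PATH.

```python
import shutil
import subprocess


def tesseract_command(image_path, lang="eng"):
    """Build the argument list for the tesseract CLI, sending text to stdout."""
    # "stdout" as the output base tells tesseract to print instead of writing a file.
    return ["tesseract", image_path, "stdout", "-l", lang]


def ocr_image(image_path, lang="eng"):
    """Run tesseract on an image file and return the recognized text.

    Raises RuntimeError if the tesseract binary is not on PATH.
    """
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract binary not found on PATH")
    result = subprocess.run(
        tesseract_command(image_path, lang),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if tesseract exits non-zero
    )
    return result.stdout
```

Calling `ocr_image("receipt.png")` (a hypothetical file name) would return whatever text Tesseract recognizes; accuracy on phone photos will depend heavily on preprocessing.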

swalsh, over 10 years ago

This is very cool! I've been working on a receipt-scanning tool in C# to keep track of kitchen inventory (I'm tired of calling my wife to ask if we have sesame oil or some oddball thing).

I found a few libraries, but they only worked with relatively perfect scans (my goal is to be able to just use a phone). When I get home I'm definitely going to give this a go.

jamessantiago, over 10 years ago

Off topic, but this made me think that it would be neat if libraries on places like GitHub and NuGet could somehow include "cited by" data: something that referenced open source (maybe closed source too) projects that had a dependency on the library, similar to Google Scholar or CiteSeerX.

rikkus, over 10 years ago

It doesn't appear that you can use this in a 'normal' .NET app. Any ideas why?

mdaniel, over 10 years ago

On http://msdn.microsoft.com/en-us/library/windows/apps/windowspreview.media.ocr.aspx they mention the supported languages and their statuses, but Korean is only "Good".

I freely admit that I do not speak Korean, but if one compares "Chinese Simplified" characters (listed as "Very good") with those in the Korean alphabet, I am surprised those two entries aren't transposed.

Is there something that makes recognizing Korean harder than Chinese Simplified, or was that just a product management decision?

cipher0, over 10 years ago

"demonstrated in code snippets below". The code snippets are actually images and even worse, they're JPEGs which is the reason why the text looks horrible.

reallycurious, over 10 years ago

Is this better than Tesseract OCR?

jccodez, over 10 years ago

Tesseract is really looking great, with Google adding searchable PDF as output in the latest release candidate.

Norm--, over 10 years ago

So from reading the list of reasons for inaccurate results, it sounds like this library is totally useless for images taken with mobile phones, yet it is only allowed to run on mobile ;)

Now I would be more interested in an image correction library.

"....
- Blurry images
- Handwritten or cursive text
- Artistic font styles
- Small text size (less than 15 pixels for Western languages, or less than 20 pixels for East Asian languages)
- Complex backgrounds
- Shadows or glare over text
- Perspective distortion
- Oversized or dropped capital letters at the beginnings of words
- Subscript, superscript, or strikethrough text"

Microsoft OCR Library for Windows Runtime
