Microsoft OCR Library for Windows Runtime

132 points by maouida over 10 years ago

9 comments

steeve over 10 years ago
We had great results using tesseract-ocr[1] with SWT (state of the art text detection algorithm, via libccv[2]) on Linux.

You can use our python bindings for both[3,4], although they might be slightly outdated:

[1] https://code.google.com/p/tesseract-ocr/
[2] http://libccv.org/doc/doc-swt/
[3] https://github.com/veezio/pytesseract
[4] https://github.com/veezio/pyccv
swalsh over 10 years ago
This is very cool! I've been working on a receipt scanning tool in C# for keeping track of kitchen inventory (tired of calling my wife to ask if we have sesame oil or some oddball thing).

I found a few libraries, but they only worked with relatively perfect scans (my goal is to be able to just use a phone). When I get home I'm definitely going to give this a go.
jamessantiago over 10 years ago
Off topic, but this made me think that it would be neat if libraries on places like GitHub and NuGet could somehow include "cited by" data: something that referenced open source (maybe closed source too) projects that had a dependency on the library, similar to Google Scholar or CiteSeerX.
rikkus over 10 years ago
It doesn't appear that you can use this in a 'normal' .NET app. Any ideas why?
mdaniel over 10 years ago
On http://msdn.microsoft.com/en-us/library/windows/apps/windowspreview.media.ocr.aspx they mention the supported languages and their statuses, but Korean is only "Good".

I freely admit that I do not speak Korean, but if one compares "Chinese Simplified" characters (listed as "Very good") with those in the Korean alphabet, I am surprised those two entries aren't transposed.

Is there something that makes recognizing Korean harder than Chinese Simplified, or was that just a product management decision?
cipher0 over 10 years ago
"demonstrated in code snippets below". The code snippets are actually images and, even worse, they're JPEGs, which is why the text looks horrible.
reallycurious over 10 years ago
Is this better than Tesseract OCR?
jccodez over 10 years ago
Tesseract is really looking great, with Google adding searchable PDF as an output format in the latest release candidate.
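
For anyone who wants to try the searchable PDF output mentioned above, here is a rough sketch that shells out to the tesseract command-line tool from Python. It assumes a tesseract build with PDF support is installed and on PATH; the function names and the file names in the usage note are made up for illustration, and this is not the veezio bindings linked earlier in the thread.

```python
import shutil
import subprocess
from pathlib import Path


def build_tesseract_command(image_path, output_base, searchable_pdf=False, language="eng"):
    """Build the argument list for the tesseract command-line tool.

    Hypothetical helper for illustration. The CLI shape is
    `tesseract <image> <output_base> [-l lang] [configfile]`.
    """
    command = ["tesseract", str(image_path), str(output_base), "-l", language]
    if searchable_pdf:
        # The trailing "pdf" config asks tesseract to write <output_base>.pdf
        # instead of the default <output_base>.txt.
        command.append("pdf")
    return command


def run_ocr(image_path, output_base, searchable_pdf=False, language="eng"):
    """Run tesseract and return the path of the file it should have written."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract executable not found on PATH")
    command = build_tesseract_command(image_path, output_base, searchable_pdf, language)
    subprocess.run(command, check=True)
    suffix = ".pdf" if searchable_pdf else ".txt"
    return Path(str(output_base) + suffix)
```

Calling `run_ocr("receipt.png", "receipt", searchable_pdf=True)` should leave a `receipt.pdf` next to the script, provided the installed tesseract version supports the `pdf` config.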
Norm-- over 10 years ago
So from reading the list of reasons for inaccurate results, it sounds like this library is totally useless for images taken with mobile phones, yet it is only allowed to run on mobile ;)

Now I would be more interested in an image correction library

"....
Blurry images
Handwritten or cursive text
Artistic font styles
Small text size (less than 15 pixels for Western languages, or less than 20 pixels for East Asian languages)
Complex backgrounds
Shadows or glare over text
Perspective distortion
Oversized or dropped capital letters at the beginnings of words
Subscript, superscript, or strikethrough text"
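
The text size limits in that quoted list (15 pixels for Western languages, 20 pixels for East Asian languages) are the one item that is easy to check before running OCR. A minimal sketch, assuming you already have an estimate of the text height in pixels; the function name and the "western" / "east_asian" keys are invented for this example, and upscaling will not fix blur or the other problems on the list.

```python
# Minimum text heights in pixels, taken from the quoted list of limitations.
MIN_TEXT_HEIGHT_PX = {"western": 15, "east_asian": 20}


def required_upscale(text_height_px, script="western"):
    """Return the factor to scale an image by so its text meets the minimum height.

    Hypothetical helper: returns 1.0 when the text is already tall enough.
    """
    if text_height_px <= 0:
        raise ValueError("text_height_px must be positive")
    minimum = MIN_TEXT_HEIGHT_PX[script]
    if text_height_px >= minimum:
        return 1.0
    return minimum / text_height_px
```

For example, a phone photo with 10-pixel Latin text would need to be scaled by 1.5 to reach the 15-pixel minimum, and by 2.0 if the text were East Asian.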