TechEcho

I'm OCRing a bunch of TIFF files with tesseract, and while it works to some degree, it's nowhere near as accurate as I'd like it to be. Perhaps I'm doing something wrong and I could tune it to my liking, but I can't find too many resources on tesseract. Am I missing something? Any other recommendations for OCRs? Ideally it would be free, but I'm willing to pay if it's not too pricey. I've been trying out the trial version of FineReader, and it seems to work pretty well, so I may go with that. Any help is greatly appreciated.

I've had really great success with FineReader. I tried out every free OCR tool I could find, and after poor results went for FineReader. Spend some time on their website so you get the right product; they list multiple prices for the same products, too. I got the latest FineReader (after a coupon code I found on Google) for between 130 and 150. (I'm mostly scanning books.)

FineReader is what Project Gutenberg has been using for the last decade or so.

What about GOCR? It's open source; see http://jocr.sourceforge.net/

One thing that improves Tesseract's performance dramatically is giving it grayscale TIFF images. Do mogrify -type Grayscale *.tif and run them through tesseract to see the difference. No idea why no one mentions this in the documentation.
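If you want to try the grayscale tip on a whole folder without touching the original scans, something like the following Python sketch might help. It assumes ImageMagick's mogrify and the tesseract command-line tool are both on your PATH; the function names and the two-directory layout are made up for illustration, and it only picks up files ending in .tif. Since mogrify rewrites files in place, the sketch works on copies, so the source and working directories should be different.

```python
import shutil
import subprocess
from pathlib import Path


def grayscale_command(tiff: Path) -> list[str]:
    """Build the mogrify call from the comment above (rewrites the file in place)."""
    return ["mogrify", "-type", "Grayscale", str(tiff)]


def ocr_command(tiff: Path, output_base: Path) -> list[str]:
    """Build the tesseract call; tesseract writes its text to <output_base>.txt."""
    return ["tesseract", str(tiff), str(output_base)]


def ocr_directory(src_dir: Path, work_dir: Path) -> list[Path]:
    """Copy each .tif into work_dir, grayscale the copy, OCR it, return the text paths."""
    work_dir.mkdir(parents=True, exist_ok=True)
    results = []
    for original in sorted(src_dir.glob("*.tif")):
        copy = work_dir / original.name
        shutil.copy2(original, copy)  # keep the original scan untouched
        subprocess.run(grayscale_command(copy), check=True)
        output_base = work_dir / original.stem
        subprocess.run(ocr_command(copy, output_base), check=True)
        results.append(work_dir / (original.stem + ".txt"))
    return results
```

Calling ocr_directory(Path("scans"), Path("ocr_work")) would leave the grayscale copies and the .txt output side by side in ocr_work, which makes it easy to compare against a run on the untouched originals.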

Ask HN: Recommended OCR?

5 comments
