Ask HN: What's the most performant/practical model/API for text extraction?

1 point by sandkoan over 2 years ago

AWS Textract might be more accurate, for instance, but it is not nearly as cost-effective as spinning up an EC2 instance with Tesseract, EasyOCR, or PaddleOCR.

Or is it more sensible, from an accuracy-vs-cost standpoint, to just run a transformer model like TrOCR after locating the text regions (bounding boxes) with a detector like CRAFT or EAST?
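To make the cost side of that trade-off concrete, here is a rough break-even sketch. It assumes a managed API billed per page and a self-hosted OCR box billed per hour and left running all month. The function names and every number in the usage note are hypothetical placeholders, not actual AWS pricing or measured OCR throughput.

```python
def break_even_pages(api_usd_per_page: float,
                     instance_usd_per_hour: float,
                     hours_per_month: float = 730.0) -> float:
    """Monthly page volume above which an always-on instance is cheaper
    than a per-page API. Ignores engineering time and accuracy differences."""
    if api_usd_per_page <= 0:
        raise ValueError("api_usd_per_page must be positive")
    monthly_instance_cost = instance_usd_per_hour * hours_per_month
    return monthly_instance_cost / api_usd_per_page


def instance_can_keep_up(pages_per_month: float,
                         pages_per_hour: float,
                         hours_per_month: float = 730.0) -> bool:
    """Check that a single instance has enough throughput for the volume.
    pages_per_hour is something you would have to benchmark yourself."""
    return pages_per_month <= pages_per_hour * hours_per_month
```

For example, with placeholder values of $0.002 per page for the API and $0.10 per hour for the instance, `break_even_pages(0.002, 0.10)` gives the monthly volume at which the two cost the same. Below that volume the API is cheaper on cost alone, and above it self-hosting wins, provided `instance_can_keep_up` holds and the accuracy is acceptable.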

1 comment

sargstuff over 2 years ago

Depends on the factors and criteria involved.

Short example list:

* fixed-font character text on a blank background; human handwriting set against a busy city street background

* converting a non-text font image to a text description (e.g. a collage of images forming the illusion of a text font)