With all the progress in machine learning, it seems like there should be an amazing OCR tool that works out of the box for structured documents.

Does it exist?

I've used Tesseract and its relatives, but they seem to have a hard time with any document that's not a single column. The gap between what they can achieve (which, to be fair, is amazing) and what I *expected* based on all the ML demos that only do the first 10% (numbers only, no structure), but do it in a 30-minute demo, is big. Things like affine transforms (scaling, rotation) and decorations like bold, underline, and unusual fonts create even more problems.

Why isn't there a Docker container with an AWS Lambda function in it that takes any format I upload (PDF, PNG, JPG being the most critical) and returns a UTF-8 string of its content?

My god, I'm spoiled by technology.
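For what it's worth, you can get part of the way to that "file in, text out" function with off-the-shelf pieces. Here is a minimal sketch using pytesseract and pdf2image; it assumes Tesseract and poppler are installed locally, and the file name is made up. The --psm 1 setting asks Tesseract for automatic page segmentation, which helps somewhat with multi-column layouts but still won't recover real document structure, which is the actual complaint above.

    # Rough sketch of an "image or PDF in, UTF-8 text out" function.
    # Assumes: pip install pytesseract pillow pdf2image, plus local
    # Tesseract and poppler installs. The file name below is hypothetical.
    from PIL import Image
    import pytesseract
    from pdf2image import convert_from_path

    def ocr_file(path: str) -> str:
        if path.lower().endswith(".pdf"):
            # Rasterize each PDF page, then OCR page by page.
            pages = convert_from_path(path, dpi=300)
        else:
            pages = [Image.open(path)]
        # --psm 1: automatic page segmentation with orientation detection,
        # which copes a little better with multi-column documents.
        return "\n\n".join(
            pytesseract.image_to_string(page, config="--psm 1") for page in pages
        )

    print(ocr_file("scanned_report.pdf"))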
(Apologies in advance if this sounds snarky; I don't mean it to be.)

There are hundreds of posts on using Keras/TensorFlow/PyTorch/etc. to do MNIST classification, and many examples on GitHub. All of these resources are easy to find by Googling. This post doesn't seem to add anything different to the conversation. So why do such articles keep being written, and why do they still get upvoted on HN? Is it that a lot of people want to learn these things but haven't gotten the chance, so they upvote in the hope of staying in touch with the topic? Is it FOMO? One might be forgiven for considering this spam.
“Many good ideas will not work well on MNIST (e.g. batch norm). Inversely[,] many bad ideas may work on MNIST and no[t] transfer to real [computer vision]” – a tweet by François Chollet (creator of Keras)

So please: try anything harder (or rather, more relevant to deep learning). At least images from CIFAR-10; see the sketch after the link below:
https://blog.deepsense.ai/deep-learning-hands-on-image-classification/
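For anyone who wants to take that suggestion literally, here is a minimal sketch of a small convnet on CIFAR-10 with tf.keras. The architecture and hyperparameters are illustrative placeholders, not tuned values; it's only meant to show that moving past MNIST costs a handful of extra lines.

    # Minimal CIFAR-10 convnet sketch with tf.keras.
    # Architecture and hyperparameters are placeholders, not tuned values.
    import tensorflow as tf
    from tensorflow.keras import layers

    (x_train, y_train), (x_test, y_test) = tf.keras.datasets.cifar10.load_data()
    x_train, x_test = x_train / 255.0, x_test / 255.0  # scale pixels to [0, 1]

    model = tf.keras.Sequential([
        layers.Conv2D(32, 3, activation="relu", input_shape=(32, 32, 3)),
        layers.MaxPooling2D(),
        layers.Conv2D(64, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Flatten(),
        layers.Dense(64, activation="relu"),
        layers.Dense(10),  # logits for the 10 CIFAR-10 classes
    ])

    model.compile(
        optimizer="adam",
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
        metrics=["accuracy"],
    )
    model.fit(x_train, y_train, epochs=10, validation_data=(x_test, y_test))

Unlike on MNIST, a naive model like this tops out well short of state of the art on CIFAR-10, which is exactly why it's a more honest playground for trying ideas.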