Has anyone tried experimenting with using an LLM to enhance OCR results? OCR software can produce output full of noise (nonsense characters), and pattern matching against that output is very hard because the noise is highly unpredictable. Can an LLM help "de-noise" the results, since these models take in character-level information and might recognize which parts are useless?
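For anyone who wants to try this, here is a minimal sketch of the idea. The prompt wording and the helper names (`build_cleanup_prompt`, `clean_ocr`) are my own invention, and the LLM call is left as a pluggable callable so you can drop in whatever client you use:

```python
from typing import Callable

def build_cleanup_prompt(ocr_text: str) -> str:
    """Wrap raw OCR output in instructions asking a model to de-noise it.
    The exact wording here is illustrative, not a tested prompt."""
    return (
        "The following text came from an OCR engine and contains "
        "character-level noise. Reconstruct the most likely original text, "
        "dropping obvious garbage characters:\n\n" + ocr_text
    )

def clean_ocr(ocr_text: str, complete: Callable[[str], str]) -> str:
    """`complete` is any callable that sends a prompt to an LLM and
    returns the model's text reply (e.g. a thin wrapper around your
    chat-completion client of choice)."""
    return complete(build_cleanup_prompt(ocr_text)).strip()
```

The point of keeping the model call behind a plain callable is that you can A/B different engines (or a no-op baseline) on the same noisy corpus without touching the pipeline.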
I'm sure LLM-based engines will shine here; in part, they already do. A couple of observations:
- Google Lens, now activated by default when you post an image to Google Images (https://images.google.com), has a text-recognition feature, and it is very impressive even if you give it a screen-DPI image of grammatically incoherent text (dictionary entries with short phrases and abbreviations)
- I played with different LLM-based chats using queries like "Please reconstruct the original text from the following corrupted one: Smng rng wt ths ly". The test is similar to an OCR task where not all letters are recognizable or printed clearly. Perplexity, for example, answered with hesitation but was mostly correct (something like: "I cannot answer definitively, but a likely reading is 'Something wrong with this reply'")
I have been wondering the same thing. So many OCR engines spit out results that are obviously wrong; I don't want them to get too clever, but a little bit of smarts would go a long way.