I'm sure LLM-based engines will shine here, and in part they already do. A couple of observations:
- Google Lens, which is now activated by default when you upload an image to Google Images (<a href="https://images.google.com" rel="nofollow">https://images.google.com</a>), has a text recognition feature, and it is very impressive even when you give it an image at screen DPI containing grammatically incoherent text (dictionary entries with short phrases and abbreviations).
- I played with different LLM-based chats using queries like this one: "Please reconstruct the original text from the following corrupted one: Smng rng wt ths ly". The test is similar to an OCR task in which not all letters are recognizable or clearly printed. Perplexity, for example, answered hesitantly but mostly correctly (something like: "I cannot answer definitively, but related is 'Something wrong with this reply'").
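
For anyone who wants to sanity-check answers to this kind of query, here is a minimal Python sketch. It does not reconstruct anything and says nothing about how an LLM arrives at its answer. It only checks whether a proposed reconstruction is consistent with the corrupted text, assuming the corruption only drops letters and keeps word order and word count. The helper names `is_subsequence` and `consistent` are made up for this illustration.

```python
def is_subsequence(fragment: str, word: str) -> bool:
    """Return True if the letters of `fragment` appear in `word` in the same order."""
    remaining = iter(word.lower())
    # `ch in remaining` consumes the iterator up to and including the first match,
    # so each later letter must be found after the previous one.
    return all(ch in remaining for ch in fragment.lower())


def consistent(corrupted: str, candidate: str) -> bool:
    """Check word by word that `candidate` could have produced `corrupted`.

    Assumption (not from the original post): letters were only deleted,
    never substituted, and no words were dropped or merged.
    """
    fragments = corrupted.split()
    words = candidate.split()
    return len(fragments) == len(words) and all(
        is_subsequence(f, w) for f, w in zip(fragments, words)
    )


corrupted = "Smng rng wt ths ly"
print(consistent(corrupted, "Something wrong with this reply"))  # True
print(consistent(corrupted, "Something right with this reply"))  # False
```

Many different sentences pass this check, which is presumably why a hedged answer is the honest one for such a short input.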