Hi HN - I released an open source OCR model yesterday that supports 93 world languages. It builds on a text line detector I created earlier.<p>In my benchmarks, it's more accurate than Tesseract in every language except one (see the repo for the benchmarking method).<p>Since it can run on a GPU, its speed is about equal to Tesseract's when cost-matched (one Lambda A6000 vs. 28 DigitalOcean CPU cores).<p>It's built on a modified Donut architecture - I added an MoE layer, GQA for faster decoding, and UTF-16 decoding (which can represent any character, and is faster than UTF-8 since adjacent bytes combine into single 16-bit code units).<p>I theorized that character-level decoding would be an optimal compute allocation, and that a large embedding matrix (relative to UTF-8 decoding) would store language-specific information.<p>I trained it on 4x A6000s for about two weeks.<p>You can run surya via the Python API, from the CLI, or via an interactive app in the repo.
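To make the UTF-16 tradeoff concrete, here's a small sketch (not from the surya codebase; `d_model` is a hypothetical hidden size, not the model's actual one). For most non-Latin scripts, one character is a single 16-bit UTF-16 code unit but three UTF-8 bytes, so decoding at the UTF-16 code-unit level emits shorter sequences - the cost being an embedding table over 65,536 code units instead of 256 bytes, which is where the language-specific capacity mentioned above would live.

```python
# Compare decoder sequence lengths: one token per UTF-8 byte vs.
# one token per 16-bit UTF-16 code unit.
samples = {
    "latin": "hello",
    "devanagari": "नमस्ते",
    "cjk": "你好世界",
}
for name, text in samples.items():
    utf8_units = len(text.encode("utf-8"))            # tokens if decoding bytes
    utf16_units = len(text.encode("utf-16-le")) // 2  # tokens if decoding 16-bit units
    print(f"{name}: utf-8 tokens={utf8_units}, utf-16 tokens={utf16_units}")

# The flip side: a UTF-16 vocabulary needs a much larger embedding matrix.
d_model = 1024  # hypothetical hidden size for illustration only
print(f"utf-8 embedding params:  {256 * d_model:,}")
print(f"utf-16 embedding params: {65536 * d_model:,}")
```

For Latin text the two are equal, but for Devanagari or CJK the UTF-16 sequence is a third the length, which is what makes code-unit-level decoding cheaper per character in most of the 93 supported languages.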