Show HN: Gogosseract, a Go Lib for CGo-Free Tesseract OCR via Wazero

120 points by dlock17 over 1 year ago

Tesseract is one of the largest Open Source OCR (Optical Character Recognition) projects. There is already a Go library for using Tesseract from Go with CGo, called Gosseract.

However, if you are interested in OCR from Go without C complicating building and cross-compiling, there aren't any other options.

Wazero is a Go WASM runtime that doesn't have any CGo dependencies. With Emscripten, Tesseract has been compiled to WASM and run within Wazero.

Gogosseract provides a simple API on top of this. This project has been an interesting delve into the world of WASM.

10 comments

yklcs over 1 year ago
I wrote a short blog post[1] on this method a while ago. I do think running WASM in embedded runtimes is a pretty good option, but overhead remains high, and WASI remains somewhat fragmented between compilers and runtimes.

I think this method really shines in Go as not having CGo simplifies a lot of things, and as a decently performant JITed runtime exists in the form of wazero.

[1]: https://yklcs.com/blog/universal-libs-with-wasm
iampims over 1 year ago
To me, this is the real value of Wasm: platform independent libraries with a standard interface that doesn’t require C.
richieartoul over 1 year ago
This is awesome and one of the things I’m really excited about with WASM, and specifically Wazero. The Wazero team is top notch. Now someone just needs to do this with zstd and make it fast…
mappu over 1 year ago
Another really interesting way to approach this problem would be to adapt wasm2c to emit Go output. It should result in better performance than wazero.
donatj over 1 year ago
Oh awesome. I was really hoping a native OCR library would pop up, but this really is the next best thing and a more realistic avenue.
tommiegannert over 1 year ago
Thanks for sharing!

Since OCR is a somewhat slow process, how does the WASM approach compare to running libtesseract in a subprocess and using some IPC layer to talk to Go? It would require a separate C++ compiler, but not CGo.

> one of the largest Open Source OCR

Tangential, but are there others as large as Tesseract? It seems to pop up anywhere I look.
abdullahkhalids over 1 year ago
Is Tesseract currently the best open source OCR library? Best in terms of accuracy.

How much difference is there between Tesseract and the best proprietary solutions?
honkotime over 1 year ago
It mentions that this is a rewrite of gosseract; however, it is not a drop-in replacement, so it's more of a separate library in my opinion.
technics256 over 1 year ago
Off topic, but in general, how does something like this compare to cloud-hosted OCR solutions?
breadchris over 1 year ago
this is sick