I have a bunch of newspaper clippings and pages from old articles about my grandfather, all in Swedish.<p>I would like to digitize all of the articles and translate them into English in a semi-automated way. I know little Swedish, so I can't translate them myself, and there are over 100 clippings.<p>Has anyone been through this process or something similar? I would appreciate any tips on what software to use.
There are a number of Optical Character Recognition (OCR) repos that should help you with the first task, digitizing. They are all based around Google's Tesseract. If I remember correctly, this is one of the top ones: <a href="https://github.com/danielquinn/paperless" rel="nofollow">https://github.com/danielquinn/paperless</a>
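<p>To make the Tesseract suggestion a bit more concrete, here is a rough sketch of batch OCR for the clippings. It assumes the `tesseract` command-line tool is installed along with its Swedish language data (`swe`), and that the clippings have already been scanned to PNG files. The helper names `build_ocr_command` and `ocr_folder` and the folder layout are made up for illustration; this is not how paperless itself does it.

```python
from __future__ import annotations

import shutil
import subprocess
from pathlib import Path


def build_ocr_command(image: Path, out_base: Path, lang: str = "swe") -> list[str]:
    """Build a Tesseract CLI call: tesseract <image> <output base> -l <lang>."""
    return ["tesseract", str(image), str(out_base), "-l", lang]


def ocr_folder(scan_dir: Path, text_dir: Path, lang: str = "swe") -> list[Path]:
    """OCR every PNG in scan_dir, writing one .txt file per image into text_dir."""
    # Assumption: the tesseract binary is on PATH; fail early if it is not.
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract not found on PATH")
    text_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for image in sorted(scan_dir.glob("*.png")):
        out_base = text_dir / image.stem  # Tesseract appends ".txt" itself
        subprocess.run(build_ocr_command(image, out_base, lang), check=True)
        written.append(text_dir / (image.stem + ".txt"))
    return written
```

Calling `ocr_folder(Path("scans"), Path("ocr_text"))` (both folder names hypothetical) should leave one Swedish text file per clipping, ready for whatever translation step you settle on. Old newsprint often OCRs poorly, so expect to proofread the output.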
I've used Project Naptha in the past, and it's a little-known fact that Google Docs can do OCR automatically too.<p>As for the translation, I've never had to do it. Sorry!