I'm working on an open-source tax filing web app at https://ustaxes.org/ and https://github.com/thegrims/UsTaxes

Any ideas on best practices for extracting tax data from a W-2 form? I've looked at Microsoft form-recognizer and AWS Textract, but I haven't been able to get good results so far. (Caveat: I haven't tried either one with custom training data.)
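Whichever OCR service you use, one thing that may help is treating its output as noisy key/value pairs and normalizing them yourself. Here's a minimal sketch. It assumes you've already pulled (label, value) string pairs out of the service's form/key-value output. The output field names (`box1_wages`, etc.), the 0.8 similarity cutoff, and the helper names are all made up for illustration. It fuzzy-matches each label against the W-2 captions for boxes 1 through 6 using Python's standard `difflib` and keeps only values that parse as dollar amounts.

```python
import difflib
import re
from decimal import Decimal
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical output field names, keyed by the W-2 captions for boxes 1-6.
W2_BOXES = {
    "wages, tips, other compensation": "box1_wages",
    "federal income tax withheld": "box2_federal_withholding",
    "social security wages": "box3_ss_wages",
    "social security tax withheld": "box4_ss_tax",
    "medicare wages and tips": "box5_medicare_wages",
    "medicare tax withheld": "box6_medicare_tax",
}


def normalize_label(label: str) -> str:
    """Lowercase, drop a leading box number such as '1 ' or '12a.', collapse spaces."""
    label = re.sub(r"^\s*\d+[a-z]?[.\s]+", "", label.lower())
    return " ".join(label.split())


def parse_amount(text: str) -> Optional[Decimal]:
    """Return the dollar amount in text, or None if it is not a plain amount."""
    cleaned = re.sub(r"[$,\s]", "", text)
    if not re.fullmatch(r"\d+(\.\d{1,2})?", cleaned):
        return None
    return Decimal(cleaned)


def map_w2_fields(
    pairs: Iterable[Tuple[str, str]], cutoff: float = 0.8
) -> Dict[str, Decimal]:
    """Map noisy (label, value) pairs onto W-2 box fields by fuzzy caption matching."""
    captions = list(W2_BOXES)
    result: Dict[str, Decimal] = {}
    for label, value in pairs:
        # Closest known caption, or an empty list if nothing clears the cutoff.
        best = difflib.get_close_matches(
            normalize_label(label), captions, n=1, cutoff=cutoff
        )
        amount = parse_amount(value)
        if best and amount is not None:
            result[W2_BOXES[best[0]]] = amount
    return result


pairs = [
    ("1 Wages, tips, other compensation", "$52,340.17"),
    ("2 Federal income tax withheld", "6,120.00"),
    ("Social securty wages", "52,340.17"),  # OCR dropped a letter
    ("c Employer's name", "ACME Corp"),     # not an amount, so skipped
]
print(map_w2_fields(pairs))
```

Using `Decimal` instead of `float` avoids rounding surprises when the amounts are later summed or compared against the other boxes.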
Is it still the case that W-2s are usually provided only in paper form? If employers would just e-mail a (non-scanned) PDF, you could extract the data easily without having to deal with OCR.
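If you do get a born-digital PDF, a cheap first step is to check whether each page actually has a text layer before deciding to OCR it. Here's a minimal sketch, assuming the third-party `pypdf` package (`PdfReader` and `page.extract_text()`). The 20-character threshold and both function names are made up for illustration.

```python
from typing import List


def extract_page_texts(path: str) -> List[str]:
    """Return the embedded text of each page (empty string if none)."""
    # Imported lazily so the OCR check below works without pypdf installed.
    from pypdf import PdfReader

    reader = PdfReader(path)
    return [page.extract_text() or "" for page in reader.pages]


def pages_needing_ocr(page_texts: List[str], min_chars: int = 20) -> List[int]:
    """Indices of pages whose text layer is too sparse to trust (likely scans)."""
    # Hypothetical heuristic: count non-whitespace characters on each page.
    return [
        i
        for i, text in enumerate(page_texts)
        if len("".join(text.split())) < min_chars
    ]
```

Something like `pages_needing_ocr(extract_page_texts("w2.pdf"))` (hypothetical filename) would then tell you which pages can be parsed directly and which still need to go through OCR.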