Tech Echo

11 comments

AmazingTurtle, almost 2 years ago

I tested it out with a bunch of personal documents. The results were disappointing: they did not match the promised scores, not even slightly. I think the traditional approach to scanning and classifying without AI/ML is the way to go, for the next 5 years at the very least.

dkatz23238, almost 2 years ago

As a developer who has been building IDP (intelligent document processing) solutions, I can say that although this model is much larger (more weights) than a Graph Neural Network over OCR tokens, the industry standard before transformers, it outperforms that approach given enough data. Depending on how heterogeneous the data is, usually 200 documents are enough to reach human-level accuracy, scoring by Levenshtein ratio. Smaller graph models could get away with less data. The problem with the "traditional" approach was that the quality of the OCR was the bottleneck for overall model performance. It amazes me how this problem shifted from a node classification problem to an image-to-text problem. Training on CPU was possible with a GCN but not with Donut.
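
The comment scores extraction by Levenshtein ratio. As a minimal sketch, assuming the ratio is defined as 1 minus the edit distance divided by the length of the longer string (libraries normalize this differently, some by the sum of both lengths), per-field scoring could look like this. `field_accuracy` and the field names are hypothetical, not part of Donut or any IDP toolkit.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic edit distance: the minimum number of single-character
    insertions, deletions, and substitutions that turn a into b."""
    if len(a) < len(b):
        a, b = b, a  # distance is symmetric; keep the shorter string in the inner loop
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (ca != cb),  # substitution (free when chars match)
            ))
        prev = curr
    return prev[-1]


def levenshtein_ratio(pred: str, truth: str) -> float:
    """Similarity in [0, 1]: 1 - distance / length of the longer string.
    Assumption: one common normalization; other libraries define it differently."""
    if not pred and not truth:
        return 1.0
    return 1.0 - levenshtein(pred, truth) / max(len(pred), len(truth))


def field_accuracy(preds: dict, truths: dict) -> float:
    """Hypothetical helper: mean ratio over the ground-truth fields.
    A field missing from preds is scored against the empty string."""
    scores = [levenshtein_ratio(preds.get(k, ""), v) for k, v in truths.items()]
    return sum(scores) / len(scores)
```

With this normalization, `levenshtein_ratio("kitten", "sitting")` is 4/7: three edits are needed against a seven-character string.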

dowakin, almost 2 years ago

If you want to train Donut, check out this notebook on Kaggle. It trains Donut to read plots for a competition and contains the full fine-tuning pipeline: https://www.kaggle.com/code/nbroad/donut-train-benetech

armchairhacker, almost 2 years ago

These OCR tools are bringing us closer to MS Paint as a viable IDE.

tkanarsky, almost 2 years ago

> Donut: DOcumeNt Understanding Transformer

Author: phew! I'm glad there's an 'n' in there somewhere.

xavriley, almost 2 years ago

There’s a model for music transcription (audio to MIDI) called MT3, which takes an end-to-end transformer approach and claims SOTA on some datasets. However, from my own research and comparisons with other models, MT3 seems very prone to overfitting, and its real-world results are not as impressive. A similar story seems to be playing out in the comments here.

vosper, almost 2 years ago

I want to build an application that scans restaurant and café menus (PDFs, photos, webpages) to identify which items are vegetarian or vegan. Would this work for that? If not, I would love to hear people's ideas and suggestions.

nestorD, almost 2 years ago

I will have to investigate this. I am dreaming of a system that can take a PDF scan of a book as input and produce one or more properly formatted Markdown files (headings, italics, bold, underline, etc.). In my tests, LLMs have proved very good at cleaning up raw OCR output, but they need formatting information to get me all the way.
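
For the formatting half of that pipeline, a minimal sketch: assume some upstream OCR or layout model already emits styled text runs (the `Span` and `Block` types below are hypothetical; no specific tool is implied). Rendering them to Markdown is then mechanical. Markdown has no underline syntax, so this sketch falls back to inline `<u>` HTML, which CommonMark permits as raw inline HTML.

```python
from dataclasses import dataclass


@dataclass
class Span:
    """Hypothetical OCR output: one run of text with uniform styling."""
    text: str
    bold: bool = False
    italic: bool = False
    underline: bool = False


@dataclass
class Block:
    """Hypothetical paragraph or heading made of spans; heading_level 0 = body text."""
    spans: list
    heading_level: int = 0


def span_to_md(span: Span) -> str:
    core = span.text.strip()
    if not core:
        return span.text  # pure whitespace: nothing to emphasize
    if span.italic:
        core = f"*{core}*"
    if span.bold:
        core = f"**{core}**"  # bold + italic yields ***text***
    if span.underline:
        core = f"<u>{core}</u>"  # no Markdown underline; use inline HTML
    # Keep surrounding whitespace outside the markers, since "** bold**"
    # is not parsed as emphasis.
    lead = span.text[: len(span.text) - len(span.text.lstrip())]
    trail = span.text[len(span.text.rstrip()):]
    return lead + core + trail


def blocks_to_markdown(blocks: list) -> str:
    out = []
    for block in blocks:
        line = "".join(span_to_md(s) for s in block.spans).strip()
        if block.heading_level:
            line = "#" * min(block.heading_level, 6) + " " + line
        out.append(line)
    return "\n\n".join(out) + "\n"
```

For example, a level-1 heading "Chapter 1" followed by a paragraph with an italic "dark" and a bold "stormy" renders as `# Chapter 1`, a blank line, then `It was a *dark* and **stormy** night.`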

ryanjshaw, almost 2 years ago

This is really cool if it delivers. I tried building an app to scan till receipts. The image-to-text APIs out there really don't perform as well as you'd think. AWS Textract performed far better than the GCP and Azure equivalents and traditional OCR solutions, but it still made some really annoying errors that I had to fix with heuristics.
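
The comment doesn't say which heuristics were used. As one hypothetical example, price tokens can be repaired by mapping letters that OCR commonly confuses with digits, then cross-checked by comparing the sum of the line items with the printed total. The confusion table, regex, and function names below are assumptions for illustration, not anything from Textract.

```python
import re
from decimal import Decimal, InvalidOperation
from typing import List, Optional

# Hypothetical confusion table: letters OCR may return in place of digits.
DIGIT_FIXES = str.maketrans({"O": "0", "o": "0", "D": "0", "l": "1", "I": "1",
                             "S": "5", "s": "5", "B": "8", ",": "."})

# A price-like token: digits (or look-alikes), a separator, exactly two decimals.
PRICE_LIKE = re.compile(r"^[\dOoDlISsB]+[.,][\dOoDlISsB]{2}$")


def parse_price(token: str) -> Optional[Decimal]:
    """Return a Decimal if the token looks like a price after fixing confusions."""
    token = token.strip().lstrip("$£€")
    if not PRICE_LIKE.match(token):
        return None
    try:
        return Decimal(token.translate(DIGIT_FIXES))
    except InvalidOperation:
        return None


def check_receipt(item_lines: List[str], total_line: str) -> Optional[bool]:
    """Sum the last price token on each item line and compare with the total.
    Returns None when a price can't be read, i.e. flag for manual review."""
    items = []
    for line in item_lines:
        tokens = line.split()
        price = parse_price(tokens[-1]) if tokens else None
        if price is None:
            return None
        items.append(price)
    total_tokens = total_line.split()
    total = parse_price(total_tokens[-1]) if total_tokens else None
    if total is None:
        return None
    return sum(items, Decimal("0")) == total
```

Using `Decimal` instead of `float` keeps the sum exact, so a mismatch points at a misread digit rather than rounding noise.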

aosmith, almost 2 years ago

So is this why IA had an outage? Timing is perfect.

i2cmaster, almost 2 years ago

I've started using Microsoft's TrOCR (another transformer OCR model) to read the cursive in my pocket journal. (I have a habit of writing programs there first while I'm out and then typing them in manually; I just focus better that way.) It's surprisingly accurate, although you have to write your own program to segment the image into lines. I think with some fine-tuning I could have the machine read my notebook with minimal corrections.
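
Line segmentation can be done in several ways. A minimal sketch of one classic technique, the horizontal projection profile, assuming the page is already binarized into a grid of 0/1 pixels (1 = ink): rows with no ink separate one text line from the next. The function names and thresholds are hypothetical. Cursive with long ascenders and descenders can bridge the gaps between lines, so a real notebook page may need deskewing or a smarter splitter first.

```python
from typing import List, Tuple


def segment_lines(binary: List[List[int]], min_ink: int = 1,
                  min_height: int = 2) -> List[Tuple[int, int]]:
    """Split a binarized page into text lines with a horizontal projection profile.

    Returns (top, bottom) row indices for each detected line, bottom exclusive.
    Bands shorter than min_height rows are treated as specks and dropped.
    """
    profile = [sum(row) for row in binary]  # ink pixels per row
    lines, start = [], None
    for y, ink in enumerate(profile):
        if ink >= min_ink and start is None:
            start = y  # entering a text band
        elif ink < min_ink and start is not None:
            if y - start >= min_height:
                lines.append((start, y))
            start = None  # back in a blank gap
    if start is not None and len(profile) - start >= min_height:
        lines.append((start, len(profile)))  # band touching the bottom edge
    return lines


def crop_lines(binary: List[List[int]],
               lines: List[Tuple[int, int]]) -> List[List[List[int]]]:
    """Return the row slice for each line, ready for a line-level recognizer."""
    return [binary[top:bottom] for top, bottom in lines]
```

Each cropped strip would then be converted back to an image and passed to the recognizer one line at a time.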

Donut: OCR-Free Document Understanding Transformer
