That's the problem with current deep learning models: they don't seem to know when they're wrong.<p>There was so much hype about AlphaGo years ago, which seemed to be very good at reasoning about what's good and what's not, that I thought some form of "AI" was really going to arrive relatively soon. The reality these days is that statistical models seem to run without any constraints, making up rules as they go.<p>I'm really thankful for AI-assisted coding, code reviews, and the many other things that came out of this, but the fact is, these really are just assistants that will make very bad mistakes, and you need to watch them carefully.
I took the screenshot of the bill in their article and ran it through the tool at <a href="https://va.landing.ai/demo/doc-extraction" rel="nofollow">https://va.landing.ai/demo/doc-extraction</a>. The tool doesn't hallucinate any of the values the article says it does. In fact, the value for profit/loss from continuing operations is 1654 in their extraction, which matches the ground truth, yet they've still drawn a red bounding box around it.
Am I the only one seeing a conflict of interest issue with this blog post?<p>"We ran our OCR offering against competition. We find ours to be better. Sign up today."<p>It feels like an ad masquerading as a news story.
Today, Andrew Ng, one of the legends of the AI world, released a new document extraction service that went viral on X:<p><a href="https://x.com/AndrewYNg/status/1895183929977843970" rel="nofollow">https://x.com/AndrewYNg/status/1895183929977843970</a><p>At Pulse, we put the models to the test with complex financial statements and nested tables – the results were underwhelming, to say the least, and suffered from many of the same issues we see when simply dumping documents into GPT or Claude.
I think there's a valid point about the production-readiness aspect. It's one thing to release a research paper and another to market something as a service. The expectation levels are just different, and it's fair to scrutinize accordingly.
Personally I find it frustrating they called it "agentic" parsing when there's nothing agentic about it. Not surprised the quality is lackluster.
Has anyone compared this with the stuff Allen AI recently released?<p><a href="https://hn.algolia.com/?dateRange=all&page=0&prefix=true&query=olmocr&sort=byPopularity&type=story" rel="nofollow">https://hn.algolia.com/?dateRange=all&page=0&prefix=true&que...</a>
Relying on OCR, VLMs, or LLMs for such important use cases seems like a problem we should not have in 2025.<p>The real solution would be to embed machine-readable data in those PDFs and build the rendered tables around that data, as in the sketch below.<p>We could then have actual machine-readable financial statements and reports, much like our passports.
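A minimal sketch of what that could look like with pypdf, assuming the issuer attaches the underlying table as a JSON (or XBRL) file inside the PDF so consumers can read structured data instead of re-OCRing the rendering; the file names and payload schema here are made up:

    import json
    from pypdf import PdfReader, PdfWriter

    # Illustrative structured payload for one line item of the statement
    payload = {
        "statement": "income_statement",
        "currency": "EUR",
        "line_items": [
            {"label": "Profit/loss from continuing operations", "value": 1654},
        ],
    }

    reader = PdfReader("financial_statement.pdf")
    writer = PdfWriter()
    writer.append(reader)  # copy all rendered pages as-is

    # Embed the machine-readable data alongside the human-readable tables
    writer.add_attachment("statement.json", json.dumps(payload).encode("utf-8"))

    with open("financial_statement_with_data.pdf", "wb") as f:
        writer.write(f)

A consumer could then pull statement.json straight out of the attachments and skip the OCR step entirely.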
I can't believe there's market demand for non-deterministic OCR, but what I really suspect is that almost no one scans the same document twice, and so probably doesn't even realize this is a possibility.
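It's cheap to check, at least in principle: run the same page through the extractor a few times and diff the results. A minimal sketch, where extract_table is a hypothetical stand-in for whatever OCR/VLM call is being tested:

    # Detect non-deterministic extraction by running it repeatedly
    # and comparing the outputs.
    def extract_table(image_path: str) -> dict:
        # Hypothetical wrapper around the OCR/VLM service under test
        raise NotImplementedError("call your extraction service here")

    def is_deterministic(image_path: str, runs: int = 5) -> bool:
        results = [extract_table(image_path) for _ in range(runs)]
        return all(r == results[0] for r in results[1:])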
I still don’t understand why companies don’t release a machine-readable version of their financial statements. They are read by machines anyway! Exporting that data from their software would be a simple task.
Will we start to see a type of "SLA" from AI model providers? If I rent a server, I can pay for more 9s, but can I pay for a guarantee of accuracy from the models?
I think a lot of OCR workflows are moving toward multimodal models, but I still find the cloud OCR tools to be vastly superior to most of the other startups in this space, like the ad piece here from Pulse.
Why isn't there a pixel-comparison step after the extraction? I think that would have caught some of these errors. Essentially: read, extract, recreate, pixel-compare.
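A rough sketch of that last step with Pillow, assuming you can re-render the extracted table to an image somehow; the threshold and file names are made up:

    from PIL import Image, ImageChops

    def pixel_mismatch_ratio(original_path: str, recreated_path: str) -> float:
        # Fraction of pixels that differ between the source crop and the
        # image re-rendered from the extracted values.
        a = Image.open(original_path).convert("L")
        b = Image.open(recreated_path).convert("L").resize(a.size)
        diff = ImageChops.difference(a, b)
        pixels = list(diff.getdata())
        changed = sum(1 for p in pixels if p > 32)  # tolerance for anti-aliasing
        return changed / len(pixels)

    # Flag the extraction for review if, say, more than 5% of pixels disagree
    if pixel_mismatch_ratio("table_crop.png", "table_rerendered.png") > 0.05:
        print("extraction likely dropped or altered values, needs review")

It wouldn't catch everything, since fonts and layout will never match exactly, but a wildly high mismatch would at least flag fabricated numbers for review.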
> - Over 50% hallucinated values in complex financial tables<p>> - Completely fabricated numbers in several instances<p>Why are these separate bullet points? Which one is the correct count of wrong values?
<a href="https://x.com/svpino/status/1592140348905517056" rel="nofollow">https://x.com/svpino/status/1592140348905517056</a><p>"""
In 2017, a team led by Andrew Ng published a paper showing off a Deep Learning model to detect pneumonia.<p>[...]<p>But there was a big problem with their results:<p>[...]<p>A random split would have sent images from the same patient to the train and validation sets.<p>This creates a leaky validation strategy.<p>"""<p>He's not infallible.
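For anyone unfamiliar with that failure mode: if a patient contributes several X-rays, a random split can put near-identical images in both train and validation, which inflates the validation score. The standard fix is to split by patient. A minimal sketch with scikit-learn's GroupShuffleSplit (the arrays are made up):

    import numpy as np
    from sklearn.model_selection import GroupShuffleSplit

    # Hypothetical data: one row per X-ray, grouped by the patient it came from
    X = np.random.rand(10, 64)                      # image features
    y = np.random.randint(0, 2, size=10)            # pneumonia labels
    patient_ids = np.array([1, 1, 1, 2, 2, 3, 3, 4, 5, 5])

    # Keep every image from a given patient on the same side of the split
    splitter = GroupShuffleSplit(n_splits=1, test_size=0.3, random_state=0)
    train_idx, val_idx = next(splitter.split(X, y, groups=patient_ids))

    # No patient appears in both sets
    assert not set(patient_ids[train_idx]) & set(patient_ids[val_idx])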