That's the problem with current deep learning models: they don't seem to know when they're wrong.<p>There was so much hype about AlphaGo years ago, which seemed to be very good at reasoning about what's good and what's not, that I thought some form of "AI" was really going to arrive relatively soon. The reality these days is that statistical models seem to run without any constraints, making up rules as they go.<p>I'm really thankful for AI-assisted coding, code reviews, and the many other things that came out of this, but the fact is, these really are just assistants that will make very bad mistakes, and you need to watch them carefully.
I took the screenshot of the bill in their article and ran it through the tool at <a href="https://va.landing.ai/demo/doc-extraction" rel="nofollow">https://va.landing.ai/demo/doc-extraction</a>. The tool doesn't hallucinate any of the values the article says it does. In fact, the value for profit/loss from continuing operations is 1654 in their extraction, which matches the ground truth, yet they've still drawn a red bounding box around it.
Am I the only one seeing a conflict of interest issue with this blog post?<p>"We ran our OCR offering against competition. We find ours to be better. Sign up today."<p>It feels like an ad masquerading as a news story.
Today, Andrew Ng, one of the legends of the AI world, released a new document extraction service that went viral on X:<p><a href="https://x.com/AndrewYNg/status/1895183929977843970" rel="nofollow">https://x.com/AndrewYNg/status/1895183929977843970</a><p>At Pulse, we put the models to the test with complex financial statements and nested tables – the results were underwhelming, to say the least, and suffered from many of the same issues we see when simply dumping documents into GPT or Claude.
I think there's a valid point about the production-readiness aspect. It's one thing to release a research paper and another to market something as a service. The expectation levels are just different, and it's fair to scrutinize accordingly.
Personally I find it frustrating they called it "agentic" parsing when there's nothing agentic about it. Not surprised the quality is lackluster.
Has anyone compared this with the stuff Allen AI recently released?<p><a href="https://hn.algolia.com/?dateRange=all&page=0&prefix=true&query=olmocr&sort=byPopularity&type=story" rel="nofollow">https://hn.algolia.com/?dateRange=all&page=0&prefix=true&que...</a>
Relying on OCR, VLMs, or LLMs for such important use cases seems like a problem we should not have in 2025.<p>The real solution would be to embed machine-readable data in those PDFs and build the rendered tables around that data, as in the sketch below.<p>We could then have actual machine-readable financial statements and reports, much like our passports.
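A minimal sketch of what that could look like with pypdf, assuming the issuer attaches the underlying table as a JSON (or XBRL) file inside the PDF so consumers can read structured data instead of re-OCRing the rendering; the file names and payload schema here are made up:

    import json
    from pypdf import PdfReader, PdfWriter

    # Illustrative structured payload for one line item of the statement
    payload = {
        "statement": "income_statement",
        "currency": "EUR",
        "line_items": [
            {"label": "Profit/loss from continuing operations", "value": 1654},
        ],
    }

    reader = PdfReader("financial_statement.pdf")
    writer = PdfWriter()
    writer.append(reader)  # copy all rendered pages as-is

    # Embed the machine-readable data alongside the human-readable tables
    writer.add_attachment("statement.json", json.dumps(payload).encode("utf-8"))

    with open("financial_statement_with_data.pdf", "wb") as f:
        writer.write(f)

A consumer could then pull statement.json straight out of the attachments and skip the OCR step entirely.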
I can't believe there's market demand for non-deterministic OCR, but what I really suspect is that almost no one scans the same document twice, and so probably doesn't even realize this is a possibility.
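It's cheap to check, at least in principle: run the same page through the extractor a few times and diff the results. A minimal sketch, where extract_table is a hypothetical stand-in for whatever OCR/VLM call is being tested:

    # Detect non-deterministic extraction by running it repeatedly
    # and comparing the outputs.
    def extract_table(image_path: str) -> dict:
        # Hypothetical wrapper around the OCR/VLM service under test
        raise NotImplementedError("call your extraction service here")

    def is_deterministic(image_path: str, runs: int = 5) -> bool:
        results = [extract_table(image_path) for _ in range(runs)]
        return all(r == results[0] for r in results[1:])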
I still don’t understand why companies don’t release a machine-readable version of their financial statements. They are read by machines anyway! Exporting that data from their software would be a simple task.
Will we start to see a type of "SLA" from AI model providers? If I rent a server, I can pay for more 9s, but can I pay for a guarantee of accuracy from the models?
I think a lot of OCR workflows are moving toward multimodal models, but I still find the cloud OCR tools to be vastly superior to most of the other startups in this space, like the ad piece here from Pulse.
Why isn't there a pixel-comparison step after the extraction? I think that would have caught some of these errors. Essentially: read, extract, recreate, pixel-compare.
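A rough sketch of that last step with Pillow, assuming you can re-render the extracted table to an image somehow; the threshold and file names are made up:

    from PIL import Image, ImageChops

    def pixel_mismatch_ratio(original_path: str, recreated_path: str) -> float:
        # Fraction of pixels that differ between the source crop and the
        # image re-rendered from the extracted values.
        a = Image.open(original_path).convert("L")
        b = Image.open(recreated_path).convert("L").resize(a.size)
        diff = ImageChops.difference(a, b)
        pixels = list(diff.getdata())
        changed = sum(1 for p in pixels if p > 32)  # tolerance for anti-aliasing
        return changed / len(pixels)

    # Flag the extraction for review if, say, more than 5% of pixels disagree
    if pixel_mismatch_ratio("table_crop.png", "table_rerendered.png") > 0.05:
        print("extraction likely dropped or altered values, needs review")

It wouldn't catch everything, since fonts and layout will never match exactly, but a wildly high mismatch would at least flag fabricated numbers for review.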
> - Over 50% hallucinated values in complex financial tables<p>> - Completely fabricated numbers in several instances<p>Why are these separate bullet points? Which one is the correct count of wrong values?
<a href="https://x.com/svpino/status/1592140348905517056" rel="nofollow">https://x.com/svpino/status/1592140348905517056</a><p>"""
In 2017, a team led by Andrew Ng published a paper showing off a Deep Learning model to detect pneumonia.<p>[...]<p>But there was a big problem with their results:<p>[...]<p>A random split would have sent images from the same patient to the train and validation sets.<p>This creates a leaky validation strategy.<p>"""<p>He's not infallible.
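For anyone unfamiliar with that failure mode: if a patient contributes several X-rays, a random split can put near-identical images in both train and validation, which inflates the validation score. The standard fix is to split by patient. A minimal sketch with scikit-learn's GroupShuffleSplit (the arrays are made up):

    import numpy as np
    from sklearn.model_selection import GroupShuffleSplit

    # Hypothetical data: one row per X-ray, grouped by the patient it came from
    X = np.random.rand(10, 64)                      # image features
    y = np.random.randint(0, 2, size=10)            # pneumonia labels
    patient_ids = np.array([1, 1, 1, 2, 2, 3, 3, 4, 5, 5])

    # Keep every image from a given patient on the same side of the split
    splitter = GroupShuffleSplit(n_splits=1, test_size=0.3, random_state=0)
    train_idx, val_idx = next(splitter.split(X, y, groups=patient_ids))

    # No patient appears in both sets
    assert not set(patient_ids[train_idx]) & set(patient_ids[val_idx])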