Xerox scanners randomly alter numbers in scanned documents (2013)

187 points by gurjeet almost 2 years ago

11 comments

lynx23 almost 2 years ago
Even though this happened a long time ago, whenever I hear or think about it, I am amazed that it didn't put Xerox out of business, or at least hurt it a little more. After all, some big players were already doing digital archiving at the time. :-/ BTW, the CCC had a pretty neat presentation at that time as well: https://www.youtube.com/watch?v=c0O6UXrOZJo
adhesive_wombat almost 2 years ago
This can happen in some compression modes of DjVu as well at high compression factors, where the background and foreground are separated and the foreground (text, usually) is split into glyphs that can be shared by different instances. Mess up the recognition and the letters on the page appear literally different in the compressed "oulput".
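A toy sketch of the glyph sharing described above, in Python. This is not DjVu's or JBIG2's actual algorithm; the 3x5 bitmaps, the Hamming-distance match, and the names `compress`, `decompress`, and `max_diff` are all assumptions made up for illustration. It only shows how a matching threshold that is too loose makes one glyph stand in for a different one.

```python
from typing import List, Tuple

# A glyph is a tiny bitmap: one string per pixel row, '#' = ink, '.' = paper.
Glyph = Tuple[str, ...]

def hamming(a: Glyph, b: Glyph) -> int:
    """Count the pixels that differ between two same-sized glyphs."""
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))

def compress(glyphs: List[Glyph], max_diff: int) -> Tuple[List[Glyph], List[int]]:
    """Store each distinct glyph once; later glyphs that are 'close enough'
    (hypothetical threshold max_diff) just reference an earlier symbol."""
    symbols: List[Glyph] = []
    refs: List[int] = []
    for g in glyphs:
        for i, s in enumerate(symbols):
            if hamming(g, s) <= max_diff:
                refs.append(i)  # reuse the stored symbol
                break
        else:
            symbols.append(g)  # no match: add a new symbol
            refs.append(len(symbols) - 1)
    return symbols, refs

def decompress(symbols: List[Glyph], refs: List[int]) -> List[Glyph]:
    """Rebuild the page from the symbol dictionary and the references."""
    return [symbols[i] for i in refs]

# Illustrative bitmaps: SIX and EIGHT differ by a single pixel.
SIX = ("###", "#..", "###", "#.#", "###")
EIGHT = ("###", "#.#", "###", "#.#", "###")
ONE = (".#.", "##.", ".#.", ".#.", "###")
NAMES = {SIX: "6", EIGHT: "8", ONE: "1"}

def render(glyphs: List[Glyph]) -> str:
    return "".join(NAMES[g] for g in glyphs)

page = [SIX, EIGHT, ONE, EIGHT]
lossless = render(decompress(*compress(page, max_diff=0)))
lossy = render(decompress(*compress(page, max_diff=1)))
print(lossless)  # 6818
print(lossy)     # 6616
```

With `max_diff=0` every glyph must match exactly and the page survives. With `max_diff=1` the "8" is judged close enough to the stored "6", so the decompressed page shows crisp, plausible, wrong digits.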
pronoiac almost 2 years ago
Using OCRmyPDF, I applied lossless JBIG2 compression to a scanned book, after some consideration.

* the OCRmyPDF docs point to the JBIG2 Wikipedia page, and the Disadvantages section - https://en.wikipedia.org/wiki/JBIG2#Disadvantages - so it's easier to avoid this bug
* I'd hoped OCR would fall out of the process, but nope
* from the Wikipedia page, huh, the Pegasus malware exploited iOS's implementation of JBIG2
j16sdiz almost 2 years ago
This is basically the same thing Samsung and friends are doing in their camera apps.

Except, in Samsung's case, some users love this.
merricksb almost 2 years ago
Other recent related submissions/discussions:

https://news.ycombinator.com/item?id=34815391 - "Xerox copier flaw changes numbers in scanned docs (2013)" (theregister.com), 36 points, 3 months ago, 8 comments

https://news.ycombinator.com/item?id=32537073 - "JBIG2 Undetectable Data Corruption: Destroying Our Past, One Character at a Time" (circuitousroot.com), 67 points, 9 months ago, 34 comments
perpil almost 2 years ago
One of the more amusing parts of this blog is that it contains at least two typos, "arrors" and "ancoding", and I can't tell if they were on purpose.
mo_42 almost 2 years ago
I remember this talk quite well. The other talks by David are interesting too.

Somehow there have not really been consequences for this. So either archiving stuff, at least in the business context, is not really important, or we simply trust these copies. The latter is of course scary.
JackFr almost 2 years ago
Xerox invented generative image ML for document scans, off a single training sample, 10 years ago!
devy almost 2 years ago
Is this a Xerox-specific issue or an industry-wide problem? The author suspected that this is not an OCR issue. What about other scanners? HP, Epson, Canon, Ricoh, etc.?
joebiden2 almost 2 years ago
Please add (2913) to the title:

https://hn.algolia.com/?query=Xerox%20scanners%20randomly%20alter%20numbers%20in%20scanned%20documents&type=story&dateRange=all&sort=byDate&storyText=false&prefix&page=0

Edit: oops, 2013 - typed on my smartphone without reading back :)
genpfault almost 2 years ago
(2014) or (2015)