Xerox responds to the recent character substitution issue

61 points by soulclap almost 12 years ago

18 comments

soulclap almost 12 years ago

Previous discussion: <a href="https://news.ycombinator.com/item?id=6156238" rel="nofollow">https://news.ycombinator.com/item?id=6156238</a>

Follow-up blog post about a conference call with Xerox: <a href="http://www.dkriesel.com/en/blog/2013/0806_conference_call_with_xerox" rel="nofollow">http://www.dkriesel.com/en/blog/2013/0806_conference_call_wi...</a>

eksith almost 12 years ago

"We do not normally see a character substitution issue with the factory default settings..."

It shouldn't be seen with any setting. Nothing you can do to the device (short of involving a hammer) should change the content in any way. Compress, resize, zoom, do whatever, but it simply must not change the content at any time at any resolution/quality.

I'm just flabbergasted that such a compression scheme was ever implemented in the first place. Surely, there are alternative OCR-based methods to do compression that don't introduce these artifacts (that's putting it mildly) at lower resolutions.

binarymax almost 12 years ago

This is an absolutely hilarious technical response to a real-world customer issue. The customer does not care one iota that their photocopier uses one compression algorithm over another. And the fact that there is not one mention of the word 'copy' in that entire post is very telling of the technical disconnect exhibited here. The 'Xerox devices' in question are completely broken from a usability perspective.

wtallis almost 12 years ago

So they claim that the fine print warns about character substitution. But they still are willing to label the option with that problem "normal quality" and suggest using "high quality" to get strictly image compression applied with no OCR. They don't seem to understand that a photocopier should in its normal operating mode never do post-processing that creates such surprising and misleading artifacts - better illegible and obviously so than legible but incorrect.

Don't get me wrong - using OCR is a great compression technique, but if it isn't reliable enough, it shouldn't be the default or "normal" setting.

ChuckMcM almost 12 years ago

That is an astonishing response. Reminds me a bit of the first time EMC pointed out that while it was possible to have your data corrupted in their hash-based storage system, it probably would never happen.

I was expecting "Here is new firmware and we apologize for using JBIG2, won't happen again."

One wonders if JBIG2 is used in the storing of checks by banks (my bank these days only sends me images of my checks, never the actual check any more) or DMV records, or any number of things.

So in the previous thread I suggested a JBIG2 test image, now I want to build one that if you copy it, it goes from one thing to something else entirely!

205guy almost 12 years ago

This is an interesting story with lots of odd comments.

First and foremost, I agree that Xerox putting their name on a product which creates an unfaithful copy is corporate suicide. Such an ancient paragon of computer innovation should be able to come up with a clever algorithm that compresses but doesn't substitute image bits.

But...

- The original story[1] didn't mention that the product itself warns against the very thing they are reporting. Did they ignore that warning, did the copier not show it, did they use a setting that did not have the warning? Their further posts cover the issue, so it looks like somebody else set the resolution and ignored the warning.
- Calling what the JBIG2 algorithm does "OCR" is misleading. OCR is pretty much understood to be analog text (image) to digital text (ASCII, UTF-32). Matching to a real character set and outputting those characters is a defining part of true OCR. It's also confusing because the copiers have a true OCR function, and this is not related. I would call what JBIG2 does "sub-image matching and substitution."
- Calling JBIG2 "lossy" is also misleading. I suppose it is lossy by definition, but lossy is usually limited to pixel effects as seen in JPG, not image blocks.
- JBIG2 seems like an algorithm that shouldn't be used on low-res text documents. You might say it's just a configuration of the algorithm, but if engineers can't take it as a tool and use it correctly, you start to wonder if it's a problem with the tool.

[1] <a href="http://www.dkriesel.com/en/blog/2013/0802_xerox-workcentres_are_switching_written_numbers_when_scanning" rel="nofollow">http://www.dkriesel.com/en/blog/2013/0802_xerox-workcentres_...</a>
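
The "sub-image matching and substitution" idea in the comment above can be illustrated with a toy sketch. This is not the JBIG2 codec and not Xerox's implementation; the bitmaps, the function names (`hamming`, `encode`, `decode`) and the `max_diff` threshold are all hypothetical, chosen only to show how reusing a "close enough" stored symbol can silently turn a 6 into an 8.

```python
# Toy illustration of symbol matching and substitution (NOT real JBIG2).
# Every glyph is a tiny 5x3 bitmap; '#' is ink, '.' is paper.
EIGHT = ("###", "#.#", "###", "#.#", "###")
SIX = ("###", "#..", "###", "#.#", "###")
ONE = (".#.", "##.", ".#.", ".#.", "###")


def hamming(a, b):
    """Count the pixels that differ between two same-sized bitmaps."""
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def encode(glyphs, max_diff):
    """Store each glyph as an index into a dictionary of symbols.

    A glyph within max_diff pixels of an already stored symbol reuses
    that symbol instead of being stored itself (hypothetical threshold).
    """
    dictionary, indices = [], []
    for glyph in glyphs:
        for i, known in enumerate(dictionary):
            if hamming(glyph, known) <= max_diff:
                indices.append(i)  # substitution happens here
                break
        else:
            dictionary.append(glyph)
            indices.append(len(dictionary) - 1)
    return dictionary, indices


def decode(dictionary, indices):
    """Rebuild the page by looking each index up in the dictionary."""
    return [dictionary[i] for i in indices]


if __name__ == "__main__":
    page = [EIGHT, SIX, ONE]
    for max_diff in (0, 1):
        dictionary, indices = encode(page, max_diff)
        restored = decode(dictionary, indices)
        print(max_diff, len(dictionary), restored == page)
```

With `max_diff=0` every distinct glyph is stored and the page round-trips exactly. With `max_diff=1` the 6 differs from the stored 8 by a single pixel, so the decoder draws a clean, perfectly legible 8 in its place, which is the kind of artifact the thread is about.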

nsxwolf almost 12 years ago

When you read a scanned or copied document, your confidence in the information is based on its quality.

There comes a point when the quality is so poor that you no longer trust your interpretation. Is that a 3? An 8? If you can't tell, you will not act on that information without further clarification.

This compression algorithm destroys this process.

How can you trust what you are reading anymore? How do we know there isn't a bug that sometimes causes the content substitution when the source text is large and perfectly legible?

Disk space is not at enough of a premium to justify this.

morsch almost 12 years ago

I was curious to see how JBIG2 fares compared to JPEG, and found a benchmark from 2010 [1] comparing the file size of the resulting PDF:

<pre><code>convert *.jpg JPEG.pdf                                 -- 43777 kb
convert *.png PNG.pdf                                  -- 6907 kb
jbig2 -b J -d -p -s *.jpg; pdf.py J > JBIG2.pdf        -- 947 kb
jbig2 -b J -d -p -s -2 *.jpg; pdf.py J > 2xJBIG2.pdf   -- 1451 kb
</code></pre>

Quite a difference. I don't quite understand how JPEG fares so poorly compared to (lossless) PNG, maybe because it doesn't do monochrome?

[1] <a href="http://ssdigit.nothingisreal.com/2010/03/pdfs-jpeg-vs-png-vs-jbig.html" rel="nofollow">http://ssdigit.nothingisreal.com/2010/03/pdfs-jpeg-vs-png-vs...</a>
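
To put "quite a difference" into numbers, the sizes quoted in the benchmark above can be expressed as ratios against the smallest file. This is only a small arithmetic sketch; the `sizes_kb` dictionary and the `ratios` helper are names made up for illustration, and the figures are copied from the comment rather than re-measured.

```python
# File sizes in kb as quoted in the benchmark above (not re-measured).
sizes_kb = {
    "JPEG.pdf": 43777,
    "PNG.pdf": 6907,
    "JBIG2.pdf": 947,
    "2xJBIG2.pdf": 1451,
}


def ratios(sizes, baseline):
    """Return each file's size as a multiple of the baseline file's size."""
    return {name: size / sizes[baseline] for name, size in sizes.items()}


if __name__ == "__main__":
    for name, factor in ratios(sizes_kb, "JBIG2.pdf").items():
        print(f"{name}: {factor:.1f}x the size of JBIG2.pdf")
```

On these figures the JPEG-based PDF is roughly 46 times the size of the JBIG2 one and the PNG-based PDF roughly 7 times.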

eyeareque almost 12 years ago

Sounds like the "recognized industry standard JBIG2 compressor" is just about useless for copy machines. Why even give a user the ability to do this?

The only acceptable fix for this is to disable the ability to use lower compression qualities that could EVER cause this to happen.

Cyclosa almost 12 years ago

Xerox is oh-so subtly shifting the blame on to the user. How slimy.

speeder almost 12 years ago

I am quite disappointed by their response.

I expected something better from Xerox. Instead it is a sort of: "You are a stupid customer, leave it on default and stop bothering me, it is not my fault you find bugs when not using the default."

mikeash almost 12 years ago

Standard idiot-box weasel-wording. Another case study to put on the enormous pile of examples of how not to communicate with your customers.

Pretend you care, blame the users, and don't take any action. Hey, what could be wrong with that?

mark-r almost 12 years ago

These are multifunction devices meant to be used by many people. If someone in your office needs to make occasional scans that have to fit in an email, isn't it natural to assume they might configure the machine for maximum compression? Why should that setting affect copies?

mathattack almost 12 years ago

This may be an issue of giving people too much choice. Should users have the freedom to make terrible mistakes? Maybe in Linux, but not Windows. Similarly, you don't want an inexperienced secretary to get your company in legal trouble. Blaming the users can kill Xerox.

emmelaich almost 12 years ago

Off topic, but it was awesome to see a link with "perl-bin" in it. A nice insight into what really does the important work in these big shiny corporations. :-)

ARothfusz almost 12 years ago

Is this a real response? It is bylined as "Guest Blogger" and is not in an official-looking blog.

preinheimer almost 12 years ago

Xerox: You had one job.

workbench almost 12 years ago

"the device web user interface"

Why on earth does a scanner have a web interface?