TechEcho

10 comments

IvanK_net about 8 years ago

It reminds me of a fun fact, which I noticed when writing a TTF font parser: https://github.com/photopea/Typr.js

TTF files have a 4-byte field where the font manufacturer can put "information about himself" (like an identification). Adobe puts the ASCII string "ADBE" into these four bytes. There is another field for the font manufacturer, which has only two bytes. Guess what Adobe puts into these two bytes? 0xadbe :D

cornstalks about 8 years ago

Am I the only one struggling to understand the hex dump? The author says "0x79 is the z in the ASCII table." That's wrong: 'z' is 0x7a. The author also says "In UTF-8 all characters after 0x79 are at least two bytes long." That's also wrong: all characters after 0x7f are encoded as two or more bytes.
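
Both corrections are easy to check. A minimal Python sketch (the helper name `utf8_len` is made up for illustration, it is not from the article):

```python
def utf8_len(code_point: int) -> int:
    """Return how many bytes UTF-8 needs for a single code point."""
    return len(chr(code_point).encode("utf-8"))

# 'z' is 0x7a in ASCII; 0x79 is 'y'.
z_code = ord("z")
y_char = chr(0x79)

# Find the highest code point below 0x100 that still fits in one UTF-8 byte.
single_byte_max = max(cp for cp in range(0x100) if utf8_len(cp) == 1)

print(hex(z_code), y_char, hex(single_byte_max))
```

So the one-byte range ends at 0x7f, and 0x80 is the first code point that needs two bytes.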

koehr about 8 years ago

So what most probably happened is that FileReader.readAsBinaryString()° defaults to FileReader.readAsText()°° since it's deprecated. At least that is what I saw in Chromium. As soon as I used readAsArrayBuffer, the problem went away.

°) https://developer.mozilla.org/en-US/docs/Web/API/FileReader/readAsBinaryString

°°) https://developer.mozilla.org/en-US/docs/Web/API/FileReader/readAsText

derpadelt about 8 years ago

If you dive head-first into Python's string behaviour, you'll eventually learn the hard UnicodeDecodeError way what the difference is between a stream of bytes/octets and a text made of Unicode code points. Much like learning that a timestamp without a timezone is not worth much, text as a stream of bytes is not worth much without the encoding it is in. PHP also has nice footguns in that area.
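
For anyone who hasn't hit it yet, here is a small sketch of that lesson. The sample string and variable names are purely illustrative:

```python
text = "naïve café"

# Bytes on the wire; nothing in them records which encoding was used.
data = text.encode("latin-1")

# Guessing the wrong encoding can fail loudly...
try:
    data.decode("utf-8")
    wrong_guess_failed = False
except UnicodeDecodeError:
    wrong_guess_failed = True

# ...while the right encoding round-trips cleanly.
round_trip = data.decode("latin-1")

# The nastier case: a wrong guess that raises nothing and just yields mojibake.
mojibake = text.encode("utf-8").decode("latin-1")

print(wrong_guess_failed, round_trip == text, mojibake)
```

The silent case is the dangerous one, since Latin-1 will happily decode any byte sequence.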

gnrlist about 8 years ago

Sounds like you're confusing the integer code point and the integer representation of characters.

Many programming languages internally represent chars as UTF-8 or UTF-16, so when using libraries to read bytes into chars everything gets mangled.

Check out this guide for a more in-depth look at the mangling that can happen: http://cweb.github.io/unicode-security-guide/background/
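
A rough Python sketch of one way this mangling can happen. It assumes each byte is read as a one-byte-per-character (Latin-1) code point and the resulting text is then written out as UTF-8; the thread doesn't confirm that this is the exact path in the article, and `mangle` is a made-up name:

```python
def mangle(raw: bytes) -> bytes:
    """Simulate reading bytes as one char per byte, then saving as UTF-8."""
    as_text = raw.decode("latin-1")  # each byte becomes code point 0x00..0xff
    return as_text.encode("utf-8")   # code points >= 0x80 now take two bytes

# The eight-byte PNG signature that starts every PNG file.
png_signature = bytes([0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A])
mangled = mangle(png_signature)

print(png_signature.hex(" "))
print(mangled.hex(" "))
```

Under that assumption, every byte from 0x80 to 0xbf picks up a 0xc2 prefix, and every byte from 0xc0 to 0xff turns into 0xc3 followed by a different byte, which would explain 0xC2 showing up all over a corrupted binary.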

tossandturn about 8 years ago

Isn't this why FTP had separate Binary and Text transfer modes?

MatthewWilkes about 8 years ago

I had a similar thing with glitched images once. We had to retrofit a middleware onto a site that would obfuscate email addresses. It used a regex to spot valid emails and replaced them with a hash. It also knew which URLs and form parameters expected emails and used a lookup table to translate them back. This was sufficient to anonymise usernames without breaking any functionality on the site.

Turns out, we forgot to check the content type, and valid emails according to the regex we had used were surprisingly common in binaries.
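
A sketch of the missing check, not the middleware we actually ran: the regex, the function name `obfuscate_body`, and the truncated SHA-256 are all stand-ins, and the lookup table for translating back is left out.

```python
import hashlib
import re

# Deliberately loose pattern, for illustration only.
EMAIL_RE = re.compile(rb"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

def obfuscate_body(body: bytes, content_type: str) -> bytes:
    """Replace email-like strings with a hash, but only in textual responses."""
    media_type = content_type.split(";", 1)[0].strip().lower()
    if not media_type.startswith("text/"):
        return body  # leave images and other binaries untouched

    def to_hash(match: re.Match) -> bytes:
        return hashlib.sha256(match.group(0)).hexdigest()[:16].encode("ascii")

    return EMAIL_RE.sub(to_hash, body)

html = b"<p>Contact: alice@example.com</p>"
binary = b"\x89PNG\r\n\x1a\n...q@x.io\x00\xff"

print(obfuscate_body(html, "text/html; charset=utf-8"))
print(obfuscate_body(binary, "image/png") == binary)
```

Note that the pattern does match inside the fake binary payload, which is exactly the failure mode: without the content type guard, those bytes would be rewritten too.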

chuckdries about 8 years ago

So the server was erroneously treating the images as text?

mnarayan01 about 8 years ago

First guess: The author is running into something related to https://developer.mozilla.org/en-US/docs/Web/API/XMLHttpRequest/Sending_and_Receiving_Binary_Data#Receiving_binary_data_in_older_browsers

yongjik about 8 years ago

> 0x79 is the z in the ASCII table.

The most confusing technically correct statement of the year.

Edit: Sorry, scratch "technically correct". Need more coffee.

The Magic 0xC2