The Magic 0xC2

62 points by koehr about 8 years ago

10 comments

IvanK_net about 8 years ago
It reminds me of a fun fact, which I noticed when writing a TTF font parser: https://github.com/photopea/Typr.js

TTF files have a 4-byte field where the font manufacturer can put "information about himself" (like an identification). Adobe puts the ASCII string "ADBE" into these four bytes.

There is another field for the font manufacturer, which has only two bytes. Guess what Adobe puts into these two bytes? 0xadbe :D
cornstalks about 8 years ago
Am I the only one struggling to understand the hex dump? The author says "0x79 is the z in the ASCII table." That's wrong. 'z' is 0x7a.

The author also says "In UTF-8 all characters after 0x79 are at least two bytes long." That's also wrong. All characters after 0x7f get encoded as two or more bytes.
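
Both corrections are easy to check. The following is a minimal Python sketch; the helper name `utf8_len` is made up for illustration and does not come from the article or the thread.

```python
# Check the ASCII values and the UTF-8 length boundary discussed above.
# `utf8_len` is a hypothetical helper name used only for this sketch.

def utf8_len(code_point: int) -> int:
    """Return how many bytes UTF-8 needs to encode a single code point."""
    return len(chr(code_point).encode("utf-8"))

print(hex(ord("y")))   # 0x79
print(hex(ord("z")))   # 0x7a
print(utf8_len(0x7F))  # 1
print(utf8_len(0x80))  # 2
```

So 0x79 is 'y', and the one-byte range of UTF-8 ends at 0x7f, not 0x79.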
koehr about 8 years ago
So what most probably happened is that FileReader.readAsBinaryString()° defaults to FileReader.readAsText()°° since it's deprecated. At least that is what I saw in Chromium. As soon as I used readAsArrayBuffer the problem went away.

°) https://developer.mozilla.org/en-US/docs/Web/API/FileReader/readAsBinaryString

°°) https://developer.mozilla.org/en-US/docs/Web/API/FileReader/readAsText
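
One way a stray 0xC2 can end up in binary data is sketched below in Python. This is an illustration under a stated assumption, not a reproduction of what Chromium or FileReader actually does: it assumes each raw byte is mapped to the code point of the same value (Latin-1) and the resulting string is then written out as UTF-8. The function name `mangle` is hypothetical.

```python
def mangle(raw: bytes) -> bytes:
    """Simulate binary data passing through a text layer.

    Assumption for this sketch: every byte becomes the code point with
    the same value (Latin-1), and the string is then encoded as UTF-8.
    """
    return raw.decode("latin-1").encode("utf-8")

png_start = bytes([0x89, 0x50, 0x4E, 0x47])  # first four bytes of a PNG file
print(mangle(png_start).hex(" "))  # c2 89 50 4e 47
```

Under that assumption, bytes up to 0x7f pass through unchanged, bytes 0x80 to 0xbf gain a 0xc2 prefix, and bytes 0xc0 to 0xff become a 0xc3-prefixed pair. Reading the file into raw bytes, as readAsArrayBuffer does, never goes through a text encoding at all.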
derpadelt about 8 years ago
If you dive head-first into Python's string behaviour, you'll eventually learn the hard UnicodeDecodeError way what the difference is between a stream of bytes/octets and a text made of Unicode code points. Much the same as learning that a timestamp without a timezone is not worth much, a text as a stream of bytes is not worth much without the encoding it is in. PHP also has nice footguns in that area.
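
A small Python sketch of that point: the same bytes give different text depending on the encoding assumed, and some byte sequences are not valid in a given encoding at all. The helper name `try_decode` is invented for this example.

```python
def try_decode(data: bytes, encoding: str) -> str:
    """Decode bytes with the given encoding, reporting failures as text."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError as exc:
        return f"UnicodeDecodeError: {exc.reason}"

raw = "\u00e9".encode("utf-8")     # the two bytes c3 a9 (e with acute accent)

print(try_decode(raw, "utf-8"))    # one character, as intended
print(try_decode(raw, "latin-1"))  # two characters: the wrong encoding guess
print(try_decode(b"\xc2", "utf-8"))  # a lone lead byte cannot be decoded
```

The bytes alone do not say which of these readings is the right one; the encoding has to travel with them.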
gnrlist about 8 years ago
Sounds like you're confusing the integer code point and the integer representation of characters.

Many programming languages internally represent chars as UTF-8 or UTF-16, so when using libraries to read bytes into chars everything gets mangled.

Check out this guide for a more in-depth look at the mangling that can happen: http://cweb.github.io/unicode-security-guide/background/
tossandturn about 8 years ago
Isn't this why FTP had separate Binary and Text transfer modes?
MatthewWilkes about 8 years ago
I had a similar thing with glitched images once. We had to retrofit a middleware onto a site that would obfuscate email addresses. It used a regex to spot valid emails and replaced them with a hash. It also knew which URLs and form parameters expected emails and used a lookup table to translate them back. This was sufficient to anonymise usernames without breaking any functionality on the site.

Turns out, we forgot to check content type, and valid emails according to the regex we had used were surprisingly common in binaries.
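
A rough Python sketch of the missing check. Everything here is assumed for illustration: the regex, the choice of SHA-256, the 16-character truncation, the `text/` rule, and the function name `obfuscate` are not taken from the middleware described above.

```python
import hashlib
import re

# Hypothetical, deliberately loose email pattern operating on raw bytes.
EMAIL_RE = re.compile(rb"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

def obfuscate(body: bytes, content_type: str) -> bytes:
    """Replace email-like byte runs with a short hash, for text bodies only."""
    # The guard that was forgotten: leave non-text responses untouched.
    if not content_type.lower().startswith("text/"):
        return body

    def to_hash(match: re.Match) -> bytes:
        digest = hashlib.sha256(match.group(0)).hexdigest()
        return digest[:16].encode("ascii")

    return EMAIL_RE.sub(to_hash, body)

image = b"\x89PNG\r\n a@b.co \x00\xff"
print(obfuscate(image, "image/png") == image)             # True
print(b"@" in obfuscate(b"mail a@b.co now", "text/html"))  # False
```

Without the content-type guard, the byte run `a@b.co` inside the image would be rewritten and the file corrupted.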
chuckdries about 8 years ago
So the server was erroneously treating the images as text?
mnarayan01 about 8 years ago
First guess: The author is running into something related to https://developer.mozilla.org/en-US/docs/Web/API/XMLHttpRequest/Sending_and_Receiving_Binary_Data#Receiving_binary_data_in_older_browsers.
yongjik about 8 years ago
> 0x79 is the z in the ASCII table.

The most confusing technically correct statement of the year.

Edit: Sorry, scratch "technically correct". Need more coffee.