The absolute minimum you must know about Unicode and encodings

15 points by halb 10 months ago

2 comments

gnabgib 10 months ago
(2003) Big in:

2012 (214 points, 75 comments) https://news.ycombinator.com/item?id=3448507

2014 (96 points, 37 comments) https://news.ycombinator.com/item?id=6996500

2010 (61 points, 21 comments) https://news.ycombinator.com/item?id=1219065

2017 (57 points, 11 comments) https://news.ycombinator.com/item?id=13908703
Terr_ 10 months ago
IMO one of the pedagogical issues is that people who start with ASCII often assume that the byte representation (e.g. 0x48) is numerically the same as the code point (48 in hex, i.e. 72 in decimal) and vice versa.

This leads to a mental model of:

  (bytes which are numbers) -> pictures

That breaks down when you get into UTF-8, which forces people to recognize more steps:

  bytes -> numbers -> pictures

And it breaks down further with code points that may have no visual representation of their own but modify others, like accents:

  bytes -> numbers -> groups of numbers modifying each other -> pictures
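
A rough Python sketch of the three models in the comment above, in case it helps to see the numbers. The helper name `code_points` and the sample characters are illustrative choices, not anything from the linked article; it uses only the standard library.

```python
import unicodedata


def code_points(data: bytes) -> list[int]:
    """Decode UTF-8 bytes and return the code point numbers (hypothetical helper)."""
    return [ord(ch) for ch in data.decode("utf-8")]


# Model 1: in ASCII the byte value and the code point coincide.
ascii_bytes = "H".encode("utf-8")
same_number = ascii_bytes[0] == ord("H") == 0x48

# Model 2: outside ASCII, bytes and code points are different numbers.
# U+00E9 (e with acute) is one code point but two UTF-8 bytes.
precomposed = "\u00e9"
precomposed_bytes = precomposed.encode("utf-8")

# Model 3: several code points can combine into one visible character.
# 'e' followed by U+0301 COMBINING ACUTE ACCENT renders like U+00E9.
combining = "e\u0301"
combining_bytes = combining.encode("utf-8")

if __name__ == "__main__":
    print(same_number)
    print(precomposed_bytes.hex(" "), [hex(n) for n in code_points(precomposed_bytes)])
    print(combining_bytes.hex(" "), [hex(n) for n in code_points(combining_bytes)])
    # NFC normalization folds the base letter and accent into one code point.
    print(unicodedata.normalize("NFC", combining) == precomposed)
```

The two accented strings look the same on screen but differ in both bytes and code points until they are normalized, which is the "groups of numbers modifying each other" step.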