
The Unlikely Story of UTF-8: The Text Encoding of the Web

7 points by BryanLunduke almost 2 years ago

2 comments

rahimnathwani almost 2 years ago
When they did this, Unicode already existed, and assigned a code point to each character. There were fewer than 65k code points.

Naively, it seems like creating a scheme to pack these code points would be trivial: just represent each character as a series of bytes. But it's not so simple! As I understand it:

- they wanted backward compatibility with ASCII, which used only a single byte to represent each character

- they wanted to use memory efficiently: common characters shouldn't use 2 bytes

- they wanted to gracefully handle errors: a single corrupted byte shouldn't result in the rest of the string being parsed as garbage
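
A minimal sketch of these three properties, using only Python's built-in str.encode and bytes.decode; the example strings and the corrupted byte index are illustrative assumptions, not taken from the comment:

    # 1. ASCII backward compatibility: ASCII text encodes to the identical single bytes.
    ascii_text = "hello"
    assert ascii_text.encode("utf-8") == b"hello"  # one byte per character, same as ASCII

    # 2. Memory efficiency: byte length grows with the code point, so common
    #    characters stay short while rarer ones take more bytes.
    for ch in ["A", "é", "€", "😀"]:
        print(ch, "->", len(ch.encode("utf-8")), "byte(s)")
    # A -> 1, é -> 2, € -> 3, 😀 -> 4

    # 3. Error resilience: corrupting one byte damages only that character;
    #    the rest of the string still decodes correctly.
    data = bytearray("naïve café".encode("utf-8"))
    data[3] = 0xFF  # clobber a continuation byte of "ï" (hypothetical corruption)
    print(data.decode("utf-8", errors="replace"))
    # -> 'na��ve café'  (only the damaged sequence is replaced)

The third property works because UTF-8 lead bytes and continuation bytes are distinguishable by their high bits (continuation bytes always start with 10), so a decoder can resynchronize at the next character boundary after corrupted input.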
ajstarks almost 2 years ago
See: https://flickr.com/photos/ajstarks/albums/72157631470798870