IMO one of the pedagogical issues is that people who start with ASCII often assume the byte representation (e.g. 0x48 for "H") is always numerically the same as the code point (48 in hex, 72 in decimal), and vice versa.<p>This leads to a mental model of:<p><pre><code> (bytes which are numbers) -> pictures
</code></pre>
That breaks down once you get into UTF-8, which forces people to recognize an extra step:<p><pre><code> bytes -> numbers -> pictures
</code></pre>
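<p>A minimal Python sketch of that extra step, in case it helps. The variable names are just for illustration, and "é" is an arbitrary non-ASCII example:

```python
# ASCII: one byte, and its value happens to equal the code point.
ascii_bytes = "H".encode("utf-8")
print([hex(b) for b in ascii_bytes])  # ['0x48']
print(hex(ord("H")))                  # 0x48 (72 in decimal)

# Outside ASCII: several bytes decode to a single, different number.
e_acute = "\u00e9"  # LATIN SMALL LETTER E WITH ACUTE
utf8_e_acute = e_acute.encode("utf-8")
print([hex(b) for b in utf8_e_acute])  # ['0xc3', '0xa9']
print(hex(ord(e_acute)))               # 0xe9
```

Neither 0xC3 nor 0xA9 is the code point; you only get 0xE9 after decoding the two bytes together.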
And then it breaks down again with code points that have no visual representation of their own but modify others, like combining accents:<p><pre><code> bytes -> numbers -> groups of numbers modifying each other -> pictures</code></pre>
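<p>A rough Python sketch of that last model, using only the standard `unicodedata` module. The example string (an "e" followed by a combining acute accent) and the variable names are my own choice for illustration:

```python
import unicodedata

decomposed = "e\u0301"  # 'e' followed by COMBINING ACUTE ACCENT

# bytes -> numbers
raw = decomposed.encode("utf-8")
print([hex(b) for b in raw])                # ['0x65', '0xcc', '0x81']
code_points = [ord(c) for c in decomposed]
print([hex(n) for n in code_points])        # ['0x65', '0x301']

# numbers -> groups of numbers modifying each other
# A nonzero combining class means "attaches to the preceding character".
classes = [unicodedata.combining(c) for c in decomposed]
print(classes)                              # [0, 230]

# The group of two code points is drawn as one picture; NFC normalization
# can also fold it into the single precomposed code point U+00E9.
composed = unicodedata.normalize("NFC", decomposed)
print(len(decomposed), len(composed))       # 2 1
```

So three bytes become two numbers, and those two numbers become one picture.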