The bit about <pre> ignoring leading newlines is a <i>completely</i> different thing from the rest of the article. It’s specifically HTML syntax parser behaviour <<a href="https://html.spec.whatwg.org/multipage/parsing.html#parsing-main-inbody" rel="nofollow">https://html.spec.whatwg.org/multipage/parsing.html#parsing-...</a>>:<p>> <i>If the next token is a U+000A LINE FEED (LF) character token, then ignore that token and move on to the next one. (Newlines at the start of pre blocks are ignored as an authoring convenience.)</i><p>This is also applied to <textarea>.<p>Personally, I think it was a mistake, because it complicates things and doesn’t do enough to justify itself. If it also did leading whitespace trimming across all lines, it’d be interesting enough to maybe justify itself as an authoring convenience (… though honestly I suspect that’d end up worse), but as it is it’s just an extra complication. I’ve needed to deal with the nuance of its special behaviour more than once or twice, and I’ve seen others stumble over it too. It’s also part of the fairly small pile of HTML features that make it not-round-trippable: it’s only done in parsing; the serialiser doesn’t insert an extra ␊ if it would emit `<pre>␊`.<p>This is one of the many cases that tempt me in the direction of the XML syntax (which, to many people’s surprise, is absolutely still a thing: save a local file with extension .xhtml, or serve over HTTP with MIME type application/xhtml+xml). The fact that XML <i>doesn’t</i> have a parser that guesses what you meant is generally a nice feature.<p>(XML also has an attribute for whitespace handling, xml:space. Honestly it’s interesting in this context, conveying whitespace-handling intent, but I’ll ignore it. 
Because it’s never coming to HTML.)<p>But we’re stuck with this behaviour, because changing it would break compatibility.<p>And that’s where a lot of the <i>rest</i> of the article baffles me, because I get the general sense, from the way he presents information, that this guy doesn’t understand a lot of HTML’s history and philosophy, things I’d expect to be understood by a member of the Angular team. The suggestions made are generally just <i>obviously</i> not suitable for HTML, not just because of compatibility, but also because of philosophy.<p>You think &#32; should be different from SPACE? Sorry, I think we’re up to about forty years since that ship sailed; entities/character references are strictly shorthand, and numbered entities are strictly code points. And do you know how confusing it would be if it worked differently? It would be a one-off special case.<p>You think you can add a new entity to handle this? In XML I think you might be able to do that (<i>way</i> too long since I’ve written a DTD to remember clearly), but in HTML they’re called character references, because that’s all they can be, and your non-collapsing space would need to be either something entirely new in the document model, or shorthand for something like <span style=white-space:pre-wrap> </span>.<p>> <i>You'd think the CMS should be able to solve this problem, but it really can't.</i><p>Uh, yes it can, and they all do, where they accept plain text, by either chunking the text into HTML paragraphs (e.g. "<p>" + s/\n\n/<\/p><p>/ + "</p>"), or by turning your text line breaks into HTML line breaks (e.g. s/\n/<br>/). CMSes do a <i>lot</i> of dodgy stuff like this. If you want to have nightmares, look at WordPress’s wpautop function, and think through the implications of it all. It’s a radioactive wasteland of bad ideas.<p>It’s also rather important to remember that two line breaks in HTML (e.g. <p>A<br><br>B</p>) are not the same as a paragraph break (e.g. <p>A</p><p>B</p>). 
Consider margins and text-indent, for a start.<p>> <i>How Could we Fix This?</i><p>The offered solution, “quote your strings”, is what almost all <i>programming</i> languages tend to do. <i>Document</i> languages practically never quote their strings (I can’t immediately think of any even vaguely popular ones that do). Document languages consistently default to text mode, with only markup elements requiring special syntax.<p>As is later noted, there is, of course, absolutely no chance of HTML ever doing anything even vaguely like this. And honestly, if such a breaking change were on the cards, you’d be making <i>far</i> more invasive changes to HTML’s syntax.<p>> <i>3. HTML already breaks the rules of common text formatting.</i><p>> <i>• The idea that you can write HTML today by just typing the text you want is a lie.</i><p>No it isn’t: no one ever suggested that was a feature; there was no dishonesty. HTML is a markup language.<p>—⁂—<p>The remark on template language whitespace control is incorrect:<p>Say hello to
{%- username -%}
and welcome them to the team!<p>You’ll <i>actually</i> get “Say hello toDeveland welcome them to the team!” which is clearly not what’s wanted.<p>—⁂—<p>For my own part, I have at times seriously considered producing HTML with only the whitespace I mean, and applying something along the lines of `:root { white-space: pre-wrap }`.<p>But then I remember that there’s a lot more that’s dodgy around segmentation, both in the directions of extraneous and missing breaks. For example, this URL and its rendering:<p><pre><code> data:text/html,<body style=font-family:monospace;width:5ch>Look at C++!<br>X &lt;/a>
Look
at C+
+!
X </
a>
</code></pre>
Viewing on my phone (which, due to narrower column width, is more likely to demonstrate such problems), I think I’ve come across three articles on HN in the last week or so exhibiting this sort of problem. If I were writing much that referred to C++, I would genuinely make something to change it to <nobr>C++</nobr>, and I <i>do</i> sometimes tweak breaking behaviour inside <code> elements to control where breaks can occur. (I’m also the kind of guy who types actual no-break spaces in Bible references where the book has an ordinal, e.g. “1 John 2:3” will have one NBSP and one SPACE.)<p>And in the end… HTML collapsing whitespace has done a lot to quell the two-spaces-between-sentences convention some hold, so it’s not all bad. ;-)
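<p>For what it’s worth, the “make something to change it to <nobr>C++</nobr>” idea could look roughly like the following Python sketch. The names protect_tokens and UNBREAKABLE are made up for illustration, and it assumes the input is plain text rather than existing HTML, so it escapes the text before wrapping anything.

```python
import html
import re

# Hypothetical list of tokens that should never be split across lines.
UNBREAKABLE = ("C++",)


def protect_tokens(text: str, tokens=UNBREAKABLE) -> str:
    """Escape plain text for HTML, then wrap each listed token in <nobr>.

    Assumes `text` is plain text (not markup) and `tokens` is non-empty.
    """
    # Tokens are escaped the same way as the text so that they still match
    # after escaping (relevant for tokens containing <, > or &).
    pattern = re.compile("|".join(re.escape(html.escape(t)) for t in tokens))
    escaped = html.escape(text)
    return pattern.sub(lambda m: f"<nobr>{m.group(0)}</nobr>", escaped)


if __name__ == "__main__":
    print(protect_tokens("Look at C++!"))
    # Look at <nobr>C++</nobr>!
```

Wrapping in <span style=white-space:nowrap> instead of <nobr> would be the same substitution with a different replacement string; which one is preferable is a separate argument.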