I’m surprised to see the highlights don’t include another common detail of the parsing algorithm that often trips people up: table rows and cells (tr/th/td) must be in one of thead/tbody/tfoot. If they’re not, they’re implicitly nested into a tbody. As in:<p><pre><code> <table>
<!-- <tbody> -->
<tr>
<th>Column one</th>
<th>Column two</th>
</tr>
<tr>
<td>Row one col one</td>
<td>Row one col two</td>
</tr>
<!-- </tbody> -->
</table>
</code></pre>
I’ve frequently seen it cause a variety of issues with VDOM libraries, and even plain DOM libraries with a notion of declarative templates, ranging from hydration mismatch logs (meh) to actual logic errors (corruption of the real DOM when nodes aren’t where they’re expected to be).<p>Other implied/omitted tags like body can cause similar issues too, but I think that’s become a far less common “mistake” (all of these are totally <i>valid</i> since at least HTML5) in recent years.
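<p>To see the implied tbody from the example above directly, here's a minimal sketch (the variable name is illustrative; any spec-conforming browser should print the same result):<p><pre><code><script>
// Parse a table that has no explicit <tbody>, then serialize it back.
const div = document.createElement('div');
div.innerHTML = '<table><tr><td>cell</td></tr></table>';
// The parser inserted the <tbody> during parsing:
console.log(div.innerHTML);
// logs: <table><tbody><tr><td>cell</td></tr></tbody></table>
</script>
</code></pre>
This is exactly the mismatch a VDOM library hits at hydration time: its template says tr is a child of table, but the real DOM says tr is a child of tbody.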
Perhaps a more intuitive name would be "round-trip serialization HTML". That is, if you use the browser to parse and print some HTML, the output matches the source code.<p>Or in other words, it's formatted the same way the browser would format it. So you use the browser to pretty-print the HTML page and save the result as the source. It's not hard at all and could be done automatically.<p>Round-trip tests are often used to check that a deserialization routine produces data that can be serialized again with nothing lost. They even let you change the serialization format, provided you change the parser and printer to match.<p>I expect this sort of test is a lot more useful with fuzzing, though. Finding one example that works mostly just tells you that the browser's HTML printing code isn't completely broken; a single test of that sort is only useful for catching stupid bugs quickly.
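<p>As a sketch, the round-trip check could be as small as this (whether the serializer prepends the doctype, and how it treats the newline after it, are assumptions about the tooling):<p><pre><code><script>
// A source file is a "fixed point" if parse -> serialize reproduces it.
function isFixedPoint(source) {
  const doc = new DOMParser().parseFromString(source, 'text/html');
  const printed = '<!DOCTYPE html>' + doc.documentElement.outerHTML;
  return printed === source;
}
</script>
</code></pre>
A fuzzer would then push generated documents through the same function and flag any input where a second parse/serialize pass changes the output again, i.e. where the round trip isn't idempotent.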
This is called print-read consistency in the Lisp world: an object is printed in such a way that the syntax can be read to produce a similar object, or else is given a deliberately unreadable notation like #<...>, where the #< combination is required to produce a read error.<p><a href="https://stackoverflow.com/questions/70797208/what-is-print-read-consistency" rel="nofollow">https://stackoverflow.com/questions/70797208/what-is-print-r...</a>
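<p>A rough analogue you can try in the browser (a sketch using JSON in place of Lisp's reader and printer):<p><pre><code><script>
// Plain data is print-read consistent: read(print(x)) equals x.
const x = { a: 1, xs: [1, 2, 3] };
console.assert(JSON.stringify(JSON.parse(JSON.stringify(x))) === JSON.stringify(x));
// Values with no readable notation play the role of #<...>:
// JSON.stringify silently drops a function-valued property.
console.log(JSON.stringify({ f: () => 1 })); // "{}"
</script>
</code></pre>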
> Why write Fixed-Point HTML?<p>> simply the satisfaction of knowing that you and the browser are in total agreement<p>So, just to clarify: there's no technical benefit, correct?
> the real reason to code in Fixed-Point HTML is simply the satisfaction of knowing that you and the browser are in total agreement about the HTML.<p>Interesting idea. I've been trying to achieve something similar but in reverse... rather than make my source match the browser, make the browser match my source by making it <i>not</i> ignore spacing.<p>i.e. the basics are `white-space: pre;` on the body element, plus fixed-width fonts at a fixed size. But I still want an HTML document so I can opt in to HTML where it matters. My reasons are: A) to avoid a pre-processor and build-toolchain complexity and stick to nice simple static files; B) I get something similar to WYSIWYG, but as source code; C) I like fixed-width fonts and plain-text formatting (reducing decisions is helpful for focus).
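<p>A minimal sketch of that setup (the CSS properties are standard; everything else here is illustrative):<p><pre><code><!DOCTYPE html>
<html>
<head>
<style>
  /* Preserve the source's spacing and line breaks so the source
     doubles as the layout; a fixed-width font keeps columns aligned. */
  body { white-space: pre; font-family: monospace; font-size: 16px; }
</style>
</head>
<body>Plain text laid out
        exactly as written,
with inline HTML <b>opted in</b> where it matters.</body>
</html>
</code></pre>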
Before now I've explicitly reduced the size of my HTML docs (nothing critical/production facing, all passion projects) by removing certain HTML tags (e.g DOCTYPE, closing tags, etc) because I know modern browsers will still render them correctly.<p>This means there are miniscule savings from a bandwidth serving perspective. I wonder what the trade off is between the HTTP call and document parse/paint.<p>E.g is it correct to assume the browser will parse/paint the HTML content - fixing incorrectly closed tags on the fly faster than the few milliseconds more it would take to serve fixed-point HTML from the server?
XML-flavored self-closing elements are banished (use <br> instead of <br />)<p>God I hate that. It just doesn’t make sense. Where is the <br> closed?