Anyone have examples of XML that can be mutated? My guess is that it wouldn't take much.<p>I expect that a similar problem will be found in many other libraries once the offending XML is made public. XML namespaces made a critical... "mistake" is probably too strong, but "design choice that deviated too far from people's mental model" is about right... that has kept them from being anywhere near as useful or safe as they could be. In an XML document that uses namespaces, "ns1:tagname" may not equal "ns1:tagname", and "ns1:tagname" can equal "ns2:tagname". This breaks people's mental model of how XML works and, correspondingly, breaks their code that manipulates XML.<p>(I actually used the Go XML library as an SVG validator in the ~1.8 timeframe and had to fork it to fix namespace handling well enough to serve in that role. I didn't know how to exploit it in a specific XML protocol, but I've known about the issues for a while. "Why didn't you upstream it then?" Well, as this security bulletin implies, the data structures in encoding/xml are fundamentally wrong for round-tripping namespaced XML, and there is no backwards-compatible fix, so it was obvious to me without even trying that it would be rejected. This has also been discussed on a number of tickets over the years, so the weakness of XML namespace handling in the standard library is not news to the Go developers. Note also that round-tripping is the problem: if you parse and consume, you can write correct code; it's sending it back out that can be problematic.)<p>Namespaces fundamentally change the nature of XML tag and attribute names. They are no longer just strings; they are tuples of the form (namespace URL, tag name)... and the namespace URL is <i>NOT</i> the prefix that shows up before the colon! The prefix is an abbreviation bound by an xmlns declaration on the same element or on an ancestor. So in the XML<p><pre><code> <tag xmlns="https://sample.com/1" xmlns:example1="https://blah.org/1">
   <example1:tag xmlns:example2="https://blah.org/2">
     <example2:tag xmlns:example1="https://anewsite.com/xmlns">
       <example1:tag />
     </example2:tag>
   </example1:tag>
 </tag>
</code></pre>
not a SINGLE ONE of those "tag"s is the same! They are actually, in order, (<a href="https://sample.com/1" rel="nofollow">https://sample.com/1</a>, tag), (<a href="https://blah.org/1" rel="nofollow">https://blah.org/1</a>, tag), (<a href="https://blah.org/2" rel="nofollow">https://blah.org/2</a>, tag), and (<a href="https://anewsite.com/xmlns" rel="nofollow">https://anewsite.com/xmlns</a>, tag). There's a ton of code, and indeed quite a few standards, that will get that wrong. (Note the redefinition of 'example1' in there; that is perfectly legal.) Even more excitingly,<p><pre><code> <tag xmlns="https://sample.com/1" xmlns:example1="https://sample.com/1">
   <example1:tag/>
   <example2:tag xmlns:example2="https://sample.com/1" />
 </tag>
</code></pre>
<i>ARE</i> all exactly the same tag and should be treated as such, despite the different "tag names" that appear.<p>Reserializing these can be exciting. A: Your XML library ought, in principle, to present you with the (XMLNS, tagname) tuple with the abbreviation stripped away, to discourage you from paying too much attention to the abbreviation. B: Humans in general, and a lot of code, expect the namespace abbreviations to stay the same across a round trip, and may even standardize on what the abbreviations should be. There's a LOT of code out there in the world looking for "'p' or 'xhtml:p'" as the tag name rather than ("<a href="http://www.w3.org/1999/xhtml" rel="nofollow">http://www.w3.org/1999/xhtml</a>", "p").<p>In general, to maintain round-trip equality, you have to do one of two things. A: Maintain a table of the abbreviations you see, where each was introduced, and which one each name used. B: Just use the (XMLNS, tagname) tuple and ensure, while outputting, that the relevant namespaces are always declared. I generally go for option B, as it's easier to get correct, and I pair it with a table of the most common namespaces for what I'm working in, so that, for example, XHTML gets a hard-coded "xhtml:" prefix. If you try to implement A, it is very easy to screw it up in a way that can corrupt the namespaces on some input.<p>(Option B has its own pathologies. Consider:<p><pre><code> <tag xmlns:sample="https://example.com/1">
   <sample:tag1 />
   <sample:tag2 />
 </tag>
</code></pre>
It's really easy to write code that pushes the xmlns declaration down onto each of the children of "tag", since "tag" itself doesn't use it. If your code throws away where the namespace was declared and only checks whether it is currently in scope, it'll emit a new declaration of the "sample" namespace on every usage. Technically correct if the downstream code handles namespaces correctly (big if!), but visually unappealing.)<p>Not defending Go here, except inasmuch as it's such a common error that I have a hard time naming libraries and standards that get namespaces <i>completely</i> correct, simple as they are in principle. (I think SVG and XHTML have it right. XMPP is very, very close, but it still has a few places where the "stream" tag is placed in different namespaces and you're just supposed to know to handle it the same way in every namespace it appears in... which most people do only because it doesn't occur to them that these are technically separate tags, so it all kinda works out in the end. libxml2 is correct, but I've seen a lot of things built on top of it, and they almost all screw up namespaces.)