The irony is that CDATA isn't even very useful; there's no way to escape the ]]> closing delimiter inside a CDATA section, so you still have to invent some special escaping mechanism to use it.<p>Nobody expects entity definitions in XML either, and yet about once a year some new service or software is found vulnerable to XXE (XML external entity) attacks. (Summary: a lot of XML parsers can be made to open arbitrary files or network sockets and sometimes return the content.)<p>XML is a ridiculously complex document format designed for editing text documents. It is not a suitable data interchange format. Fortunately we have JSON now.
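For what it's worth, the "special escaping mechanism" people usually settle on is to split the section wherever ]]> occurs: end one CDATA section after the "]]" and start a new one with the ">". A minimal Python sketch of that; `to_cdata` is a hypothetical helper name, and it assumes the consumer concatenates adjacent CDATA sections back together (the standard-library ElementTree parser does):

```python
import xml.etree.ElementTree as ET

def to_cdata(text: str) -> str:
    """Wrap text in CDATA, splitting the section wherever "]]>" occurs.

    "]]>" cannot appear inside a single CDATA section, so each occurrence is
    cut in two: the first section ends after "]]" and a new one starts with ">".
    """
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"

payload = "if (a[b[0]]>1) return;"
doc = "<script>" + to_cdata(payload) + "</script>"
print(doc)  # prints: <script><![CDATA[if (a[b[0]]]]><![CDATA[>1) return;]]></script>

# The parser joins the adjacent sections back into the original text.
assert ET.fromstring(doc).text == payload
```

Which rather proves the point: it works, but it's one more convention every producer and consumer has to get right.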
I've been following posts about this tool for a few weeks, and it is really remarkable how many interesting results are already popping out, especially since static analyzers have been around for years and years.<p>I'm assuming afl-fuzz is particularly CPU-bound, and it would be interesting to see some numbers on how many CPU-years are being dedicated to it at the moment, and whether we would see even more interesting results if a larger compute cluster were made available.<p>It's also super scary how "effortlessly" these bugs appear to be uncovered, even in "well-aged" software like "strings".
Heads-up to the "comment without reading the article" crowd: the title is <i>not</i> bemoaning a lack of handling for CDATA in existing parsers. It's discussing an interesting behavior of the AFL fuzzer when used with formats that require fixed strings in particular places...<p>Related: NOBODY EXPECTS THE SPANISH INQUISITION, either. :)
This thread reminded me of a draft post I've been sitting on for a while, related to ENTITY declarations in XML and XXE exploits.<p>Basically, it's really easy to leave the default XML parsing settings in place (for things like consuming RSS feeds) and accidentally let an attacker read files off your filesystem.<p>I did a full write-up and POC here: <a href="http://mikeknoop.com/lxml-xxe-exploit" rel="nofollow">http://mikeknoop.com/lxml-xxe-exploit</a>
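The write-up above is about lxml, whose `XMLParser` takes a `resolve_entities` flag you can turn off. A blunter, library-agnostic defence for untrusted input like feeds is to refuse any document with a DOCTYPE at all, since entity declarations (including the external SYSTEM ones XXE relies on) can only live in the DTD. A minimal standard-library sketch of that idea, assuming legitimate feeds never need a DTD; `parse_untrusted` and `DTDForbidden` are hypothetical names, and it parses twice for simplicity:

```python
import xml.etree.ElementTree as ET
import xml.parsers.expat

class DTDForbidden(ValueError):
    """Raised when untrusted input carries a DOCTYPE (and so may declare entities)."""

def _reject_doctype(data: bytes) -> None:
    # First pass with a bare expat parser whose only job is to spot a DOCTYPE.
    # Assumption: refusing every DTD is acceptable for this input (e.g. RSS feeds).
    parser = xml.parsers.expat.ParserCreate()

    def on_doctype(name, system_id, public_id, has_internal_subset):
        # Exceptions raised in an expat handler stop parsing and propagate out of Parse().
        raise DTDForbidden(f"DOCTYPE {name!r} not allowed in untrusted input")

    parser.StartDoctypeDeclHandler = on_doctype
    parser.Parse(data, True)

def parse_untrusted(data: bytes) -> ET.Element:
    """Parse untrusted XML (e.g. a fetched feed), refusing any document with a DTD."""
    _reject_doctype(data)
    return ET.fromstring(data)
```

A classic XXE payload such as `<!DOCTYPE r [<!ENTITY x SYSTEM "file:///etc/passwd">]><r>&x;</r>` is rejected before any entity is looked at, while an ordinary `<rss>...</rss>` document parses as usual.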
I'm actually not so surprised, given what the fuzzer does: mutating input to make forward progress in the code. Incremental string comparisons definitely fall under this category, since they have a very straightforward definition of "forward progress"; either the byte is correct and we can enter a previously unvisited state, or it's incorrect and execution flows down the unsuccessful path. It's somewhat like the infinite monkey theorem, except the random stream is being filtered such that only a correct subsequence is needed to advance.<p>On the other hand, I'd be astonished if it managed to fuzz its way through a hash-based comparison (especially one involving a cryptographic hash like SHA-1 or MD5).
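To make the "filtered monkey" intuition concrete, here's a toy model of coverage-guided mutation. It is not afl-fuzz's actual algorithm (AFL tracks edge coverage in a bitmap and uses many mutation strategies); it assumes the parser compares the token one byte at a time with a branch per byte, as a hand-written tokenizer might, and uses "bytes matched so far" as a stand-in for coverage. All names here are hypothetical:

```python
import hashlib
import random
from typing import Callable

MAGIC = b"<![CDATA["  # the fixed token the toy parser looks for

def target_bytewise(data: bytes) -> int:
    """Toy parser: checks MAGIC one byte at a time, one branch per byte.
    The return value (bytes matched so far) stands in for coverage."""
    matched = 0
    for expected, got in zip(MAGIC, data):
        if got != expected:
            break
        matched += 1
    return matched

def target_hashed(data: bytes) -> int:
    """Same check done through a hash: one all-or-nothing branch, no partial progress."""
    digest = hashlib.sha1(data[:len(MAGIC)]).digest()
    return len(MAGIC) if digest == hashlib.sha1(MAGIC).digest() else 0

def fuzz(target: Callable[[bytes], int], seed: int = 0, max_iters: int = 1_000_000) -> int:
    """Mutate one random byte of a random corpus entry; keep the child only if it
    reaches new coverage. Returns the iteration at which MAGIC is fully matched."""
    rng = random.Random(seed)
    corpus = [b"A" * len(MAGIC)]
    best = target(corpus[0])
    for i in range(1, max_iters + 1):
        child = bytearray(rng.choice(corpus))
        child[rng.randrange(len(child))] = rng.randrange(256)
        score = target(bytes(child))
        if score > best:  # new "coverage": this input goes into the corpus
            best = score
            corpus.append(bytes(child))
            if best == len(MAGIC):
                return i
    raise RuntimeError(f"no full match after {max_iters} iterations")

print(fuzz(target_bytewise))  # finishes; cost grows roughly linearly with token length
```

Under these assumptions each new byte costs on average about 9 positions × 256 values × (corpus size) tries, so the whole token falls in roughly 10^5 iterations. `fuzz(target_hashed)` gets no intermediate signal and would have to guess all nine bytes at once (256^9, about 4.7 × 10^21 candidates), which is exactly the hash-based case above.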
But of course no one uses either when there's Atom/GitHub's favorite: CSON. <a href="https://github.com/bevry/cson" rel="nofollow">https://github.com/bevry/cson</a>
Maybe C-based XML parsers don't expect CDATA, but JVM- and .NET-based XML parsers don't have any issues with CDATA sections.<p>Time to upgrade to more modern tools?