I did research on parser differentials for my bachelor's thesis. My initial hope was that I would find a few mismatches for formats without a formal specification. I found mismatches for _every single_ pair of parsers I looked at and that included formats with formal specifications. My personal takeaway was "If you use one parser for validation and another parser for evaluation, you're fucked. No exceptions."
As the article mentions, Postel's Law is likely to create vulnerabilities. It makes individual systems more robust, but the whole becomes fragile.
> Well, these browsers "helpfully" fix the URL to change backslashes into regular forward slashes, I suppose because people sometimes type in URLs and get their forward and back slashes confused.<p>More likely because Windows has historically used \ rather than the / that's standard in Unixish systems. Windows people are used to typing \, so it's indeed somewhat helpful for the browser to accept either (e.g., in file:// URLs).
Odd that the article doesn't use the more standard term "parser differential", with "differential fuzzing" as the fuzzing community's method for finding those.
This is a LANGSEC concept. A broader survey can be found at: <a href="https://www.computer.org/csdl/proceedings-article/spw/2023/123600a105/1P5ZnKk745O" rel="nofollow noreferrer">https://www.computer.org/csdl/proceedings-article/spw/2023/1...</a>
I guess if we add all the problems in IT that were caused by bugs and poor designs of parsers/serializations, e.g. SQL injections, XSS, null byte vulns etc., we get billions of human hours in damages.<p>What should be instead is an absolutely clear serialization format into a byte string of ANY data structure that must processed by two different programs.<p>Parsers are programs, they should "parse" bytes, not strings, like we humans do.
If BABLR succeeds in creating a shared instruction set for defining parsers, you'd just have portable parser grammars running on compatible parser VMs
Usually? a result of the parser not having a machine-readable specification.<p>For parsing proper, `bison --xml` is useful if you're allergic to code-generation. I don't have an equivalent for lexing.
Honestly we should have a name for such class of bugs. It's not an "I didn't know" kind of mistake. Every person sufficiently intelligent to program should figure out by themselves that having 2 parser implementations can cause various undesired consequences.
Usually, some not verified and cleaned enough external input text managed to get into some complex and often brain damaged text parser (printf,sql,etc).