> There is a function, ossl_punycode_decode that decodes Punycode. Punycode is a way to encode Unicode as ASCII, used to represent Unicode strings in the ASCII-only world of DNS. ossl_punycode_decode takes an output buffer, and if the buffer runs out it keeps parsing and verifying the Punycode but discards the rest of the output.<p>> I did not have the heart to figure out why it works like this. Maybe there's a good reason to do progressive parsing. Maybe it's an artifact of how C makes you do memory management or of OpenSSL's C style. Anyway.<p>Not involved in OpenSSL, but this is a fairly common pattern in a lot of C APIs. You want to decode some data, but you don't know ahead of time how big the output is going to be. You could write a separate function to calculate the length, but that function has to do most of the work of actual decoding just to figure it out. Often the output is small enough to fit in a conservatively sized buffer, so you can save a fair bit of work by decoding into a (potentially stack-allocated) fixed-size buffer first, and only allocating a precisely sized buffer on the heap if that turns out not to be big enough. Further, having a separate length function typically means you end up with two similar but separate decode implementations, which has its own problems.<p>In most other languages, the standard library has some sort of growable container, so you avoid the problem entirely at the expense of having less control over memory allocations.
So the blog mentions that in certain cases you have to decode the punycode from one field to compare it to the value in another field.<p>Would it have been safer to <i>encode</i> the data in the other field to punycode and compare the encoded values?<p>That way a hacker can't mess with your decoder (where bugs like to lie). But at the other end you risk that your <i>encoder</i> has an issue. I don't know how to judge whether those risks are equal.<p>Thoughts?
It feels like issues like these are more common in parsers than in other kinds of software.<p>But why?<p>Why is parsing so hard? Or is it only hard in low-level languages? Or maybe in languages with poor string primitives?<p>I've written parsers in high-level languages and it didn't feel dangerous or insanely hard.
> <i>As curl author Daniel Stenberg said, "I've never even considered to decode punycode. Why does something like OpenSSL need to decode this?"</i><p>> <i>[...]</i><p>> <i>Internationalization is not the issue, internationalization is the job.</i><p>I want to cheer at this.<p>Internationalization is <i>hard</i>, really hard, especially in a computing world so defined by its English-language dominance. The small amount of effort put into defining and testing this feature shows that.<p>For all I admire Stenberg's work and focus on quality, that was a rather poor take from him.
I decided to review SBOMs from about 3,800 popular mobile apps to see if any included vulnerable versions of OpenSSL v3.0.x. About 16% of the apps included OpenSSL, mostly as a transitive dependency. None included a vulnerable 3.0.x version (not surprised), but what did surprise me was that 98% of the OpenSSL versions included in these apps were vulnerable to older CVEs.<p>I posted additional details in this blog post and video: <a href="https://www.andrewhoog.com/post/how-to-detect-openssl-v3-and-heartbleed-in-mobile-apps/" rel="nofollow">https://www.andrewhoog.com/post/how-to-detect-openssl-v3-and...</a>
Why would a TLS library even parse punycode? The sole reason we still use this hack is so that core infrastructure does not have to change when we use internationalized domains.<p>If TLS libraries, DNS servers, HTTP servers, ... needed to be patched anyway for IDNs to work, then why did we not do it properly and just use UTF-8?<p>No matter how many vulnerabilities we've introduced into software, "Š" in my name still cannot be represented in domain names and is instead written with the ASCII letters xn--pga.<p>I'm sorry for poorly articulating my thoughts here.
About the classification: I would add that the people choosing the severity are rather isolated by secrecy, so they can't really ask around for advice either.<p>That's probably where the NSA could be useful, given its large number of competent, sworn-to-secrecy employees, but nobody can trust that a zero day sent over for an opinion won't be used to fuck with a foreign country on day one.
So, punycode I do think was silly. We should just have used UTF-8 in DNS and left it at that.<p>Using UTF-8 would not have required a flag day. It would have required upgrading some servers in order to be able to have non-ASCII domain names, but it wouldn't have broken anyone not using non-ASCII domain names.