WorstFit: Unveiling Hidden Transformers in Windows ANSI

373 points by notmine1337 4 months ago

29 comments

vessenes 4 months ago

This is a tough one. It’s systemic: MS provides a “best fit” mapping from wide Unicode to the narrow ANSI code pages, which is a known, published, “vibes-based” mapper. This best-fit conversion is used in a lot of places, and I’m sure MS considers it required for backward compatibility. It’s linked in by default everywhere, whether or not you know you included it.

The exploits largely revolve around specifying an unusual code point that “vibes” into, say, a slash, a hyphen, or a quote. These code points are typically evaluated one way (correct full Unicode evaluation) inside a modern programming language, but when passed to shell commands or other Win32 APIs they are vibes-downed. Crucially, this happens after you check them, since it happens once you’ve passed control.

To quote the curl maintainer, “curl is a victim” here, but who is the culprit? It seems certain that curl will be used to retrieve user-supplied data automatically by a server in the future. When that server mangles user input one way for validation and another way when it is handed to system libraries, you’re going to have a problem.

It seems to me like maybe the solution is to provide an opt-out of “best fit” munging in the Win32 space, but I’m not a Windows guy, so I speculate. At least then open source providers could just add the opt-out to best practices, and deal with the many terrible problems that things like a Unicode wide variant of " or \ deliver to them.

And of course even if you do that, you’ll interact with officially shipped APIs and software that have not opted out.
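The check-then-convert gap described in this comment can be sketched in a few lines of Python. This is a toy model under stated assumptions, not the Windows implementation: `BEST_FIT` is a hypothetical three-entry table standing in for a real code page's best-fit data, and `validate` and `to_ansi_best_fit` are illustrative names.

```python
# Toy model of the "validate as Unicode, use as ANSI" gap.
# BEST_FIT is an illustrative stand-in, not a real Windows table.
BEST_FIT = {
    "\uff02": '"',   # FULLWIDTH QUOTATION MARK -> ASCII quote
    "\uff3c": "\\",  # FULLWIDTH REVERSE SOLIDUS -> backslash
    "\uff0d": "-",   # FULLWIDTH HYPHEN-MINUS -> hyphen
}

def validate(arg: str) -> bool:
    """Reject arguments that contain ASCII quotes/backslashes or look like options."""
    return not any(ch in arg for ch in '"\\') and not arg.startswith("-")

def to_ansi_best_fit(arg: str) -> str:
    """Mimic a lossy wide-to-narrow conversion that runs after validation."""
    return "".join(BEST_FIT.get(ch, ch) for ch in arg)

user_input = "\uff0d\uff0doutput=evil"
assert validate(user_input)             # passes the Unicode-level check
converted = to_ansi_best_fit(user_input)
assert converted == "--output=evil"     # but arrives looking like an option
assert not validate(converted)          # the same check would now reject it
```

The point of the sketch is only the ordering: the check sees one string and the callee sees another.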

mmastrac 4 months ago

This is kind of unsurprising, but still new to me even as someone who did Windows development (and some Wine API hacking) for a decade around when this W/A mess came about.

Windows is like the card game Munchkin, where a whole bunch of features can add up to a completely, unbelievably random over-powered exploit because of unintentional synergy between random bits.

I'm happy to see that they are converting the ANSI subsystem to UTF-8, which should, in theory, mitigate a lot of these problems.

I wonder if the Rust team is going to need YetAnotherFix to the process spawning API to fix this...

Joker_vD 4 months ago

> the only thing we can do is to encourage everyone, the users, organizations, and developers, to gradually phase out ANSI and promote the use of the Wide Character API,

This has been Microsoft's official position since NT 3.5, if I remember correctly.

Sadly, one of the main hurdles is the way Microsoft's own C/C++ runtime library (msvcrt.dll) is implemented. Its non-standard "wide" functions like _wfopen(), _wgetenv(), etc. internally use the W-functions from the Win API. But the standard "narrow" functions like fopen(), getenv(), etc., instead of using the "wide" versions and converting to and from Unicode themselves (and reporting conversion failures), simply use the A-functions. Which, as you see, generally don't report any Unicode conversion failures but instead try to gloss over them using the best-fit approach.

And of course, nobody who ports software written in C to Windows wants to rewrite every use of the standard functions to use Microsoft's non-portable functions, because at that point it becomes a full-blown rewrite.
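The "report conversion failures instead of glossing over them" behaviour can be illustrated with Python's codecs, which fail on unmappable characters by default. This is only an analogy: `to_narrow_strict` is a hypothetical helper, and Python's `cp1252` codec is not the Windows A-function conversion path.

```python
def to_narrow_strict(text: str, codepage: str = "cp1252") -> bytes:
    """Convert to a narrow encoding, raising instead of substituting look-alikes."""
    try:
        return text.encode(codepage, errors="strict")
    except UnicodeEncodeError as exc:
        bad = text[exc.start:exc.end]
        raise ValueError(f"{bad!r} has no exact {codepage} representation") from exc

# A character that exists in cp1252 converts normally.
assert to_narrow_strict("caf\u00e9") == b"caf\xe9"

# A fullwidth quotation mark does not, so the caller hears about it.
try:
    to_narrow_strict("\uff02quoted\uff02")
except ValueError as err:
    print("rejected:", err)
```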

Dwedit 4 months ago

There are two ways to force the "ANSI" code page to actually be UTF-8 for an application that you write (or an EXE that you patch).

One way is with a manifest file, which works as of a particular build of Windows 10. This can also be applied to any EXE after building it. So if you want a program to gain UTF-8 support, you can hack it in. Most useful for console-mode programs.

The other way is to use the hacks that "App Locale" type tools use. One way involves undocumented function calls from NTDLL. I'm not sure exactly which functions you need to call, but I think it might involve "RtlInitNlsTables" and "RtlResetRtlTranslations" (not actually sure).

layer8 4 months ago

> until Microsoft chooses to enable UTF-8 by default in all of their Windows editions.

I don’t know how likely this is. There are a lot of old applications that assume a particular code page, or assume 1 byte per character, that this would break. There are also more subtle variations of this, like applications assuming that converting from wide characters to ANSI can’t increase the number of bytes (and hence that an existing buffer can be safely reused), which isn’t the case for UTF-8 (though it holds for all, or almost all, existing code pages). It can open up new vulnerabilities.

It would probably cause much less breakage to remove the Best-Fit logic from the Win32 xxxA APIs, and instead have all unmappable characters be replaced by a character without any common meta semantics, like “x”.
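Both points in this comment can be sketched in Python. `ansi_no_best_fit` is a hypothetical helper that replaces every unmappable character with a neutral "x" rather than a look-alike, and the byte counts that follow show why a buffer sized for the wide string can be too small for UTF-8. Python's `cp1252` codec stands in for a Windows code page here.

```python
def ansi_no_best_fit(text: str, codepage: str = "cp1252") -> bytes:
    """Encode text, replacing every unmappable character with a neutral 'x'."""
    out = bytearray()
    for ch in text:
        try:
            out += ch.encode(codepage)
        except UnicodeEncodeError:
            out += b"x"
    return bytes(out)

# A fullwidth quote becomes a harmless 'x' rather than an ASCII quote.
assert ansi_no_best_fit("a\uff02b") == b"axb"

# Buffer-size assumption: the euro sign is 2 bytes as UTF-16,
# 1 byte in cp1252, but 3 bytes in UTF-8.
euro = "\u20ac"
assert len(euro.encode("utf-16-le")) == 2
assert len(euro.encode("cp1252")) == 1
assert len(euro.encode("utf-8")) == 3
```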

garganzol 4 months ago

Microsoft was aware of this issue at least 1 year ago. I know this because they released a special code analysis rule, CA2101 [1], that explicitly discouraged the use of best-fit mapping. They mentioned security vulnerabilities in the rule’s description, but they were purposefully vague about the details.

[1] https://learn.microsoft.com/en-us/dotnet/fundamentals/code-analysis/quality-rules/ca2101

cesarb 4 months ago

> However, resolving this problem isn’t that as simple as just replacing the main() with its wide-character counterpart. Since the function signature has been changed, maintainers would need to rewrite all variable definitions and argument parsing logics, converting everything from simple char * to wchar_t *. This process can be painful and error-prone.

You don't need to convert everything from char * to wchar_t *. You can instead convert the wide characters you received to UTF-8 (or to something like Rust's WTF-8, if you also want to allow invalid sequences like unpaired surrogates), and keep using "char" everywhere. Of course, you have to take care not to mix ANSI or OEMCP strings with UTF-8 strings, which is easy if you simply use UTF-8 everywhere. This is the approach advocated by the classic https://utf8everywhere.org/ site.
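A rough sketch of that boundary conversion in Python, assuming the wide input arrives as UTF-16LE bytes. The `surrogatepass` error handler lets unpaired surrogates survive the round trip, which approximates what WTF-8 is for; it is not a full WTF-8 implementation, and both function names are hypothetical.

```python
def wide_to_utf8ish(wide: bytes) -> bytes:
    """Convert UTF-16LE bytes to UTF-8, keeping unpaired surrogates intact."""
    text = wide.decode("utf-16-le", errors="surrogatepass")
    return text.encode("utf-8", errors="surrogatepass")

def utf8ish_to_wide(narrow: bytes) -> bytes:
    """Inverse conversion, for calling wide APIs at the boundary."""
    text = narrow.decode("utf-8", errors="surrogatepass")
    return text.encode("utf-16-le", errors="surrogatepass")

# "a", an unpaired high surrogate (U+D800), then a fullwidth quotation mark.
wide_arg = b"a\x00" + b"\x00\xd8" + b"\x02\xff"
narrow_arg = wide_to_utf8ish(wide_arg)

# No best-fit step is involved: the fullwidth quote stays a fullwidth quote.
assert narrow_arg == b"a\xed\xa0\x80\xef\xbc\x82"
assert utf8ish_to_wide(narrow_arg) == wide_arg
```

Because the conversion is lossless in both directions, the program can keep its `char`-based internals and still hand the original wide string back to the W-functions.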

segasaturn 4 months ago

I've been inadvertently safe from this bug on my personal Windows computer for years thanks to having the UTF-8 mode set, as shown at the bottom of the article. I had it set because some old, foreign games were showing garbled nonsense text on my computer. I have not noticed any bugs or side effects despite it being labelled as "Beta".

scoopr 4 months ago

I was wondering if the beta checkbox is the same thing as setting the ActiveCodePage to UTF-8 in the manifest, but the docs [0] clarify that GDI doesn't adhere to the per-process code page, only to a single global one, which is what the checkbox sets.

Bit of a shame that you can't fully opt in to UTF-8 with the *A API for your own apps. But for the issues highlighted in the post, I think it would still be a valid workaround/defence-in-depth thing.

[0] https://learn.microsoft.com/en-us/windows/apps/design/globalizing/use-utf8-code-page

lifthrasiir 4 months ago

Oh, my, freaking, god. I knew the Windows API provides that sort of best-fit conversion, but didn't realize that it was the default behavior for several ANSI functions in my native code page (949 [1])! At this point they should just be banned, like gets.

[1] Yes, I know there is a UTF-8 code page (65001). That was really unusable for a long time and still suffers from compatibility issues to this day.

mouse_ 4 months ago

Unicode on modern systems is absolutely terrifying. Anyone remember the black dot of death? https://mashable.com/article/black-dot-of-death-unicode-imessage-iphone-crash

kazinator 4 months ago

HN, help! Before I dive into this, does anyone know whether this affects the argument parsing in Cygwin that prepares the arguments for a regular int main(int argc, char **argv)?

TXR Lisp uses wchar_t strings and the "W" functions on Windows. So that's well and good. But it does start with a regular C main, relying on the Cygwin run-time for that.

If that's vulnerable, I will hack it to have its own argument parsing, using the wide char command line.

Maybe I should ask this on the Cygwin mailing list.

bangaladore 4 months ago

I tend to agree that this is not an issue with many of the applications that are mentioned in the post.

Fundamentally, this boils down to bugs in functions that are supposed to transform untrusted input into trusted input, like the example they gave:

`system("wget.exe -q " . escapeshellarg($url));`

`escapeshellarg` does not produce trusted output for certain inputs.

sharpshadow 4 months ago

That’s an amazing read. According to, for example, this [0] post, it’s possible to change code pages in Windows in various ways, which would allow the use of multiple BestFit scenarios on the same OS without a reboot. Even combining them should be possible.

pornel 4 months ago

It would be easily fixable if CommandlineToArgvA was obtaining the command line itself. Then instead of converting to ANSI and then parsing that, it could parse the args in Unicode, and then convert argument by argument to ANSI. The output would be ANSI compatible, but split and unescaped in its true form.

Unfortunately, the parsing is a two-step operation, with the application calling GetCommandLineA itself first and passing that to the parser, so a fix would need a hack to correlate the versions of the command line input without breaking when it's given a different string.
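The difference between the two orderings can be shown with a toy parser in Python. Everything here is an assumption made for illustration: `BEST_FIT` holds a single illustrative entry, and `split_args` is a simplified splitter that only understands spaces and double quotes, not the real Windows rules with backslash escapes.

```python
BEST_FIT = {"\uff02": '"'}  # illustrative entry, not a real Windows table

def best_fit(text: str) -> str:
    """Mimic a lossy wide-to-narrow conversion."""
    return "".join(BEST_FIT.get(ch, ch) for ch in text)

def split_args(cmdline: str) -> list:
    """Simplified splitter: spaces separate arguments, double quotes group them."""
    args, current, in_quotes, started = [], [], False, False
    for ch in cmdline:
        if ch == '"':
            in_quotes = not in_quotes
            started = True
        elif ch == " " and not in_quotes:
            if started:
                args.append("".join(current))
            current, started = [], False
        else:
            current.append(ch)
            started = True
    if started:
        args.append("".join(current))
    return args

cmdline = 'prog.exe "a\uff02 \uff02b"'

# Convert the whole line, then parse: the fullwidth quotes become real
# quotes and one argument turns into two.
convert_then_parse = split_args(best_fit(cmdline))
assert convert_then_parse == ["prog.exe", "a", "b"]

# Parse the Unicode line, then convert each argument: the argument
# boundaries are fixed before any lossy conversion happens.
parse_then_convert = [best_fit(arg) for arg in split_args(cmdline)]
assert parse_then_convert == ["prog.exe", 'a" "b']
```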

nitwit005 4 months ago

There are presumably some similar .NET COM issues when communicating with unmanaged code, as there is an attribute for controlling this conversion: https://learn.microsoft.com/en-us/dotnet/api/system.runtime.interopservices.bestfitmappingattribute

It directly mentions: "Setting BestFitMappingAttribute parameters in this manner provides an added measure of security."

LudwigNagasena 4 months ago

What would even be the proper way to do `system("wget.exe -q " . escapeshellarg($url))`? It’s ridiculous that plaintext IPC is still the primary interface for many tools.

layer8 4 months ago

> And yes, Python’s subprocess module can’t prevent this.

A reasonably sane solution would be for it to reject, by default, command line arguments on Windows that contain non-ASCII characters or ASCII characters that aren’t portable across code pages (not all code pages are a superset of US-ASCII), and to support an optional parameter that allows the full range, with the risk documented.
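A minimal sketch of that default-deny policy as a wrapper you could call before spawning a process. `check_args` and its `allow_non_ascii` parameter are hypothetical; `subprocess` has no such option. The sketch also leaves out the second half of the suggestion, the ASCII characters that are not portable across code pages.

```python
def check_args(args, allow_non_ascii=False):
    """Return the argument list, or raise if an argument contains non-ASCII text.

    Hypothetical helper; the opt-in flag mirrors the optional parameter
    suggested in the comment above.
    """
    if not allow_non_ascii:
        for arg in args:
            if not arg.isascii():
                raise ValueError(f"non-ASCII character in argument: {arg!r}")
    return list(args)

# Plain ASCII passes through unchanged.
assert check_args(["wget.exe", "-q", "http://example.com/"]) == [
    "wget.exe", "-q", "http://example.com/",
]

# A fullwidth quotation mark is rejected unless the caller opts in.
try:
    check_args(["wget.exe", "-q", "http://example.com/\uff02"])
except ValueError as err:
    print("rejected:", err)
```

A caller would then pass the checked list on, for example `subprocess.run(check_args(argv))`, accepting that legitimate non-ASCII file names need the explicit opt-in.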

radarsat1 4 months ago

Seems like another possible fix would be to change the best-fit mapping table to never generate control characters, only alphanumerics. So map quote-like characters to 'q' and so on.

This might be uglier and slightly change behaviour, but only for vulnerable applications.

ok123456 4 months ago

Bush hid the facts

lilyball 4 months ago

> Worse still, as the attack exploits behavior at the system level during the conversion process, no standard library in any programming language can fully stop our attack!

What happens if the standard library updates its shell escaping to also escape things like the Yen character and any other character that has a Best-Fit translation into a quote or backslash? Which is to say, what does Windows do for command-line splitting if it encounters a backslash-escaped nonspecial character in a quoted string? If it behaves like sh and the backslash simply disables special handling of the next character, then backslash-escaping any threat characters should work.

account42 4 months ago

Windows' A APIs and conversion functions are best ignored entirely.

Always use the W functions, and use your own conversions (ones that can round-trip invalid UTF-16, like WTF-8) if you want to use an 8-bit encoding internally.

Most (all?) of the exploits here are already bugs, because the applications don't properly handle Unicode data.

rubatuga 4 months ago

From what I can tell, the largest vulnerability is argument passing to executables in Windows. Essentially, it is very difficult to safeguard. I've seen some CLI programs use '--' to signify that everything after it is user input, so maybe this would solve it for a single-argument scenario. Overall, this is an excellent article and vulnerability discovery.
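For reference, this is how the `--` convention behaves in Python's `argparse`; the `fetch` tool and its options are hypothetical. It only addresses an argument being reinterpreted as an option (for example after a fullwidth hyphen is converted to `-`). It does nothing about a converted quote splitting one argument into several.

```python
import argparse

parser = argparse.ArgumentParser(prog="fetch")  # hypothetical tool
parser.add_argument("-q", "--quiet", action="store_true")
parser.add_argument("url")

# Everything after "--" is treated as positional, even if it looks like an option.
ns = parser.parse_args(["--", "-q"])
assert ns.url == "-q"
assert ns.quiet is False

# Without "--", the same token is consumed as the option.
ns = parser.parse_args(["-q", "http://example.com/"])
assert ns.quiet is True
```

The protection also depends on the calling program inserting `--` itself and on the target tool honouring it.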

ppp999 4 months ago

Character encoding has been such a mess for so long it's crazy.

est 4 months ago

I remember typing some prefix character in notepad.exe, and then your whole txt became messed up. Funny Unicode times.

UltraSane 4 months ago

The loosey-goosey mapping of code points to characters has always bothered me about Unicode.

To guard against this nasty issue, which is going to take years to fix, you can enable global UTF-8 support by going to Settings > Time & language > Language & region > Administrative language settings > Change system locale, and checking "Beta: Use Unicode UTF-8 for worldwide language support". Then reboot the PC for the change to take effect.

Randor 4 months ago

That was a long read. Just be happy that you never had to deal with trigraphs. https://learn.microsoft.com/en-us/cpp/c-language/trigraphs?view=msvc-170

EdSharkey 4 months ago

Distributing native binaries is so dangerous!

tiahura 4 months ago

Imagine no Unicode, It’s easy if you try, No bytes that bloat our systems, No errors make us cry. Imagine all the coders, Living life in ASCII…

Imagine no emojis, Just letters, plain and true, No accents to confuse us, No glyphs in Sanskrit too. Imagine all the programs, Running clean and fast…

You may say I’m a dreamer, But I’m not the only one. I hope someday you’ll join us, And encoding wars will be done.
