For each input source file, cl.exe creates at least 7 temporary files (with suffixes "gl", "sy", "ex", "in", "db", "md", "lk"). The churn of creating and deleting those, coupled with the slowness of checkpointing on a huge empty drive, seems to be the root cause here.<p>This appears somewhat related to this bug report: <a href="https://developercommunity.visualstudio.com/content/problem/310131/clexe-creates-so-many-temp-files-it-freezes-the-sy.html" rel="nofollow">https://developercommunity.visualstudio.com/content/problem/...</a><p>Marking the temporary files as FILE_ATTRIBUTE_TEMPORARY could improve things without requiring significant Windows kernel changes.
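A minimal sketch of what "marking the files as temporary" could look like from a tool's side, not a description of what cl.exe does. It assumes Python's Windows-only `os.O_SHORT_LIVED` flag, which the Microsoft C runtime documents as creating the file as temporary; on other platforms the constant is absent and the sketch falls back to a plain open. The helper names `write_scratch_file` and `demo` are hypothetical, and whether this hint would actually avoid the checkpointing cost is untested.

```python
import os
import tempfile

# Windows-only hints; elsewhere the constants do not exist, so use 0 (no-op).
SHORT_LIVED = getattr(os, "O_SHORT_LIVED", 0)
BINARY = getattr(os, "O_BINARY", 0)


def write_scratch_file(directory, name, payload):
    """Create a new scratch file, hinting (on Windows) that it is short-lived."""
    path = os.path.join(directory, name)
    flags = os.O_CREAT | os.O_EXCL | os.O_WRONLY | SHORT_LIVED | BINARY
    fd = os.open(path, flags, 0o600)
    with os.fdopen(fd, "wb") as handle:
        handle.write(payload)
    return path


def demo():
    """Create and delete one scratch file per suffix named in the comment above."""
    suffixes = ["gl", "sy", "ex", "in", "db", "md", "lk"]
    with tempfile.TemporaryDirectory() as scratch_dir:
        for suffix in suffixes:
            path = write_scratch_file(scratch_dir, "unit_" + suffix, b"intermediate data")
            os.remove(path)
        # Number of files left behind in the scratch directory.
        return len(os.listdir(scratch_dir))
```

The hint only changes how the file is created; the create/delete churn itself is unchanged.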
Technically this wasn't caused by those instructions but by the spinlocks waiting for the lock to be released. Also, "blocked by seven instructions" sounds a bit click-baity. You can lock up the CPU or power off the computer with fewer instructions than that :-)
So, one busy process performs a file operation that triggers a system restore checkpoint, and the OS locks the entire drive during this file operation? Sounds strange to me.<p>Is the problem that the checkpointing critical section has the same duration as the triggering file operation?<p>I get that there must be some sort of critical section for setting a checkpoint, but I don't understand why it takes so long, and why it would be affected by how busy the userspace process that triggered it is.<p>I would expect it to have a short barrier-style critical section; drain all outstanding writes, record some checksum or counter from a kernel data structure, and then release all writers again.<p>In my mind this should be kernel code only, entirely unaffected by userspace, and if designed nicely, quite fast.<p>So I guess I don't get what is going on here.
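The barrier-style critical section described above can be sketched in user-space terms. This is only an illustration of the idea in the comment (hold new writers, drain outstanding writes, record a counter, release), not how NTFS or system restore works; the class name `CheckpointGate` and its methods are hypothetical.

```python
import threading


class CheckpointGate:
    """Toy barrier: a checkpoint waits for in-flight writes, then snapshots a counter."""

    def __init__(self):
        self._cond = threading.Condition()
        self._in_flight = 0          # writes currently doing I/O
        self._checkpointing = False  # True while a checkpoint holds new writers back
        self._finished = 0           # writes that have finished so far
        self.checkpoints = []        # recorded counter values

    def write(self, do_write):
        with self._cond:
            # New writers wait only while a checkpoint is in progress.
            while self._checkpointing:
                self._cond.wait()
            self._in_flight += 1
        try:
            do_write()  # the slow I/O runs outside the lock
        finally:
            with self._cond:
                self._in_flight -= 1
                self._finished += 1
                self._cond.notify_all()

    def checkpoint(self):
        with self._cond:
            while self._checkpointing:
                self._cond.wait()
            self._checkpointing = True
            # Drain: wait until every outstanding write has finished.
            while self._in_flight:
                self._cond.wait()
            snapshot = self._finished
            self.checkpoints.append(snapshot)
            # Release all waiting writers again.
            self._checkpointing = False
            self._cond.notify_all()
            return snapshot
```

In this sketch the checkpoint is only as slow as the slowest write already in flight, which is the property the comment expects; it never holds the lock across the I/O itself.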
It looks like this is a case where a process is holding lock A while waiting on lock B, and every other process is waiting on lock A. That's normal enough, though it seems like there are two mistakes:<p>First: Never spin waiting on a lock for 3 seconds. If you expect a lock to be released very quickly, you spin K times and then, if you still don't have the lock, try something heavier that can deschedule your process. K should be small enough that your time slice is unlikely to expire while spinning. Otherwise the spinning just causes confusion and wasted work, because it looks like your process is doing work when it's not.<p>Second: It seems dubious that using a feature like system restore causes all Write calls to wait for a lock held by a process in the middle of I/O. I'm sure there are some cases where that must happen (for example, when there is no buffer space left to hold the writes), but I would think it would be harder to hit.<p>EDIT: Rephrased my comment in terms of two problems rather than just the first one.
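The "spin K times, then try something heavier" rule from the first point can be sketched as follows. This is a user-space illustration with Python's `threading.Lock`, not kernel code; the function name `acquire_spin_then_block` and the default `spin_limit` are assumptions chosen for the example.

```python
import threading


def acquire_spin_then_block(lock, spin_limit=100):
    """Acquire `lock`; return True if a spin attempt won it, False if we had to block."""
    # Phase 1: bounded spinning with non-blocking attempts.
    for _ in range(spin_limit):
        if lock.acquire(blocking=False):
            return True
    # Phase 2: give up spinning and block, letting the scheduler run other threads
    # until the lock is released.
    lock.acquire()
    return False
```

The return value makes it easy to count how often the fast path fails, which is a cheap way to notice that a lock is being held far longer than the spin budget assumes.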
"loop running in the system process while holding a vital NTFS lock"<p>It's not about the seven instructions. It's the lock that's been held while doing a busy loop.
Excerpt:
"...I mean, how often do you have one thread spinning for several seconds in a seven-instruction loop while holding a lock that stops sixty-three other processors from running. That’s just awesome, in a horrible sort of way."<p>I respectfully disagree.<p>That's because everything in the universe that is perceived as negative turns out to have a positive use-case somewhere, sometime, in some context...<p>In this case, I think the ability for one core to stop 63 other processor cores is purely awesome, because think of the possible use-cases! A debugger comes to mind immediately, but what if, say, there are 63 nasty self-resurrecting virus threads running on my PC? Or what if you were doing some kind of esoteric OS testing where you needed to return to something like Unix's runlevel 1 (single user), but you'd rather freeze most of the machine than destroy the context of everything else that was running?<p>Oh, here's the best one I can think of -- don't just do a postmortem, everything's-dead core dump when something fails -- do a full (frozen!) "live" dump of a system that can be replayed indefinitely from that state!<p>Now, just because I take a contrary position doesn't mean we're not friends, or that I don't acknowledge your technical brilliance! Your article was absolutely great, and you are absolutely correct that for your use-case, "That’s just awesome, in a horrible sort of way."<p>But for my use-cases, it's absolutely awesome, in the most awesome sort of way! <g>
These posts by Dawson are always interesting. Now, if only he would investigate and remediate the performance deficiencies of other complex systems, such as ... Chrome?
So when did you first realize he was discussing Windows, reading this?<p>The "of course everyone is a straight white male" attitude that the OS need not be stated, so often seen in Windows posts, gave it away for me. However, my biases threw me for way too long: the level of sophistication meant this must be Linux, right? I should have recognized the graphics style in the screen grabs. Certainly not MacOS, but Linux can be all over the map stylistically. Does Windows really still look like that? Wow.
The cause is obvious: they were building on Microsoft Windows, using the NTFS filesystem. Even Microsoft doesn't try to build on NTFS.<p>Changing any single detail gives better results. Use a Samba share from a Linux filesystem. Run MinGW on a Linux system. Run MSVS in Wine on a Linux system.<p>Windows is an execution environment for applications. There is no need for, and no value in, actually performing builds in your target execution environment. Use a system designed from the ground up for builds.