63 Cores Blocked by Seven Instructions

480 points by nikbackm over 5 years ago

14 comments

pbsd over 5 years ago

For each input source file, cl.exe creates at least 7 temporary files (with suffixes "gl", "sy", "ex", "in", "db", "md", "lk"). The churn of creating and deleting those, coupled with the slowness of checkpointing on a huge empty drive, seems to be the root cause here.

This appears somewhat related to this bug report: https://developercommunity.visualstudio.com/content/problem/310131/clexe-creates-so-many-temp-files-it-freezes-the-sy.html

Marking the temporary files as FILE_ATTRIBUTE_TEMPORARY could improve things, without having to go into significant Windows kernel changes.

markdog12 over 5 years ago

Bruce Dawson does some of Microsoft's most valuable work for Windows. Doesn't even work for them.

KenanSulayman over 5 years ago

Technically this wasn't caused by those instructions but by the spinlocks waiting for the lock to be released. Also, "blocked by seven instructions" sounds a bit click-baity. You can lock the CPU or power off the computer with fewer instructions than that :-)

vkaku over 5 years ago

It's good to see how features turned on by default (System Restore) can have such a bad impact on performance. Thank you for doing the profiling!

strictfp over 5 years ago

So, one busy process performs a file operation that triggers a system restore checkpoint, and the OS locks the entire drive during this file operation? Sounds strange to me.

Is the problem that the checkpointing critical section has the same duration as the triggering file operation?

I get that there must be some sort of critical section for setting a checkpoint, but I don't understand why it takes so long, and why it would be affected by how busy the userspace process that triggered it is.

I would expect it to have a short barrier-style critical section: drain all outstanding writes, record some checksum or counter from a kernel data structure, and then release all writers again.

In my mind this should be kernel code only, entirely unaffected by userspace, and if designed nicely, quite fast.

So I guess I don't get what is going on here.
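For readers who want to see the shape of the "short barrier-style critical section" described above, here is a minimal user-space sketch in C++17. It is only an illustration of the expectation in this comment, not how NTFS or System Restore actually works. The `Volume` class, its method names, and the byte counter standing in for "some checksum or counter" are all hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <mutex>
#include <shared_mutex>

// Hypothetical model of a volume. Writers share a gate; a checkpoint takes
// the gate exclusively, which waits for in-flight writers to finish.
class Volume {
public:
    // Writers hold the gate in shared mode, so they do not block each other.
    void write(std::uint64_t bytes) {
        std::shared_lock<std::shared_mutex> gate(gate_);
        bytes_written_.fetch_add(bytes, std::memory_order_relaxed);
    }

    // Checkpoint: acquire exclusively (drains writers), record one counter,
    // then release. No I/O happens while the gate is held.
    std::uint64_t checkpoint() {
        std::unique_lock<std::shared_mutex> gate(gate_);
        const std::uint64_t now = bytes_written_.load(std::memory_order_relaxed);
        assert(now >= last_checkpoint_);  // the counter only ever grows
        last_checkpoint_ = now;
        return now;
    }  // writers resume as soon as the gate is released here

    // Value recorded by the most recent checkpoint (0 if none yet).
    std::uint64_t last_checkpoint() {
        std::shared_lock<std::shared_mutex> gate(gate_);
        return last_checkpoint_;
    }

private:
    std::shared_mutex gate_;
    std::atomic<std::uint64_t> bytes_written_{0};
    std::uint64_t last_checkpoint_ = 0;
};
```

The point of the sketch is that the exclusive section does a constant amount of work, so its duration does not depend on how busy the triggering process is. The situation in the article is the opposite: a lock stays held while a long-running loop executes.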

jeffdavis over 5 years ago

It looks like this is a case where a process is holding lock A while waiting on lock B, and every other process is waiting on lock A. That's normal enough, though it seems like there are two mistakes:

First: Never spin waiting on a lock for 3 seconds. If you expect a lock to be released very quickly, you spin K times and then, if you still don't have the lock, try something heavier that can deschedule your process. K should be small enough that your time slice is unlikely to expire while spinning; otherwise, it just causes confusion and wasted work because it looks like your process is doing work when it's not.

Second: It seems dubious that using a feature like system restore causes all Write calls to wait for a lock held by a process in the middle of I/O. I'm sure there are some cases where that must happen (like if out of buffer space to hold the writes), but I would think it would be harder to hit.

EDIT: Rephrased my comment in terms of two problems rather than just the first one.
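The "spin K times, then try something heavier" advice in the first point can be sketched roughly as follows. This is a user-space illustration under stated assumptions, not kernel code. The `SpinThenBlockLock` class, the `run_counter_demo` helper, and the spin limits are hypothetical, and a real implementation would tune K to the platform.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical spin-then-block lock. The name and the default spin limit
// are illustrative assumptions, not taken from any real kernel.
class SpinThenBlockLock {
public:
    explicit SpinThenBlockLock(int spin_limit = 200) : spin_limit_(spin_limit) {
        assert(spin_limit >= 0);
    }

    bool try_lock() {
        bool expected = false;
        return locked_.compare_exchange_strong(expected, true,
                                               std::memory_order_acquire);
    }

    void lock() {
        // Phase 1: spin at most K times, betting the holder releases soon.
        for (int i = 0; i < spin_limit_; ++i) {
            if (try_lock()) return;
        }
        // Phase 2: stop burning CPU and sleep until an unlock() wakes us.
        std::unique_lock<std::mutex> guard(sleep_mutex_);
        wakeup_.wait(guard, [this] { return try_lock(); });
    }

    void unlock() {
        {
            // Clear the flag under the mutex so a sleeper cannot miss the wakeup.
            std::lock_guard<std::mutex> guard(sleep_mutex_);
            locked_.store(false, std::memory_order_release);
        }
        wakeup_.notify_one();
    }

private:
    const int spin_limit_;
    std::atomic<bool> locked_{false};
    std::mutex sleep_mutex_;
    std::condition_variable wakeup_;
};

// Demo helper: several threads increment a plain counter under the lock.
long run_counter_demo(int threads, int iterations) {
    SpinThenBlockLock lock(50);
    long counter = 0;
    std::vector<std::thread> workers;
    for (int t = 0; t < threads; ++t) {
        workers.emplace_back([&] {
            for (int i = 0; i < iterations; ++i) {
                std::lock_guard<SpinThenBlockLock> guard(lock);
                ++counter;
            }
        });
    }
    for (auto& w : workers) w.join();
    return counter;
}
```

Calling `run_counter_demo(4, 20000)` should return 80000 if the lock provides mutual exclusion. With a bounded spin, a waiter that loses the bet shows up as sleeping rather than as a core pegged at 100%, which is the confusion the comment warns about.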

saagarjha over 5 years ago

Why do the sample counts cluster so heavily on the jne, as opposed to the other instructions in the loop?

alexeiz over 5 years ago

"loop running in the system process while holding a vital NTFS lock"

It's not about the seven instructions. It's the lock that's been held while doing a busy loop.

peter_d_sherman over 5 years ago

Excerpt: "...I mean, how often do you have one thread spinning for several seconds in a seven-instruction loop while holding a lock that stops sixty-three other processors from running. That’s just awesome, in a horrible sort of way."

I respectfully disagree.

That's because everything in the universe that is perceived as negative turns out to have a positive use-case somewhere, sometime, in some context...

In this case, I think the ability for one core to stop 63 other processor cores is purely awesome, because think of the possible use-cases! A debugger comes to mind immediately, but how about another: let's say there are 63 nasty self-resurrecting virus threads running on my PC? What if you were doing some kind of esoteric OS testing where you needed to return to something like Unix's runlevel 1 (single user), but you'd rather freeze most of the machine (rather than destroying the context of everything else that was previously running)?

Oh, here's the best one I can think of: don't just do a postmortem, everything's-dead core dump when something fails. Do a full (frozen!) "live" dump of a system that can be replayed infinitely, from that state!

Now, just because I take a contrary position doesn't mean we're not friends, or that I don't acknowledge your technical brilliance! Your article was absolutely great, and you are absolutely correct that for your use-case, "That’s just awesome, in a horrible sort of way."

But for my use-cases, it's absolutely awesome, in the most awesome sort of way! <g>

snak over 5 years ago

That was a good read. In-depth but understandable. Thanks for sharing.

CawCawCaw over 5 years ago

These posts by Dawson are always interesting. Now, if only he would investigate and remediate the performance deficiencies of other complex systems, such as ... Chrome?

mehrdadn over 5 years ago

Edit: Never mind... I completely missed the word "empty" when reading the critical sentence. :(

Syzygies over 5 years ago

So when did you first realize he was discussing Windows, reading this?

The "of course everyone is a straight white male" attitude that the OS need not be stated, so often seen in Windows posts, gave it away for me. However, my biases threw me for way too long: the level of sophistication meant this must be Linux, right? I should have recognized the graphics style in the screen grabs. Certainly not MacOS, but Linux can be all over the map stylistically. Does Windows really still look like that? Wow.

ncmncm over 5 years ago

The cause is obvious: they were building on Microsoft Windows, using the NTFS filesystem. Even Microsoft doesn't try to build on NTFS.

Changing any single detail gives better results. Use a Samba share from a Linux filesystem. Run Mingw on a Linux system. Run MSVS in Wine on a Linux system.

Windows is an execution environment for applications. There is no need for, and no value in, actually performing builds in your target execution environment. Use a system designed from the ground up for builds.
