Fun tangential anecdote regarding how interconnected and unintuitive CPU performance can be: I once made something run 20% faster by spawning a thread that did nothing but spin (i.e. <i>while (true);</i>).<p>I was trying to optimize some FEM code, toying with (hardcoded) solver parameters. On one console I had it spitting out the wall clock durations of time steps as the simulation was running, while on the other I was preparing the next run. I start compiling another version, and inexplicably the simulation in the other console gets <i>faster</i>. Like, 10%-20% less time taken per time step. "That must have been a coincidence. There's <i>no way</i> the simulation got faster because something was compiling in parallel." But curiosity got the better of me and I investigated anyway.<p>Watching the CPU speed with CPU-Z, it turned out that the CPU was indeed down-clocking during the simulation, and that compiling something in parallel made it run faster, speeding up the simulation too. WTF? And indeed, I could make the entire simulation run significantly faster by calling<p><pre><code> std::thread([]{ while (true); }).detach();
</code></pre>
at the start of main.<p>Why? Well, the simulation happens to be extremely memory-bound (sparse mat-vec multiplication in the inner loop). So the CPU is mostly waiting around for data to arrive, and apparently it downclocks as a result. That would be fine, if not for the fact that the uncore/memory subsystem clock speed is <i>directly tied to the current CPU speed</i>. That's right: the program was memory-bound, hence the CPU clocked down, hence the uncore clocked down, hence memory accesses became slower.<p>Knowing that feedback loop, it makes perfect sense that keeping the CPU busy with a spinning thread improves performance. But it's still one big wtf.<p>This problem eventually went away as we parallelized more and more of the simulation, giving the CPU less reason to clock down. But for related reasons, the simulation still runs faster if you prevent hyperthreading (either by disabling it in the BIOS or setting num threads = num hardware cores). More threads don't improve memory bandwidth, and the hyperthread pairs just step on each other's toes.
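<p>For the curious, here's a minimal, self-contained sketch of the trick (my own reconstruction, not the original FEM code): a detached spinner thread keeps one core busy while the main thread runs a deliberately memory-bound workload. Note the <i>.detach()</i> — destroying a joinable <i>std::thread</i> calls <i>std::terminate()</i>. Whether the spinner actually helps depends entirely on your CPU's clock-management behavior; the array size and stride below are arbitrary choices meant to blow out the last-level cache.<p><pre><code>#include &lt;chrono&gt;
#include &lt;cstdint&gt;
#include &lt;cstdio&gt;
#include &lt;thread&gt;
#include &lt;vector&gt;

int main() {
    // The spinner must be detached: a joinable std::thread whose
    // destructor runs would terminate the process. The volatile read
    // keeps the loop from being optimized away (a side-effect-free
    // infinite loop is undefined behavior in C++).
    std::thread([] { for (volatile bool run = true; run;) {} }).detach();

    // A deliberately memory-bound workload: strided traversal of a
    // buffer larger than a typical last-level cache (32 MiB here),
    // touching roughly one 64-byte cache line per access.
    std::vector&lt;std::uint64_t&gt; buf(1 &lt;&lt; 22, 1);
    std::uint64_t sum = 0;
    auto t0 = std::chrono::steady_clock::now();
    for (int pass = 0; pass &lt; 4; ++pass)
        for (std::size_t i = 0; i &lt; buf.size(); i += 8)
            sum += buf[i];
    auto ms = std::chrono::duration_cast&lt;std::chrono::milliseconds&gt;(
                  std::chrono::steady_clock::now() - t0).count();
    std::printf("sum=%llu time=%lldms\n",
                (unsigned long long)sum, (long long)ms);
    return 0;  // the detached spinner dies with the process
}
</code></pre>
<p>Compare the printed time with and without the spinner line to see whether your machine exhibits the same uncore feedback loop.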