This analysis of steal time is not entirely correct.<p>Steal time exists to fix a problem. When a hypervisor pre-empts a <i>running</i> guest and later resumes it, without steal time the guest has no way of knowing it was ever paused: as far as the guest can tell, the process that was running when the whole guest was pre-empted had been running the entire time.<p>This means that if a guest is pre-empted, CPU usage reporting inside the guest becomes horribly wrong, with some processes showing much higher usage than they actually got. That breaks scheduler fairness and can cause all sorts of bad behavior.<p>Steal time is simply a way to tell a guest that it was pre-empted. The guest OS can then use that information to correct its usage accounting and preserve fairness.<p>However, it is not a general indication of overcommit. When a guest idles a VCPU, that VCPU is handed back to the hypervisor's scheduler. An event may arrive that would normally cause the VCPU to be woken up, but if the system is overcommitted it can take much longer for the VCPU to actually be scheduled again.<p>Most clouds are also designed to pack multiple VCPUs onto each physical CPU, and there is certainly capping in place. You can still see steal time even though you are getting your full share.<p>Let me give an example:<p>1) You are capped at 50%. You run for your full 50%, go idle, the hypervisor realizes you've exhausted your slice, and doesn't schedule you until the next slice. No steal time is reported.<p>2) You are capped at 50%. You have a neighbor trying to use his full time slice. Instead of you running for the first half of the slice and the neighbor running for the second half, the hypervisor carves the slice into 10 slots and schedules you both in alternating slots. Both guests see 50% steal time.<p>You get the same performance in both scenarios even though the steal time is reported very differently.
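To make the accounting concrete: on a Linux guest those corrected numbers surface as the steal counter in /proc/stat (and from there in top, mpstat, etc.). A minimal sketch of where to look, purely as illustration and assuming the standard /proc/stat field order:<p><pre><code> # steal is the 8th counter after "cpu" (user nice system idle iowait irq
 # softirq steal ...), i.e. field $9 for awk. It only advances while a
 # runnable vCPU is being held off the physical CPU by the hypervisor.
 awk '/^cpu /{print "steal ticks since boot:", $9}' /proc/stat</code></pre>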
Closely related to CPU steal time is memory ballooning. If one instance starts to require a lot of memory and the others do not, hypervisors (particularly VMware) will steal memory from the other VMs on the same machine and give it to the misbehaving VM.<p>This can result in swapping on the unfortunate target VMs.<p>You can detect it by seeing a VMware process (the balloon driver) using a lot of CPU (and, ironically, no memory), and by watching your free memory percentage drop while your own programs are not actually consuming more memory.
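One rough way to watch for it (a sketch, assuming a Linux guest; the interval and the ps/grep combination are just one arbitrary choice):<p><pre><code> # If MemFree/MemAvailable keep falling while the resident sizes of your own
 # processes stay flat, something outside the guest is taking pages.
 # (On VMware guests with open-vm-tools, "vmware-toolbox-cmd stat balloon"
 # should report the ballooned amount directly.)
 watch -n 10 '
   grep -E "MemFree|MemAvailable" /proc/meminfo
   ps -eo rss,comm --sort=-rss | head -5
 '</code></pre>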
Using micro instances I've seen it go up to 99% during CPU-intensive work (e.g. an app build). The hang-ups while waiting for it to continue made me decide to switch the build server to an m1.small instance instead. It's idle the vast majority of the time, but the extra $$ is totally worth it when you're running a build.<p>The steal % is usually zero on the m1.small instance. I just tried maxing out the CPU while watching "top", and this is as high as it got:<p><pre><code> Cpu(s): 5.3%us, 39.0%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.0%si, 55.7%st
</code></pre>
To max out the CPU I ran the following in a separate SSH terminal while watching "top". An m1.small has only a single vCPU, so only a single running copy should be necessary.<p><pre><code> while :; do date > /dev/null ; done</code></pre>
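If you ever need to do the same on an instance with more vCPUs, a rough sketch (assuming bash and coreutils' nproc) is to start one throwaway loop per CPU:<p><pre><code> # One busy loop per vCPU; "kill $(jobs -p)" in the same shell cleans them up.
 for i in $(seq "$(nproc)"); do
   while :; do :; done &
 done</code></pre>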
I've been having issues with steal time recently, but what I'm seeing isn't adequately explained by any of the articles and documentation I could find. Here is an example from one EC2 node:<p><pre><code> 19:26:19 CPU %usr %nice %sys %iowait %irq %soft %steal %guest %idle
06:04:05 PM all 14.85 0.00 5.94 0.00 0.00 0.00 14.85 0.00 64.36
</code></pre>
For me, when the machine was under load, %steal was almost always very close to %usr. It wasn't always exactly the same, sometimes a bit more and sometimes a bit less. Can anyone explain how these numbers are related to each other?
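For anyone who wants to poke at the same thing, here is a rough sketch (assuming a Linux guest; the 5-second interval is arbitrary) that derives both percentages from the same pair of /proc/stat samples, so they are at least measured consistently:<p><pre><code> # Sample the aggregate "cpu" line twice; each percentage is that counter's
 # delta over the total delta. Guest time is already folded into user time,
 # so the first eight counters cover everything.
 read -r _ u1 n1 s1 i1 w1 q1 sq1 st1 _ < /proc/stat
 sleep 5
 read -r _ u2 n2 s2 i2 w2 q2 sq2 st2 _ < /proc/stat
 tot=$(( (u2+n2+s2+i2+w2+q2+sq2+st2) - (u1+n1+s1+i1+w1+q1+sq1+st1) ))
 echo "usr:   $(( 100 * (u2 - u1)   / tot ))%"
 echo "steal: $(( 100 * (st2 - st1) / tot ))%"</code></pre>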
When I ran into this issue on EC2, it was mitigated by leaving cpu0 relatively idle.<p>All Apache processes were pinned with taskset -c 1-7, and the CPU steal and system load went down massively once that was in place.
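For anyone wanting to try the same thing, a sketch of that pinning (the apache2 process name and the 1-7 range are assumptions for an 8-vCPU box; adjust for your setup):<p><pre><code> # Re-pin already-running workers off cpu0; children inherit the affinity.
 for pid in $(pgrep apache2); do
   taskset -cp 1-7 "$pid"
 done</code></pre>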
I use Munin on my VPSs, and it shows steal time, which is nice. I don't typically see it show up as anything more than a 1-pixel line on RamNode. Hopefully that doesn't change under higher loads in the future.
It's been a while since I used AWS, so pardon me if the question is silly, but:<p>Is it really cost-effective to track metrics like steal time, instead of using a large instance and having the host machine to yourself?