> Every few seconds one of the writes takes forever [~5s]. You can notice the long periods of inactivity, and after that a green dot at the right of the chart: that’s our slow call. What is likely happening is: the local cache saturates and when that happens the application has to wait until the local data is pushed to the remote volume. Boy, you sure don’t want one of your critical code paths to hit one of these slow calls.

I'm surprised that there's no asynchronous way for the FS cache to flush itself - e.g. starting when it reaches 50% capacity - and to rate-limit incoming requests when it's too full. The idea that an FS cache is so dumb that it can't do *anything* while it's flushing its entire contents is a bit scary - I'd expect that circular buffers and granular locking mechanisms could be used to great effect here. Is this kernel code? Userspace code? Is there research into this? Are there fundamental tradeoffs that I'm missing?
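For what it's worth, on Linux the page cache does flush itself asynchronously: kernel flusher threads start writeback once dirty pages cross vm.dirty_background_ratio, and a writing process only blocks once dirty pages exceed the hard vm.dirty_ratio limit - which is presumably the stall in the article. A minimal sketch to inspect those knobs (assuming Linux /proc paths; defaults vary by distro):

    # Print the kernel writeback thresholds: background flushing
    # starts at dirty_background_ratio (% of RAM), writers block
    # at dirty_ratio, and dirty pages older than
    # dirty_expire_centisecs get flushed regardless.
    for knob in ("dirty_background_ratio", "dirty_ratio", "dirty_expire_centisecs"):
        with open("/proc/sys/vm/" + knob) as f:
            print(knob, "=", f.read().strip())

Lowering the background ratio makes flushing kick in earlier, which tends to trade a bit of peak throughput for smoother latency.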
That's clever and well executed. Wrong palette though :P

Red implies problems, green implies "normality", but here that association is misplaced. Perhaps a typical "fire" palette would be better - from dark brown to red to orange to yellow and, ultimately, to white for the extremes.
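Something like matplotlib's built-in 'hot' colormap is close to what I have in mind (black through red and yellow to white) - a quick sketch on made-up latency data, just to show the palette:

    # Fake latency heatmap rendered with the 'hot' colormap
    # (dark -> red -> orange -> yellow -> white); the data is
    # random and only there to illustrate the palette.
    import numpy as np
    import matplotlib.pyplot as plt

    counts = np.random.lognormal(mean=2.0, sigma=0.8, size=(50, 200))
    plt.imshow(counts, cmap="hot", aspect="auto", origin="lower")
    plt.colorbar(label="calls per bucket")
    plt.xlabel("time")
    plt.ylabel("latency bucket")
    plt.show()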
Neat! This is definitely a step forward -- and thanks for the shout-out to our (that is, Sun's and Joyent's) prior work here. Tempted to also incorporate this into agghist and aggpack, the new DTrace actions I added for this kind of functionality.[1] Anyway, good stuff -- it's always good to see new visualizations of system behavior!

[1] http://dtrace.org/blogs/bmc/2013/11/10/agghist-aggzoom-and-aggpack/
It would be interesting to run these tests on different instance sizes, specifically for data on the instance store. The larger the instance, the fewer neighbors you have competing for those precious IOPS.

As for SSD vs magnetic EBS, I can't say that I'm surprised. I'd assume that EBS implements some sort of cache in between you and your actual disk on the other side of the network so that the writes can return even faster. Try doing this again with reads and I'd bet you'd get some interesting results.

Edit: Also, did you pre-warm your EBS volumes? http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ebs-prewarm.html
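The pre-warm in that doc is just touching every block once so the first-access penalty doesn't pollute the benchmark. A rough Python equivalent of the dd-style read pass (device name is a guess - adjust to wherever the volume is attached):

    # Touch every block of the EBS volume once with 1 MiB
    # sequential reads; /dev/xvdf is a hypothetical device name
    # and this needs root. Expect it to take a while on a
    # large volume.
    dev = "/dev/xvdf"
    with open(dev, "rb", buffering=0) as f:
        while f.read(1 << 20):
            pass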
In a world of provisioned IOPS and applications demanding ever faster I/O, this tool is handy for the devops folks to find out how many IOPS are actually being used and how the storage is performing, and to decide whether it needs upgrading.