Impressive. Easy to get going, low overhead, powerful one-liners.<p>I like the filter syntax - it would be nice for perf_events to pick this up. Although, if it did, I hope the stable filter fields API can be extended with unstable arbitrary expressions as needed, for when dynamic probes are used. (There's a quick sketch of the filter syntax after the test commands below.)<p>What perf_events really lacks is a way to do custom processing of data in kernel context, to reduce the overheads of enablings. E.g., let's say I want a histogram of disk I/O latency (also sketched below). sysdig has chisels, which look like they do what I want, but from the Chisels User Guide: "Usually, with dtrace-like tools you write your scripts using a domain-specific language that gets compiled into bytecode and injected in the kernel. Draios uses a different approach: events are efficiently brought to user-level, enriched with context, and then scripts can be applied to them." Oh no, not user-level!<p>I tested this quickly, expecting DTrace's approach (which is also SystemTap's and ktap's) to blow sysdig out of the water. But the results were surprising (take these quick tests with a grain of salt). Here's my target command, along with the sysdig and DTrace enablings, and strace for comparison:<p><pre><code> Target: dd if=/dev/zero of=/dev/null bs=1k count=1000k
sysdig: sysdig -c topfiles_bytes
DTrace: dtrace -n 'syscall:::entry /execname == "dd"/ { @[probefunc] = count(); }'
strace: strace -c dd ...
</code></pre>
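A quick aside before the results - this is the sort of filter expression I mean. A sketch only: proc.name and evt.type are filter fields from the sysdig docs, and I'm assuming chisels accept a trailing filter argument (check your version):<p><pre><code> # write events from dd only (sketch; filter fields per the sysdig docs)
 sysdig proc.name=dd and evt.type=write

 # the same filter syntax scoping a chisel
 sysdig -c topfiles_bytes "proc.name=dd"
</code></pre>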
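And this is what I mean by custom processing in kernel context: DTrace aggregates the histogram in-kernel, so only the summarized data crosses to user-level. A sketch using the io provider (the standard one-liner idiom; whether io:::start/io:::done are available depends on the platform):<p><pre><code> # disk I/O latency histogram, aggregated in kernel context (sketch)
 dtrace -n 'io:::start { start[arg0] = timestamp; }
     io:::done /start[arg0]/ {
         @["disk I/O latency (ns)"] = quantize(timestamp - start[arg0]);
         start[arg0] = 0; }'
</code></pre>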
Back to the test: sysdig slowed the target by about 4x. DTrace, by between 2.5x and 2.7x. strace (for comparison), by over 200x. This is a worst-case test, and if I'm willing to slow a target by 2x, then taking that to 4x doesn't make much difference. With what I normally trace, the overheads are 1/100th of that, so DTrace's cost is negligible. The take-away here is that sysdig's overheads are closer to the "negligible" end of the spectrum than to strace's "violent" end. Which I found surprising for user-level aggregation.<p>The Sysdig Examples could do with some sanity checking. E.g.:<p>"See the top processes in terms of disk bandwidth usage
sysdig -c topprocs_file"<p>I saw:<p><pre><code> Bytes      Process
 ------------------------------
 134.65M    dd
 4.82KB     snmp-pass
 603B       snmpd
 332B       sshd
 220B       bash
 107B       sysdig
</code></pre>
That's while my dd between /dev/zero and /dev/null was running. No "disk bandwidth"! :)
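If the chisel restricted itself to disk-backed files, the dd noise would drop out. Something along these lines might work (a sketch; it assumes the chisel accepts a trailing filter and that the "not"/"contains" operators behave as documented - I haven't verified either here):<p><pre><code> # exclude pseudo-devices, so /dev/zero and /dev/null drop out (sketch)
 sysdig -c topfiles_bytes "not fd.name contains /dev/"
</code></pre>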