Impressive. Easy to get going, low overhead, powerful one-liners.<p>I like the filter syntax; it would be nice for perf_events to pick this up. If it did, though, I hope the stable filter fields API could be extended with unstable arbitrary expressions as needed, for when dynamic probes are used.<p>What perf_events really lacks is a way to do custom processing of data in kernel context, to reduce the overheads of enablings. E.g., let's say I want a histogram of disk I/O latency. sysdig has chisels, which look like they do what I want, but from the Chisels User Guide: "Usually, with dtrace-like tools you write your scripts using a domain-specific language that gets compiled into bytecode and injected in the kernel. Draios uses a different approach: events are efficiently brought to user-level, enriched with context, and then scripts can be applied to them." Oh no, not user-level!<p>I tested this quickly, expecting DTrace's approach (which is the same as SystemTap and ktap) to blow sysdig out of the water. But the results were surprising (take these quick tests with a grain of salt). Here's my target command, along with sysdig and DTrace enablings, and strace for comparison:<p><pre><code> Target: dd if=/dev/zero of=/dev/null bs=1k count=1000k
sysdig: sysdig -c topfiles_bytes
DTrace: dtrace -n 'syscall:::entry /execname == "dd"/ { @[probefunc] = count(); }'
strace: strace -c dd ...
</code></pre>
sysdig slowed the target by about 4x; DTrace, by between 2.5x and 2.7x; strace (for comparison), by over 200x. This is a worst-case test, and if I'm willing to slow a target by 2x then taking that to 4x doesn't make much difference. With what I normally trace, the overheads are 1/100th of that, so DTrace's overhead is negligible. The take-away here is that sysdig's overheads are closer to the "negligible" end of the spectrum than strace's "violent" end, which I found surprising for user-level aggregation.<p>The Sysdig Examples could do with some sanity checking. E.g.:<p>"See the top processes in terms of disk bandwidth usage
sysdig -c topprocs_file"<p>I saw:<p><pre><code> Bytes Process
------------------------------
134.65M dd
4.82KB snmp-pass
603B snmpd
332B sshd
220B bash
107B sysdig
</code></pre>
That's while my dd between /dev/zero and /dev/null was running. No "disk bandwidth"! :)<p>edit: formatting
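To make the "histogram of disk I/O latency" case above concrete, here is a minimal Python sketch of the user-level side: events arrive as plain latency values and are aggregated into power-of-two buckets, similar in spirit to a DTrace quantize() aggregation. This is not sysdig's chisel API (chisels are Lua scripts); the function names and the sample latencies are made up for illustration.

```python
from collections import Counter


def pow2_bucket(value_us: int) -> int:
    """Return the lower bound of the power-of-two bucket that holds value_us."""
    if value_us <= 0:
        return 0
    # bit_length() - 1 is floor(log2(value_us)) for positive integers.
    return 1 << (value_us.bit_length() - 1)


def latency_histogram(latencies_us):
    """Aggregate latencies (in microseconds) into power-of-two buckets."""
    hist = Counter()
    for latency in latencies_us:
        hist[pow2_bucket(latency)] += 1
    # Return the buckets in ascending order for printing.
    return dict(sorted(hist.items()))


def render(hist, width=40):
    """Format the histogram as text, one bar per bucket."""
    peak = max(hist.values(), default=1)
    lines = []
    for bucket, count in hist.items():
        bar = "#" * max(1, count * width // peak)
        lines.append(f"{bucket:>8} us | {bar} {count}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical latencies; a real tool would read them from traced events.
    sample = [90, 110, 130, 250, 260, 900, 1100, 4000]
    print(render(latency_histogram(sample)))
```

The trade-off discussed above is where this loop runs: in kernel context only the bucket counts cross to user level, whereas here every event has to be copied out before it is counted.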
I had the privilege of early access to sysdig thanks to the developers. It's not as powerful as SystemTap or DTrace but it is very useful and easy to use. Think of it as strace(8) with global dump capability (not just per-process), more powerful filters, replayable logging à la tcpdump(8), and Lua plugin support.<p>Plus the packaging is top-notch; its kernel modules are rebuilt automatically on kernel upgrade via DKMS (which I wish other vendors like FusionIO would do).
I like that you link to the GitHub repo, where the README is a link to your slicker website, which has nothing but a couple of examples and an install page, all of which is really linkbait for your company Draios. It almost seemed like you were just sharing a useful tool. The tool might be really useful, but at this point I'm still clicking through links trying to figure out what it does and how.<p>edit: Never mind, I found it. It's a kernel module and user app that uses Lua scripts for interpreting data. Sorry about my harsh tone before, but jesus I hate it when there's more gloss than content.
I feel like an introductory article about the different instrumentation facilities available for Linux systems would be welcome. Just checking Wikipedia and Google, I found the following items: SystemTap, Dprobes, LTTng, DTrace, strace, ltrace (and latrace), ktap, utrace, ftrace, kprobes, jprobes. And now we have sysdig too.
Looks very useful. Some things you can do with it:<p>Dump system activity to file, so that sysdig can be used to process it later.<p>* sysdig -w trace.scap<p>Print process name and connection details for each incoming connection not served by apache.<p>* sysdig -p "%proc.name %fd.name" "evt.type=accept and proc.name!=httpd"<p>See the files where apache spends the most time doing I/O.<p>* sysdig -c topfiles_time proc.name=httpd<p>Show the network data that apache exchanged with 192.168.0.1.<p>* sysdig -A -c echo_fds fd.sip=192.168.0.1 and proc.name=httpd<p>Show every time a file is opened under /etc.<p>* sysdig evt.type=open and fd.name contains /etc
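The filters in the examples above are field/operator/value conditions joined with "and". As a rough sketch of how a script might compose them before invoking sysdig, the Python below builds the filter string and an argument list. The helper names (cond, all_of, sysdig_argv) are hypothetical; the only sysdig syntax assumed is what the examples above show (the =, != and contains operators, and the -p output format option).

```python
import shlex

# Operators that appear in the examples above; sysdig may support more.
ALLOWED_OPS = {"=", "!=", "contains"}


def cond(field, op, value):
    """Build one filter condition such as 'proc.name!=httpd'."""
    if op not in ALLOWED_OPS:
        raise ValueError(f"unsupported operator: {op}")
    if op == "contains":
        return f"{field} contains {value}"
    return f"{field}{op}{value}"


def all_of(*conditions):
    """Join conditions with 'and', as in the examples above."""
    return " and ".join(conditions)


def sysdig_argv(filter_expr, output_format=None):
    """Return an argv list; the filter is passed as a single argument."""
    argv = ["sysdig"]
    if output_format is not None:
        argv += ["-p", output_format]
    argv.append(filter_expr)
    return argv


if __name__ == "__main__":
    expr = all_of(
        cond("evt.type", "=", "accept"),
        cond("proc.name", "!=", "httpd"),
    )
    # Print a shell-quoted command line rather than running it.
    print(shlex.join(sysdig_argv(expr, "%proc.name %fd.name")))
```

Passing the list to subprocess.run() instead of printing it would avoid shell quoting issues altogether, at the cost of needing sysdig installed and root privileges.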
I would like to know what's going on at a lower level. Ktap gives a good breakdown of how it differs from SystemTap: dynamically typed, byte-code design, etc.<p><a href="http://www.ktap.org/doc/tutorial.html#faq" rel="nofollow">http://www.ktap.org/doc/tutorial.html#faq</a><p>Is sysdig's design similar?
"The definitive tool" they name it, yet it's not as powerful as DTrace.
So, it's not definitive.<p>Looks nice otherwise. Too bad it needs a kernel module.
Ah, the good ol' pipe-through-sudo-bash installation instructions. I wish there were a more structured, platform-independent way of distributing stuff before it is packaged by distros.
Given that it involves a kernel module, I was kind of skeptical, but Greg KH seems to have looked it over and fixed it up, which I'd call a compelling seal of approval:<p><a href="https://github.com/draios/sysdig/commits/master/driver" rel="nofollow">https://github.com/draios/sysdig/commits/master/driver</a>
This tool is very similar to what I created last summer as an intern (strace/lsof analysis), but it seems to be much richer in features. I analyzed system calls as well as application tracing (New Relic) to find and fix performance bottlenecks.
I am getting an error when compiling on Arch Linux:<p><a href="https://github.com/draios/sysdig/issues/39" rel="nofollow">https://github.com/draios/sysdig/issues/39</a><p>Has anyone encountered this error before? Any help would be appreciated.
After installing sysdig, when I try to run it I get the following error:<p># sysdig fd.type=ipv4<p>error creating the process list<p>Has anyone seen this one before? Any help would be appreciated.