Show HN: Sysdig, a tool for Linux system exploration

171 points by degio about 11 years ago

16 comments

brendangregg about 11 years ago

Impressive. Easy to get going, low overhead, powerful one-liners.

I like the filter syntax; it would be nice for perf_events to pick this up. Although, if it did, I hope that the stable filter fields API can be extended with unstable arbitrary expressions as needed, for when dynamic probes are used.

What perf_events really lacks is a way to do custom processing of data in kernel context, to reduce the overhead of enablings. E.g., let's say I want a histogram of disk I/O latency. sysdig has chisels, which look like they do what I want, but from the Chisels User Guide: "Usually, with dtrace-like tools you write your scripts using a domain-specific language that gets compiled into bytecode and injected in the kernel. Draios uses a different approach: events are efficiently brought to user-level, enriched with context, and then scripts can be applied to them." Oh no, not user-level!

I tested this quickly, expecting DTrace's approach (which is the same as SystemTap's and ktap's) to blow sysdig out of the water. But the results were surprising (take these quick tests with a grain of salt). Here's my target command, along with the sysdig and DTrace enablings, and strace for comparison:

  Target: dd if=/dev/zero of=/dev/null bs=1k count=1000k
  sysdig: sysdig -c topfiles_bytes
  DTrace: dtrace -n 'syscall:::entry /execname == "dd"/ { @[probefunc] = count(); }'
  strace: strace -c dd ...

sysdig slowed the target by about 4x; DTrace, by between 2.5x and 2.7x; strace (for comparison), by over 200x. This is a worst-case test, and if I'm willing to slow a target by 2x, then taking that to 4x doesn't make much difference. With what I normally trace, the overheads are 1/100th of that, so DTrace is negligible. The takeaway is that sysdig's overheads are closer to the "negligible" end of the spectrum than to strace's "violent" end, which I found surprising for user-level aggregation.

The Sysdig Examples could do with some sanity checking.
E.g.:

  "See the top processes in terms of disk bandwidth usage"
  sysdig -c topprocs_file

I saw:

  Bytes     Process
  ------------------------------
  134.65M   dd
  4.82KB    snmp-pass
  603B      snmpd
  332B      sshd
  220B      bash
  107B      sysdig

That's while my dd between /dev/zero and /dev/null was running. No "disk bandwidth"! :)

edit: formatting
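The disk I/O latency histogram mentioned above is a good way to picture what "user-level" means here: the kernel side only has to deliver each event, and the bucketing happens in the consuming process. The Python sketch below shows that kind of aggregation, a power-of-two histogram similar in spirit to DTrace's `quantize()`. It is a hypothetical illustration with made-up latency values, not sysdig's chisel API (real chisels are written in Lua).

```python
from collections import Counter


def log2_histogram(latencies_us):
    """Bucket latencies (in microseconds) into power-of-two bins.

    Each bin is keyed by its lower bound (0, 1, 2, 4, 8, ...), so the
    bin keyed 64 holds values in [64, 128).
    """
    hist = Counter()
    for lat in latencies_us:
        if lat < 1:
            bucket = 0
        else:
            # For a positive integer n, n.bit_length() - 1 == floor(log2(n)).
            bucket = 1 << (int(lat).bit_length() - 1)
        hist[bucket] += 1
    return dict(sorted(hist.items()))


def print_histogram(hist, width=40):
    """Print one ASCII bar per bin, scaled to the most populated bin."""
    peak = max(hist.values())
    for bucket, count in hist.items():
        bar = "@" * max(1, round(width * count / peak))
        print(f"{bucket:>8} | {bar:<{width}} {count}")


if __name__ == "__main__":
    # Hypothetical disk I/O latencies in microseconds.
    sample = [3, 5, 7, 120, 130, 250, 260, 900, 1500, 1600]
    print_histogram(log2_histogram(sample))
```

In DTrace, SystemTap, or ktap the equivalent bucketing runs in kernel context, so only the finished histogram has to cross into user space; in the user-level design every event crosses first, which is one plausible reason for the overhead difference measured above.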

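The `topprocs_file` output shown above is easier to interpret with a sketch of what a per-process byte count does. The Python below is a hypothetical stand-in, not the real chisel (which is written in Lua): it sums bytes per process name from (process, bytes) pairs and sorts them, largest first. Nothing in it knows which device a file descriptor points at, so writes to /dev/null are counted just like writes to a disk, which is consistent with dd topping the list in that output.

```python
from collections import defaultdict


def human_bytes(n):
    """Format a byte count using 1024-based units, e.g. 4935 -> '4.82KB'."""
    if n < 1024:
        return f"{n}B"
    value = float(n)
    for unit in ("KB", "MB", "GB"):
        value /= 1024
        if value < 1024 or unit == "GB":
            return f"{value:.2f}{unit}"


def top_procs(events, limit=10):
    """Sum bytes per process name; return the top `limit`, largest first.

    `events` is an iterable of (process_name, nbytes) pairs. The device
    behind each file descriptor is ignored, so /dev/null I/O counts too.
    """
    totals = defaultdict(int)
    for proc, nbytes in events:
        totals[proc] += nbytes
    ranked = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:limit]


if __name__ == "__main__":
    # Hypothetical events: dd copying 1 KiB blocks, plus some background noise.
    sample = [("dd", 1024)] * 5 + [("sshd", 332), ("bash", 220)]
    print(f"{'Bytes':>10}  Process")
    for proc, total in top_procs(sample):
        print(f"{human_bytes(total):>10}  {proc}")
```

Reporting actual disk bandwidth would require filtering events by the kind of file or device behind each descriptor before summing, which this sketch deliberately does not do.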

otterley about 11 years ago

I had the privilege of early access to sysdig thanks to the developers. It's not as powerful as SystemTap or DTrace, but it is very useful and easy to use. Think of it as strace(8) with global dump capability (not just per-process), more powerful filters, replayable logging à la tcpdump(8), and Lua plugin support.

Plus, the packaging is top-notch: its kernel modules are rebuilt automatically on kernel upgrade via DKMS (which I wish other vendors like FusionIO would do).

peterwwillis about 11 years ago

I like that you link to the GitHub repo, where the README is a link to your slicker website, which has nothing but a couple of examples and an install page, all of which is really linkbait for your company Draios. It almost seemed like you were just sharing a useful tool. The tool might be really useful, but at this point I'm still clicking through links trying to figure out what it does and how.

Edit: Never mind, I found it. It's a kernel module and user app that uses Lua scripts for interpreting data. Sorry about my harsh tone before, but Jesus, I hate it when there's more gloss than content.

zokier about 11 years ago

I feel like some introductory article about the different instrumentation facilities available for Linux systems would be welcome. Just checking wikipedia and google, I found the following items: SystemTap, Dprobes, LTTng, DTrace, strace, ltrace (and latrace), ktap, utrace, ftrace, kprobes, jprobes. And now we have sysdig too.

shubb about 11 years ago

Looks very useful. Some things you can do with it:

Dump system activity to a file, so that sysdig can process it later:

  sysdig -w trace.scap

Print the process name and connection details for each incoming connection not served by apache:

  sysdig -p "%proc.name %fd.name" "evt.type=accept and proc.name!=httpd"

See the files where apache spends the most time doing I/O:

  sysdig -c topfiles_time proc.name=httpd

Show the network data that apache exchanged with 192.168.0.1:

  sysdig -A -c echo_fds fd.sip=192.168.0.1 and proc.name=httpd

Show every time a file is opened under /etc:

  sysdig evt.type=open and fd.name contains /etc
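To make the filter expressions in these examples more concrete, here is a toy Python evaluator for a small subset of them: clauses joined by `and`, using `=`, `!=`, or `contains`. Events are modeled as hypothetical dicts keyed by field name (`evt.type`, `fd.name`, `proc.name`). This is only a sketch of the semantics under those assumptions; sysdig's real filter language is richer (it also supports `or`, `not`, and more operators) and is evaluated by sysdig itself, not by anything like this code.

```python
def parse_filter(expr):
    """Split a filter like 'evt.type=open and fd.name contains /etc'
    into (field, operator, value) clauses. Only 'and' is supported."""
    clauses = []
    for clause in expr.split(" and "):
        clause = clause.strip()
        # Check 'contains' and '!=' before '=' so '!=' is not split on '='.
        if " contains " in clause:
            field, value = clause.split(" contains ", 1)
            op = "contains"
        elif "!=" in clause:
            field, value = clause.split("!=", 1)
            op = "!="
        elif "=" in clause:
            field, value = clause.split("=", 1)
            op = "="
        else:
            raise ValueError(f"unsupported clause: {clause!r}")
        clauses.append((field.strip(), op, value.strip()))
    return clauses


def matches(event, clauses):
    """Return True if the event (a dict of field -> value) satisfies every clause."""
    for field, op, value in clauses:
        actual = str(event.get(field, ""))
        if op == "=" and actual != value:
            return False
        if op == "!=" and actual == value:
            return False
        if op == "contains" and value not in actual:
            return False
    return True


if __name__ == "__main__":
    # Hypothetical captured events.
    events = [
        {"evt.type": "open", "fd.name": "/etc/passwd", "proc.name": "cat"},
        {"evt.type": "open", "fd.name": "/tmp/scratch", "proc.name": "vim"},
        {"evt.type": "accept", "proc.name": "httpd"},
        {"evt.type": "accept", "proc.name": "sshd"},
    ]
    for expr in ("evt.type=open and fd.name contains /etc",
                 "evt.type=accept and proc.name!=httpd"):
        flt = parse_filter(expr)
        print(expr, "->", [e["proc.name"] for e in events if matches(e, flt)])
```

Because every clause must hold, `and` short-circuits on the first failing clause; supporting `or` and `not` would need a real expression parser rather than a flat clause list.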

joshbaptiste about 11 years ago

I would like to know what's going on at a lower level. ktap gives a good breakdown of how it differs from SystemTap (dynamically typed, bytecode design, etc.): http://www.ktap.org/doc/tutorial.html#faq

Is sysdig's design similar?

zobzu about 11 years ago

"The definitive tool," they call it, yet it's not as powerful as DTrace. So it's not definitive.

Looks nice otherwise. Too bad it needs a kernel module.

yxhuvud about 11 years ago

Ah, the good ol' pipe through sudo bash installation instructions. I wish there was a more structured platform independent way of distributing stuff before the stuff is packaged by distros.

simonebrunozzi about 11 years ago

Wow, this is really great. From the creator of Wireshark, no less :)

krakensden about 11 years ago

Given that it involves a kernel module, I was kind of skeptical, but Greg KH seems to have looked it over and fixed it up, which I'd call a compelling seal of approval: https://github.com/draios/sysdig/commits/master/driver

perryh2 about 11 years ago

This tool is very similar to what I had created last summer as an intern (strace/lsof analysis), but it seems to be a lot richer in features. I analyzed system calls as well as application traces (New Relic) to find and fix performance bottlenecks.

mesuutt about 11 years ago

I am getting an error while compiling on Arch Linux: https://github.com/draios/sysdig/issues/39

Has anyone encountered this error before? Any help would be appreciated.

neuronsourcing about 11 years ago

After installing sysdig, when I try to run it I get the following error:

  # sysdig fd.type=ipv4
  error creating the process list

Has anyone seen this one before? Any help would be appreciated.

digitalyatri about 11 years ago

Some observations:

  sudo sysdig -w file1.log

file1.log contains lots of junk characters (fix this):

  ^@^@^@^@^@^@^@^@^@^@^@^@^

Better alternative:

  sudo sysdig > file2.log

This file has proper logs.
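For context on the observation above: `^@` is caret notation for a NUL (0x00) byte, and `sysdig -w` writes a binary capture file meant to be read back with `sysdig -r file1.log`, whereas redirecting stdout saves the formatted text output. A quick way to tell the two kinds of file apart is to check for NUL bytes near the start. The Python helper below is a hypothetical heuristic for that, not something sysdig provides.

```python
import os
import tempfile


def looks_binary(path, sample_size=4096):
    """Heuristic: report a file as binary if its first bytes contain a NUL.

    Plain-text logs essentially never contain 0x00 bytes; binary capture
    files usually do. This can misjudge unusual files, so treat it as a hint.
    """
    with open(path, "rb") as f:
        chunk = f.read(sample_size)
    return b"\x00" in chunk


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical stand-ins for the two files from the comment above.
        capture = os.path.join(tmp, "file1.log")
        text_log = os.path.join(tmp, "file2.log")
        with open(capture, "wb") as f:
            f.write(b"\x00\x00\x00\x01header\x00payload")
        with open(text_log, "w") as f:
            f.write("hypothetical text log line\n")
        print(capture, looks_binary(capture))
        print(text_log, looks_binary(text_log))
```

Reading the capture back with `sysdig -r file1.log` (optionally with a filter) should print the same kind of text as the live run, while keeping the full event data around for later analysis.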

pinturic about 11 years ago

It is amazing how easy it seems to collect such information with this tool.

wesleyac about 11 years ago

Just looked at the website and had a very "small world" feeling: they're located in my town O.o