科技回声

13 comments

nn3, over 9 years ago

If you rely on the TCP checksum to protect your data, you are delusional anyway. The TCP checksum is a simple additive checksum that is very weak and cannot detect wide classes of data corruption. It is well known that it won't catch many problems, and there are classic papers describing this. All it can do is catch very obvious "big" mistakes.

That is why link layers always have other, stronger CRCs too.

This is actually one of the better reasons to always use TLS. The MAC authentication it uses is much stronger.
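To make the weakness concrete, here is a minimal sketch in Python, under some loud assumptions: `internet_checksum` covers only a payload (a real TCP checksum also covers the TCP header and a pseudo-header), and `swap_words`, `tag`, the sample message, and the demo key are all made up for illustration. Because the checksum is a sum of 16-bit words, reordering those words leaves it unchanged, while a keyed MAC such as HMAC-SHA256 (one of the constructions TLS can use) changes.

```python
import hashlib
import hmac


def internet_checksum(data: bytes) -> int:
    """16-bit ones' complement checksum in the style of RFC 1071."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    # Fold any carries above 16 bits back into the low 16 bits.
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def swap_words(data: bytes, i: int, j: int) -> bytes:
    """Return a copy of data with 16-bit words i and j exchanged."""
    words = [data[k:k + 2] for k in range(0, len(data), 2)]
    words[i], words[j] = words[j], words[i]
    return b"".join(words)


def tag(data: bytes, key: bytes = b"hypothetical-demo-key") -> str:
    """HMAC-SHA256 tag; the key is a placeholder, not a real secret."""
    return hmac.new(key, data, hashlib.sha256).hexdigest()


original = b"send 10 to alice"             # hypothetical 16-byte payload
corrupted = swap_words(original, 0, 1)     # reorder the first two words

if __name__ == "__main__":
    print("payload changed:   ", original != corrupted)
    print("checksums match:   ", internet_checksum(original) == internet_checksum(corrupted))
    print("HMAC tags match:   ", tag(original) == tag(corrupted))
```

Word reordering is only one undetected class; two errors that cancel each other in the sum (one word incremented, another decremented by the same amount) slip through the same way.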

jsolson, over 9 years ago

Google Container Engine VMs should be protected from this by the virtual NIC advertised to them. In particular, it advertises support for TCP checksum verification offload, which all modern Linux guests negotiate (including the kernel used in GKE). If this feature is negotiated, the host-side network (either in hardware or software) verifies the TCP checksum on the packet on behalf of the guest and marks it as having been validated prior to delivering it to the guest.

Older Linux kernels have an additional (I believe distinct) veth-related bug that requires we do some extra work for externally verified packets (and jumbograms): in particular, we must set up the packet we are delivering assuming that it might be routed beyond the guest and that the guest will not remove/reapply the virtio-net header as an intermediate step (this is a pretty leaky abstraction of the Linux virtio-net driver, but one we're aware of and have worked to accommodate).

Of course, none of the above changes the somewhat fragile nature of TCP checksums, generally.

(Note: I wrote the virtio-net NIC we advertise in GCE/GKE, although very little of the underlying dataplane, but I double checked with the GKE team in terms of underlying kernel versions that we typically run.)

fred256, over 9 years ago

Sounds similar to bug #3 in this article: https://www.pagerduty.com/blog/the-discovery-of-apache-zookeepers-poison-packet/

packetized, over 9 years ago

This is a great write-up.

Also, for those saying that TLS is a panacea: encrypting and/or HMAC'ing all TCP data in and out of a box is operationally ridiculous unless you're in some sort of ultra-high-security environment.

betaby, over 9 years ago

From the link I didn't understand why the hardware didn't do checksum checking. Or does it in fact do so, and one is affected only when using a) veth and b) a NIC without hardware checksumming?

derFunk, over 9 years ago

Is there a tl;dr? Who would be affected? Sounds scary.

felixge, over 9 years ago

Here is another Linux TCP horror story: An application I'm working on was experiencing slow database query performance under "load" [1]. Restarting the database temporarily "fixed" the issue, only for it to reappear after a short time.

Luckily I was able to recreate the problem in a test environment (our secondary backup cluster), allowing me to study it. What I found was that I could reliably send the database cluster into a "bad state" by sending it a burst of more than ~200 concurrent requests. After this, I observed a bi-modal response time distribution, with some requests completing quickly as expected (<10ms) and some taking much longer (consistently ~6s for one particular request). My initial instinct was to blame the database, but some SYN cookie flood warnings in the kernel logs caused me to consider the network as well.

So I started using tcpdump and Wireshark to go deeper and found the following: The burst of traffic from the application also caused a burst of traffic between the database cluster nodes, which were performing some sort of result merging. To make things worse, the inter-node requests of the database cluster were using HTTP, which meant a lot of connections were created in the process. Some of these connections were interpreted as a SYN flood by the Linux kernel, causing it to SYN-ACK them with SYN cookies [2]. Additionally, these connections would get stuck with very small TCP windows (as low as 53 bytes), and also suffer from really high ACK latencies (200ms), so a 1600 byte inter-node HTTP request wound up taking 6s! Disabling SYN cookies "fixed" the issue (and so did increasing somaxconn, but that's effectively the same), but despite my best effort, I was unable to understand why SYN cookies should impact the TCP window.

To make this even more mysterious, this problem only occurred in one of our data centers, and we narrowed it down to the router being the only difference. Replacing the router also "fixed" the issue.

I wish my team had the resources and expertise to debug problems like this down to the kernel, but I was too far out of my depth trying to understand the gnarly code that makes up the Linux TCP SYN cookie and congestion control implementation ... :(

Anyway, I'm posting this in the vague hope that somebody may have seen something similar before, or becomes inspired to go on a kernel bug hunt :).

Additionally, this experience gave me a new appreciation for TCP/IP and how amazing it is that it usually "just works" for me as an application developer. This is not to say that we can't improve upon it, but I think there is a lot to learn from the philosophy and approach that went into designing TCP/IP.

[1] By "load" I mean bursts of hundreds of concurrent HTTP requests created due to the application performing JOIN requests on behalf of the NoSQL database, which doesn't provide this feature. My journey of replacing this database with one that's more suited for the task at hand is being written as we speak :).

[2] https://en.wikipedia.org/wiki/SYN_cookies
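For anyone wanting to check the same knobs on their own machine, here is a small sketch in Python that reads the relevant settings from `/proc/sys` on Linux. The sysctl names `net.ipv4.tcp_syncookies` and `net.core.somaxconn` are the ones the story refers to; `net.ipv4.tcp_max_syn_backlog` is an extra I am assuming is worth looking at alongside them, and the helper name `read_sysctls` is made up for this example. It only reads values and changes nothing.

```python
from pathlib import Path
from typing import Dict, Optional

# sysctl name -> path relative to /proc/sys
SYSCTLS = {
    "net.ipv4.tcp_syncookies": "net/ipv4/tcp_syncookies",
    "net.core.somaxconn": "net/core/somaxconn",
    # Assumption: not mentioned in the comment, included for context.
    "net.ipv4.tcp_max_syn_backlog": "net/ipv4/tcp_max_syn_backlog",
}


def read_sysctls(proc_sys: Path = Path("/proc/sys")) -> Dict[str, Optional[int]]:
    """Read each sysctl as an int; None if the file is missing or unparsable."""
    values: Dict[str, Optional[int]] = {}
    for name, relative in SYSCTLS.items():
        try:
            values[name] = int((proc_sys / relative).read_text().strip())
        except (OSError, ValueError):
            values[name] = None
    return values


if __name__ == "__main__":
    for name, value in read_sysctls().items():
        print(f"{name} = {value}")
```

The `proc_sys` parameter exists only so the function can be pointed at a different directory (for example in a test); on a non-Linux system every value simply comes back as `None`.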

dgpl, over 9 years ago

Very interesting. Does it affect libvirt / LXC? I wonder how frequent this problem is.

outworlder, over 9 years ago

Shouldn't this affect anything that uses veths, not just containers? OpenStack, for example.

pmontra, over 9 years ago

Ubuntu 12.04 LTS (still 14 months to go) with the latest Hardware Enablement Stack (12.04.5) runs on the 3.13 kernel. I hope that Canonical will backport the fix to 3.13 and not only to 3.14 as hinted by the article.

doggydogs94, over 9 years ago

I like the "goto drop" statement.

sz4kerto, over 9 years ago

Does this affect Docker overlay networks?

bobinator606, over 9 years ago

use romana.io instead of veth

Linux kernel bug delivers corrupt TCP/IP data
