
Turtles on the Wire: Understanding How the OS Uses the Modern NIC (2016)

300 points by nz about 8 years ago

7 comments

drewg123 about 8 years ago
The biggest innovation that I've seen in quite some time in terms of making creative use of hardware offloads is Hans Petter Selasky's RSS-assisted LRO in FreeBSD.

On our workloads (~100K connections, 16 core / 32 HTT FreeBSD-based 100GbE CDN server) LRO was rather ineffective because there were roughly 3K connections / rx queue. Even with large interrupt coalescing parameters and large ring sizes, the odds of encountering 2 packets from the same connection within a few packets of each other, even in a group of 1000 or more, are rather small.

The first idea we had was to use a hash table to aggregate flows. This helped, but had the drawback of a much higher cache footprint.

Hps had the idea that we could sort packets by RSS hash ID *before* passing them to LRO. This would put packets from the same connection adjacent to each other, thereby allowing LRO without a hash table to work. Our LRO aggregation rate went from ~1.1:1 to well over 2:1, and we reduced CPU use by roughly 10%.

This code is in FreeBSD-current right now (see tcp_lro_queue_mbuf())
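To make the trick concrete, here is a minimal userspace sketch of the sort-then-aggregate idea. The `struct pkt`, `try_merge()`, and `deliver()` names are hypothetical stand-ins for illustration; the real implementation operates on mbufs in FreeBSD's sys/netinet/tcp_lro.c.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical receive descriptor; FreeBSD's real code works on mbufs. */
struct pkt {
    uint32_t rss_hash;   /* flow hash the NIC computed and stored */
    uint16_t len;
    void    *data;
};

/* Hypothetical helpers: try_merge() performs the TCP-level checks real
 * LRO does (same 4-tuple, contiguous sequence numbers) and coalesces
 * the segments; deliver() passes a finished aggregate up the stack. */
int  try_merge(struct pkt *agg, struct pkt *next);
void deliver(struct pkt *agg);

static int
cmp_rss(const void *a, const void *b)
{
    const struct pkt *pa = a, *pb = b;

    return (pa->rss_hash > pb->rss_hash) - (pa->rss_hash < pb->rss_hash);
}

/* Sort the receive batch by RSS hash so same-flow packets become
 * adjacent, then aggregate each run without any flow hash table. */
void
lro_flush_batch(struct pkt *batch, size_t n)
{
    size_t cur, i;

    if (n == 0)
        return;
    qsort(batch, n, sizeof(batch[0]), cmp_rss);
    cur = 0;                     /* head of the current aggregate */
    for (i = 1; i < n; i++) {
        if (batch[i].rss_hash == batch[cur].rss_hash &&
            try_merge(&batch[cur], &batch[i]))
            continue;            /* merged into batch[cur] */
        deliver(&batch[cur]);
        cur = i;                 /* start a new aggregate */
    }
    deliver(&batch[cur]);
}
```

Because the sort groups equal RSS hashes together, the aggregator only ever compares each packet against the head of the current run, which is exactly what lets it drop the hash table and its cache footprint.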
policedemil about 8 years ago
Great article! A lot of this is way beyond me, but I'm generally interested in the process of how a NIC filters based on MAC addresses.

I'm in the humanities, and certain scholars working with culture and technology love to make a huge deal about data leakage and how intertwined we all are, precisely because you can put a NIC in promiscuous mode and capture packets that weren't meant for you. The whole point is that because your NIC is constantly receiving data meant for others (i.e., because it's the one filtering the MAC addresses), something like privacy on networks is always problematic. I've always found the whole point somewhat overstated.

So, could anyone explain real quick the process of how a NIC decides whether a packet/frame is actually bound for it, or link some good resources? For example, does the NIC automatically store the frame/packet in a buffer, then read the header, and then decide to discard? Or can it read the header before storing the rest of the frame? How much has been read at the point the NIC decides to drop it or move it up the stack? Reading all of every packet seems improbable to me, because if it were the case, laptop 1 (awake but not downloading anything) would experience significant battery drain from constantly filtering network traffic that was meant for laptop 2. I'm not sure that really maps to my experience. Also, I assume there are differences for LAN vs. WiFi?

Any help on the matter would be greatly appreciated! I've tried google-diving on this question many times before and it's really hard to find much on it.
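For what it's worth, the accept/drop decision a typical Ethernet MAC makes can be modeled in a few lines. This is an illustrative C model, not any real driver's API (the `nic_filter` struct and `hash6()` helper are hypothetical): the key point is that only the first 6 bytes of the frame, the destination address, are needed, so a non-matching frame can be rejected before the rest is ever stored or DMA'd to host memory.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical multicast hash; real NICs typically use bits of a
 * CRC-32 of the address to index a small filter table. */
unsigned hash6(const uint8_t addr[6]);

struct nic_filter {
    uint8_t  station_addr[6];  /* our unicast MAC */
    uint64_t mcast_hash;       /* coarse 64-bit multicast filter */
    bool     promiscuous;
};

static bool
nic_accept_frame(const struct nic_filter *f, const uint8_t dst[6])
{
    static const uint8_t bcast[6] =
        { 0xff, 0xff, 0xff, 0xff, 0xff, 0xff };

    if (f->promiscuous)                       /* capture everything */
        return true;
    if (memcmp(dst, f->station_addr, 6) == 0) /* addressed to us */
        return true;
    if (memcmp(dst, bcast, 6) == 0)           /* broadcast */
        return true;
    if (dst[0] & 0x01)                        /* multicast group bit */
        return (f->mcast_hash >> (hash6(dst) & 63)) & 1;
    return false;  /* someone else's unicast frame: drop early */
}
```

That early rejection is why the energy cost of filtering is small. On switched Ethernet, most frames destined for other hosts never even reach your port; Wi-Fi is a genuinely shared medium, but the radio can likewise stop processing a frame once the header shows it is for someone else.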
en4bz about 8 years ago
This is why I'm really hoping RDMA [1] will catch on soon. It would be great if there were a cloud provider that enabled this feature on some of their offerings. Amazon has done something similar by allowing kernel bypass via DPDK [2] with their ENA offering, but kernel bypass is inferior to RDMA in so many ways IMO.

At this point we have 200Gbit/s NICs being provided by Mellanox [3]. CPUs aren't getting any faster, and the scale-out approach is extremely difficult to get right without going across NUMA domains [4]. Based on the progression of CPUs lately, there just isn't going to be enough time to process all these packets AND have time left over to actually run your application. There's a lot of work focusing on data locality at the moment, but at this point it's still not foolproof, and the work that has been done is woefully underdocumented.

As the article mentioned, we've already added a bunch of hardware offloads. RDMA is just a continuation of these offloads, but unfortunately it requires some minor changes on the application side to take advantage of, which is probably why it's been slow to be adopted.

RDMA has so many great applications for data transfer between backend services, whether it's queries between a web server and a DB, replication/clustering of DBs, or a microservice fabric with microsecond latencies. Overall there's a lot of low-hanging fruit that could be optimized with RDMA.

[1] https://en.wikipedia.org/wiki/Remote_direct_memory_access

[2] http://dpdk.org/doc/nics

[3] http://www.mellanox.com/page/products_dyn?product_family=266&mtag=connectx_6_en_card

[4] http://rhelblog.redhat.com/2015/09/29/pushing-the-limits-of-kernel-networking/
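As a concrete illustration of the "minor changes on the application side," here is a hedged sketch of a one-sided RDMA write using the libibverbs API. It assumes connection setup has already happened: `qp` is a connected reliable (RC) queue pair, `mr` is a registered memory region covering `buf`, and the peer's `remote_addr`/`rkey` were exchanged out of band (say, over TCP during setup).

```c
#include <infiniband/verbs.h>
#include <stdint.h>
#include <string.h>

/* One-sided RDMA write: the local CPU posts one work request and the
 * NIC places the data directly into the peer's memory; the remote CPU
 * is never interrupted. Returns 0 on success. */
static int
rdma_write_once(struct ibv_qp *qp, struct ibv_mr *mr, void *buf,
    uint32_t len, uint64_t remote_addr, uint32_t rkey)
{
    struct ibv_sge sge = {
        .addr   = (uintptr_t)buf,  /* local source buffer */
        .length = len,
        .lkey   = mr->lkey,        /* from the registered region */
    };
    struct ibv_send_wr wr, *bad_wr = NULL;

    memset(&wr, 0, sizeof(wr));
    wr.opcode              = IBV_WR_RDMA_WRITE;
    wr.sg_list             = &sge;
    wr.num_sge             = 1;
    wr.send_flags          = IBV_SEND_SIGNALED; /* ask for a completion */
    wr.wr.rdma.remote_addr = remote_addr;       /* peer's virtual addr */
    wr.wr.rdma.rkey        = rkey;              /* peer's access key */

    return ibv_post_send(qp, &wr, &bad_wr);
}
```

The remote side never sees this transfer happen; completion is reaped locally by polling the send completion queue with ibv_poll_cq().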
tedunangst about 8 years ago
Buggy firmware with edge cases is putting it mildly. I suppose a checksum of 0000 or ffff is technically an edge case, but not all that uncommon, and a pretty popular thing to get wrong.
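The edge case comes from one's-complement arithmetic having two encodings of zero. Here is a sketch of the RFC 1071 Internet checksum plus the UDP/IPv4 wrinkle (the function names are mine, not from any particular stack): for UDP over IPv4, 0x0000 in the header means "no checksum was computed," so a sum that works out to zero must be transmitted as 0xffff instead.

```c
#include <stddef.h>
#include <stdint.h>

/* RFC 1071 Internet checksum over a byte buffer. */
static uint16_t
inet_checksum(const void *data, size_t len)
{
    const uint8_t *p = data;
    uint32_t sum = 0;

    while (len > 1) {                   /* sum 16-bit big-endian words */
        sum += (uint32_t)p[0] << 8 | p[1];
        p += 2;
        len -= 2;
    }
    if (len)                            /* odd trailing byte */
        sum += (uint32_t)p[0] << 8;
    while (sum >> 16)                   /* fold carries back in */
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;              /* one's complement of the sum */
}

/* UDP/IPv4 special case: 0x0000 on the wire means "no checksum",
 * so a computed zero must be sent as its alias 0xffff. */
static uint16_t
udp_checksum_field(uint16_t csum)
{
    return csum == 0 ? 0xffff : csum;
}
```

An offload engine that emits the raw 0x0000 instead of 0xffff produces UDP datagrams that claim to carry no checksum at all, which is exactly the kind of bug that is easy to ship and painful to diagnose.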
bluetech about 8 years ago
Interesting article.

Another related article I found interesting, which discusses some of the queues in the Linux network stack: https://www.coverfire.com/articles/queueing-in-the-linux-network-stack/
ams6110 about 8 years ago
As an aside, is anyone using the Joyent cloud stuff in production? Any good comparisons to OpenStack? Looking for something easier to manage.
pthreads about 8 years ago
This is a very useful write-up. I thoroughly enjoyed it and found it informative. Thank you.