When your IP traffic in AWS disappears into a black hole

185 points by schimmy_changa, over 10 years ago

13 comments

jcollins, over 10 years ago
We saw this earlier this year after upgrading to a new Linux kernel.

The solution for us was to set this in sysctl.conf:

net.ipv4.neigh.default.gc_thresh1=0

https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1331150/ https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1331150/comments/12
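For readers who want to check whether a host already carries this setting, here is a minimal sketch. It assumes the standard Linux procfs location for the default neighbor-table tunables; the helper names `read_gc_thresholds` and `workaround_applied` are hypothetical and not part of any tool. As far as the kernel documentation describes it, `gc_thresh1` is the minimum number of entries below which the ARP cache garbage collector does not run, so a value of 0 lets it run regardless of table size.

```python
from pathlib import Path

# Standard Linux procfs location of the default neighbor-table (ARP cache) tunables.
NEIGH_DEFAULT = Path("/proc/sys/net/ipv4/neigh/default")


def read_gc_thresholds(base: Path = NEIGH_DEFAULT) -> dict[str, int]:
    """Return the current gc_thresh1..gc_thresh3 values as integers."""
    values = {}
    for name in ("gc_thresh1", "gc_thresh2", "gc_thresh3"):
        # Each tunable is a small text file holding a single integer.
        values[name] = int((base / name).read_text().strip())
    return values


def workaround_applied(thresholds: dict[str, int]) -> bool:
    """True when gc_thresh1 is 0, the value suggested in the comment above."""
    return thresholds["gc_thresh1"] == 0
```

On a Linux host, `workaround_applied(read_gc_thresholds())` reports whether the running kernel currently has the value in effect; it does not tell you whether sysctl.conf will reapply it after a reboot.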
danesparza, over 10 years ago
Am I weird because I actually muttered 'ARP caching issue' halfway through your article? :-)

Love the technical write-up -- thanks!
ChuckMcM, over 10 years ago
Interesting trace down to stale ARP entries. It gets worse when the switches are running MAC address filtering and *they* get out of date. We had that issue with some Blade G8052 top-of-rack switches with their upstream 10G ports. They sometimes "forget" which upstream port has the MAC address that they are switching to, and those packets just spew out into the data center, leaving a mess. The "fix" is to force the switch to ping up through a specific upstream port periodically to the center switch's management IP address. Sigh.
spectre256, over 10 years ago
This reminds me of a time at a previous company years ago, where we experienced an issue that felt similar, although the root cause was quite different.

Basically, we had multiple teams all launching/terminating web servers. Unfortunately, they were all in the same EC2 deployment, and more often than not our load balancers from one team would send traffic to the web servers of another team. Furthermore, our setups were similar enough that this would sometimes cause bad results for users. We fixed it by making sure that our web servers on every team spoke on different ports. Not elegant, but effective (until two teams accidentally picked the same ports).

These days we have good enough infrastructure tools that this problem should never happen. But in 2009, at a company that was overwhelmed with growth, those sorts of things happened.
falcolas, over 10 years ago
Try an arping from the new workers on first startup? I ran into this quite a bit when using VIPs for DB failover, and an arping fixed the caching issue in most cases.
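One reading of the arping suggestion above is to send an unsolicited (gratuitous) ARP announcement when a worker boots, so that peers refresh their cached entry for its IP. The sketch below assumes the iputils `arping` binary, whose `-U`, `-c`, and `-I` options select unsolicited mode, packet count, and interface; the function names, the `eth0` default, and the count of 3 are illustrative assumptions, and the comment does not say which arping mode was actually used.

```python
import subprocess


def build_arping_command(ip: str, interface: str = "eth0", count: int = 3) -> list[str]:
    """Build an iputils arping command that announces `ip` on `interface`."""
    # -U: unsolicited ARP mode (updates neighbours' ARP caches)
    # -c: number of packets to send
    # -I: network interface to send from
    return ["arping", "-U", "-c", str(count), "-I", interface, ip]


def announce(ip: str, interface: str = "eth0", count: int = 3) -> int:
    """Run the announcement and return arping's exit status (usually needs root)."""
    result = subprocess.run(build_arping_command(ip, interface, count), check=False)
    return result.returncode
```

A startup script could call `announce("10.0.0.5")` once the interface is up; the address here is a placeholder.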
schimmy_changa, over 10 years ago
I think the biggest thing I was surprised by with this investigation was the lack of documentation about data-layer tools. At one point I was looking through the source of the 'ip' command to try to find out exactly which conditions caused a 'STALE' entry in the ARP table...
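For spotting STALE entries without reading the `ip` source, a small sketch that parses the text printed by `ip neigh show` may help. It assumes the common line layout (address first, optional `dev` and `lladdr` key/value pairs, state last); the sample addresses and the helper names are made up for illustration. It only lists entries by state and does not explain why the kernel marked them STALE.

```python
from typing import NamedTuple, Optional


class Neighbor(NamedTuple):
    ip: str
    dev: Optional[str]
    lladdr: Optional[str]
    state: str


def _value_after(tokens: list[str], key: str) -> Optional[str]:
    """Return the token following `key`, or None if `key` is absent."""
    if key in tokens:
        idx = tokens.index(key)
        if idx + 1 < len(tokens):
            return tokens[idx + 1]
    return None


def parse_ip_neigh(output: str) -> list[Neighbor]:
    """Parse `ip neigh show` text into Neighbor records."""
    neighbors = []
    for line in output.splitlines():
        tokens = line.split()
        if len(tokens) < 2:
            continue  # skip blank or malformed lines
        neighbors.append(
            Neighbor(
                ip=tokens[0],
                dev=_value_after(tokens, "dev"),
                lladdr=_value_after(tokens, "lladdr"),  # absent for FAILED entries
                state=tokens[-1],  # the state is the last field on the line
            )
        )
    return neighbors


def stale_entries(output: str) -> list[Neighbor]:
    """Return only the entries whose state is STALE."""
    return [n for n in parse_ip_neigh(output) if n.state == "STALE"]


# Hypothetical sample in the shape `ip neigh show` prints.
SAMPLE = """\
10.0.0.1 dev eth0 lladdr 02:00:00:aa:bb:01 REACHABLE
10.0.0.7 dev eth0 lladdr 02:00:00:aa:bb:07 STALE
10.0.0.9 dev eth0 FAILED
"""
```

Feeding it live data is a matter of passing `subprocess.run(["ip", "neigh", "show"], capture_output=True, text=True).stdout` to `stale_entries`.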
maerF0x0, over 10 years ago
I wonder if this is a problem for any cloud provider. I also wonder whether IPv6 could help mitigate it: IP collisions would then be rarer.
wahnfrieden, over 10 years ago
FYI, Clever: I click "Engineering Blog" at the top, and all links to blog posts on that page 404.
girvo, over 10 years ago
We had a fascinating bug on EC2 -- we could connect *to* the instance, but no network traffic made it out. It wasn't a security group problem; it was literally a really weird bug in EC2's network that we somehow triggered. The engineer over at Amazon who looked at it was really excited when he came across our case because it was so weird, heh. They fixed it. I can't remember exactly what was done on their end, but it was one of the weirder problems I've attempted to debug. Nothing I tried worked!
perlgeek, over 10 years ago
Wouldn't it be a better solution to not reuse IP addresses quickly? If I understood it correctly, they are in a private network anyway, so they could afford it.
zenocon, over 10 years ago
I just experienced this early this week. Very frustrating. I also posted to the AWS forums and got zero assistance; I am currently not paying for an AWS support plan. This article came at an opportune moment -- it makes sense and removes the shroud of mystery around why it "works sometimes", which leaves me with an uneasy feeling for a production setup.
kiyoto, over 10 years ago
Looking at the port number, it looks like Clever is a MongoDB user =)
legohead, over 10 years ago
I was going to respond with my little story, but I see your article already linked it! ;)