For about the past 72 hours, our EC2 instances have been encountering intermittent DNS resolution failures when talking to the default AWS DNS server, 172.16.0.23.<p>These failures almost always occur at around one minute after the hour (i.e., between 12:01 and 12:02, between 1:01 and 1:02, etc.), and they take the form of the AWS nameserver returning SERVFAIL.<p>I have attempted to isolate the problem to AWS's name servers, as opposed to the DNS servers that AWS is speaking to, by running code outside of AWS that looks up exactly the same DNS records at the same time. I have a script which runs out of cron at one minute after the hour and spends about 60 seconds repeatedly looking up host names that don't exist, reporting any lookup that returns SERVFAIL instead of NXDOMAIN. The script returns occasional errors when run in AWS, but returns absolutely no errors when run outside of AWS.<p>The domain I'm looking up records in is hosted by Dyn through their DynECT service, but I'm not sure that's relevant, since I've confirmed that the errors only occur when the AWS nameserver is in the loop.<p>Amazon's DNS servers are notoriously unreliable, but we've never seen this particular failure mode before; the usual failure mode is that DNS simply doesn't work at all on a particular instance and we have to terminate and replace it. Certainly, we've never seen a failure mode where all the errors occur on an hourly cycle like this.<p>What I'm looking for from HN is:<p>1) Are you seeing similar behavior in your AWS deployments?<p>2) Would you be able to run a script similar to the one I'm running to find out if you can reproduce the issue?<p>If I can collect evidence that this is happening to lots of people rather than just us, I have a better chance of convincing Amazon to pay attention to it.<p>Thanks!
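<p>For anyone willing to try reproducing this, here is a minimal sketch of the kind of probe described above. It is not my exact script. It assumes Python 3 with only the standard library, sends plain UDP queries straight to the resolver, and uses `example.com` as a placeholder zone that you would replace with a domain you control. The function names (`build_query`, `parse_rcode`, `probe`) and the `nx-` name prefix are made up for illustration.

```python
import random
import socket
import struct
import time

# Only the response codes this probe cares about (RFC 1035 numbering).
RCODES = {0: "NOERROR", 2: "SERVFAIL", 3: "NXDOMAIN"}


def build_query(name, qid):
    """Build a minimal DNS query for an A record with recursion desired."""
    # Header: ID, flags (RD=1), QDCOUNT=1, ANCOUNT=0, NSCOUNT=0, ARCOUNT=0
    header = struct.pack("!HHHHHH", qid, 0x0100, 1, 0, 0, 0)
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", 1, 1)  # QTYPE=A, QCLASS=IN


def parse_rcode(response):
    """Return the RCODE: the low 4 bits of the 16-bit flags field."""
    (flags,) = struct.unpack("!H", response[2:4])
    return flags & 0x000F


def probe(resolver, zone, duration=60.0, interval=0.5):
    """Look up random nonexistent names under `zone` for `duration` seconds.

    Returns a list of (timestamp, name, result) for every lookup that did
    not come back as NXDOMAIN.
    """
    failures = []
    deadline = time.time() + duration
    while time.time() < deadline:
        # A fresh random label each time, so no cache can answer for us.
        name = "nx-%016x.%s" % (random.getrandbits(64), zone)
        qid = random.getrandbits(16)
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
            sock.settimeout(2.0)
            try:
                sock.sendto(build_query(name, qid), (resolver, 53))
                data, _ = sock.recvfrom(512)
            except socket.timeout:
                result = "TIMEOUT"
            else:
                if len(data) < 4 or data[:2] != struct.pack("!H", qid):
                    result = "BADREPLY"  # short packet or mismatched query ID
                else:
                    rcode = parse_rcode(data)
                    result = RCODES.get(rcode, "RCODE%d" % rcode)
        if result != "NXDOMAIN":
            failures.append((time.strftime("%Y-%m-%d %H:%M:%S"), name, result))
        time.sleep(interval)
    return failures


# Example use (placeholder zone; the resolver is the address from this post):
#   for stamp, name, result in probe("172.16.0.23", "example.com"):
#       print(stamp, name, result)
```

To match the timing I described, run it from cron with a schedule of `1 * * * *`, which fires at one minute past every hour. Running the same thing at the same time from a machine outside AWS, pointed at a different resolver, gives you the comparison.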