The night of 1000 alerts (but only on the Linux boxes)

5 points by r4um over 2 years ago

1 comment

LinuxBender over 2 years ago
For what it's worth, one can reduce the impact of such issues by populating /etc/resolv.conf with "options timeout:{t} attempts:{a}", {t} being the time in seconds you find acceptable to wait for a DNS response and {a} being the number of attempts. In a datacenter I also ensure that all occurrences of "domain" and "search" are removed from /etc/resolv.conf, as they amplify the timeouts. Always use FQDNs in applications, without exception. One can also populate /etc/gai.conf with parameters to prefer IPv4 over IPv6, or even disable IPv6 if your outbound servers talk through an edge device that will translate/proxy to IPv6 when required, as having resolvers try both A and AAAA will amplify problems.

To see what I am talking about, create a RAM disk on your edge recursive servers and configure your DNS daemon to write query logs to a file in that RAM disk. This will slow down your DNS servers a little, but unless one sees the bigger picture of what is happening, one cannot make educated decisions to retrofit the platform architecture. This is only required temporarily, to analyze the behavior of the application and OS DNS requests.

Another useful improvement is to literally run Unbound on every single node, configure it to use multiple edge recursive servers, and set the parameters to keep probing all of them but prefer the fastest ones. This reduces load on the edge recursive servers, improves application response times and minimizes DNS outage issues. Tune cache-min-ttl to what is optimal for your datacenter. I also find it useful to set delay-close higher on all the edge recursive servers to further reduce latency.

There are thousands of other improvements one could make, but then this turns into a book.
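
To make the point about "search" entries and A/AAAA lookups concrete, here is a minimal Python sketch of how a stub resolver might expand one hostname into several queries. It is a simplified model of the search-list and ndots rules described in resolv.conf(5), not a reimplementation of any real resolver; the function names and the example domains are made up for illustration, and real resolvers may stop early on some errors.

```python
def candidate_names(name, search, ndots=1):
    """Return the names a stub resolver could try, in order (simplified model)."""
    if name.endswith("."):
        # A trailing dot marks an absolute name, so the search list is skipped.
        return [name]
    absolute = name + "."
    expanded = [f"{name}.{domain.strip('.')}." for domain in search]
    if name.count(".") >= ndots:
        # Enough dots: the name is tried as-is first, then with each search domain.
        return [absolute] + expanded
    # Too few dots: the search domains are tried first, the bare name last.
    return expanded + [absolute]


def max_queries(name, search, ndots=1, record_types=2):
    """Upper bound on queries for one lookup (record_types=2 means A and AAAA)."""
    return len(candidate_names(name, search, ndots)) * record_types


if __name__ == "__main__":
    # Hypothetical search list, as might be left behind in /etc/resolv.conf.
    search = ["prod.example.com", "example.com"]
    for host in ("db", "db.prod.example.com", "db.prod.example.com."):
        print(host, candidate_names(host, search), max_queries(host, search))
```

Under this model, the short name "db" with two search domains can turn into up to six queries (three candidate names, each asked for A and AAAA), and every one of them is subject to the timeout and attempts settings. The fully qualified name with a trailing dot and no address-family doubling needs only one.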