Leap second causing Linux server crashes?

253 points by sathyabhat almost 13 years ago

29 comments

__david__ almost 13 years ago

It appears to be fixed in Linux 3.4 [1]. According to the original commit [2], it has been broken since 7dffa3c673fbcf835cd7be80bb4aec8ad3f51168 [3], which appeared in 2.6.26.

So, kernels between 2.6.26 and 3.3 (inclusive) are vulnerable.

[1] https://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=bcd550745fc54f789c14e7526e0633222c505faa
[2] https://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=6b43ae8a619d17c4935c3320d2ef9e92bdeed05d
[3] https://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=7dffa3c673fbcf835cd7be80bb4aec8ad3f51168
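For anyone wanting to script a quick check against that range, here is a minimal sketch. The helper names are made up for illustration, and it only compares upstream version numbers: distribution kernels often carry backported fixes, so treat the result as a hint rather than a verdict.

```python
import re

# Range taken from the comment above: broken since 2.6.26, fixed in 3.4.
FIRST_AFFECTED = (2, 6, 26)
FIRST_FIXED = (3, 4, 0)


def parse_kernel_version(release):
    """Turn a release string such as '2.6.32.59-0.3.1' into (2, 6, 32)."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    major, minor, patch = match.groups()
    return (int(major), int(minor), int(patch or 0))


def in_affected_range(release):
    """Hypothetical helper: True if the version lies in [2.6.26, 3.4)."""
    return FIRST_AFFECTED <= parse_kernel_version(release) < FIRST_FIXED


# Release strings as they appear elsewhere in this thread.
for release in ("2.6.16.60-0.97.1", "2.6.27.54-0.2.1", "3.0.31-0.9.1", "3.2", "3.4"):
    print(release, in_affected_range(release))
```

On a live machine you would pass in the output of `uname -r` instead of the hard-coded strings.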


dfc almost 13 years ago

Google uses a "leap smear" and slowly accounts for the leap second before it happens. [1] As long as you are not doing any astronomical calculations or constrained by regulatory requirements, I think Google has the right idea.

[1] http://googleblog.blogspot.com/2011/09/time-technology-and-leaping-seconds.html
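To make the smear idea concrete, here is a rough sketch assuming a cosine-shaped ramp of the kind the linked post describes. The function name and the 20-hour window are illustrative assumptions, not Google's actual configuration.

```python
import math

# Assumption for illustration only: spread the extra second over 20 hours.
WINDOW_SECONDS = 20 * 60 * 60


def smear_offset(t, window=WINDOW_SECONDS):
    """Fraction of the leap second already absorbed, t seconds into the window.

    The ramp starts and ends flat, so the clock rate changes gradually
    instead of jumping by a whole second at midnight UTC.
    """
    if t <= 0:
        return 0.0
    if t >= window:
        return 1.0
    return (1.0 - math.cos(math.pi * t / window)) / 2.0


# Offset at the start, a quarter, half, and the end of the window.
for t in (0, WINDOW_SECONDS // 4, WINDOW_SECONDS // 2, WINDOW_SECONDS):
    print(t, round(smear_offset(t), 4))
```

Clients that follow servers smeared this way never see 23:59:60; they simply run very slightly slow for the length of the window.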


ChuckMcM almost 13 years ago

Not surprising. In spite of all the press claiming that Y2K was just a silly waste of money, it's events like these that make me suspect it would have been a much bigger deal if everyone had ignored it and only fixed things after they were shown to break.


duiker101 almost 13 years ago

It's 2012, and we still have problems keeping track of time. This is both fascinating and scary.

P.S. For people wanting to know more, this video is simple to understand but really amazing: http://www.youtube.com/watch?v=xX96xng7sAE


kabdib almost 13 years ago

Fear the Unix 32-bit time-becomes-negative bugs, in 2038.

We have about 25 years to get ready. I still think we'll be patching at the last minute.

(Yeah, lots of systems will be 64-bit by then, but there will still be a lot of embedded crackerbox systems running 32-bit timestamps. It's all the embedded stuff I'm worried about.)
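A quick sketch of where that limit sits, assuming a signed 32-bit `time_t` counting seconds from the Unix epoch. The helper names are made up for illustration; the counter runs out in January 2038.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
INT32_MAX = 2**31 - 1


def wrap_int32(value):
    """Interpret value as a signed 32-bit integer (two's complement wraparound)."""
    return (value + 2**31) % 2**32 - 2**31


def from_time_t(seconds):
    """Convert seconds since the epoch to a UTC datetime (negative values allowed)."""
    return EPOCH + timedelta(seconds=seconds)


last_good = from_time_t(INT32_MAX)
after_wrap = from_time_t(wrap_int32(INT32_MAX + 1))

print("last representable second:", last_good.isoformat())
print("one second later, wrapped:", after_wrap.isoformat())
```

The second line is the "time becomes negative" part: one tick past the maximum, the counter wraps to its most negative value and the clock lands in December 1901.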


kzk_mover almost 13 years ago

I'm facing this issue right now. With the 'adjtimex' command you can clear the problematic INS bit.

First, check the status flag:

<pre><code>$ ./adjtimex --print | grep status
status: 8209
</code></pre>

In binary, 8209 has the INS bit set, "100000000[1]0001" (the 5th least significant bit):

<pre><code>$ ruby -e 'p 8209.to_s(2)'
"10000000010001"
</code></pre>

8193 is the value after clearing the INS bit:

<pre><code>$ ruby -e 'p 8193.to_s(2)'
"10000000000001"
</code></pre>

Then set it as the current value. Please make sure ntpd is not running.

<pre><code>$ adjtimex --status 8193</code></pre>
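The same bit arithmetic can be sketched in a few lines. The flag values below are a subset of the status bits defined in the kernel's `linux/timex.h`; the function names are hypothetical, and the code only computes numbers. It does not touch the system clock.

```python
# Selected status bits from <linux/timex.h> (not the full list).
STA_FLAGS = {
    0x0001: "STA_PLL",     # PLL updates enabled
    0x0010: "STA_INS",     # insert a leap second at the end of the day
    0x0020: "STA_DEL",     # delete a leap second at the end of the day
    0x0040: "STA_UNSYNC",  # clock unsynchronized
    0x2000: "STA_NANO",    # nanosecond resolution
}
STA_INS = 0x0010


def decode_status(status):
    """Return the names of the known flags set in an adjtimex status word."""
    return [name for bit, name in sorted(STA_FLAGS.items()) if status & bit]


def clear_ins(status):
    """Return the status word with the leap-second-insert bit cleared."""
    return status & ~STA_INS


print(decode_status(8209))
print(clear_ins(8209))
```

Computing the new value this way avoids hard-coding 8193, which is only correct when the starting status is exactly 8209.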

MrUnderhill almost 13 years ago

Novell KB: http://www.novell.com/support/kb/doc.php?id=7001865

<pre><code>SLE9 (kernel 2.6.5-7.325): NOT AFFECTED
SLE10-SP1 (kernel 2.6.16.54-0.2.12): NOT AFFECTED
SLE10-SP2 (kernel 2.6.16.60-0.42.54.1): NOT AFFECTED
SLE10-SP3 (kernel 2.6.16.60-0.83.2): NOT AFFECTED
SLE10-SP4 (kernel 2.6.16.60-0.97.1): NOT AFFECTED
SLE11-GA (kernel 2.6.27.54-0.2.1): VERY UNLIKELY
SLE11-SP1 (kernel 2.6.32.59-0.3.1): VERY UNLIKELY
SLE11-SP2 (kernel 3.0.31-0.9.1): VERY UNLIKELY

Update (06/26/2012): after thorough code review -> SLE9 and SLE10 not affected at all.</code></pre>

brongondwana almost 13 years ago

FYI: I've updated the post with details of the workaround as implemented on our servers.

shaggy almost 13 years ago

Pardon the ignorance if this is a stupid question. I've been looking at some of my hosts and have noticed the message "Clock: inserting leap second 23:59:60 UTC" in dmesg output, but each of the hosts is in the EDT timezone, so I was under the impression that the leap second hadn't been applied yet. So what does that mean? Have the systems applied the leap second successfully, or have they only received it from their NTP servers?


piggity almost 13 years ago

We just had 100s of EC2 instances generate high (alleged) load. Instances had load averages of 90+ but were responsive.

Running on a 3.2 kernel.

Rebooted them all and they're fine.


kristopher almost 13 years ago

FYI: Our Debian servers did not kernel panic, but system CPU load went through the roof; a quick restart brought levels back to normal.


politician almost 13 years ago

After reading these tales of woe, all I can say is that I hope the criminal element doesn't start assaulting NTP servers.

mootothemax almost 13 years ago

I was logged on to a couple of CentOS 6 servers when I saw this happen, and on each one the Java processes went absolutely haywire. Everything else seemed to work fine.

I attempted to fix it with adjtimex and the script in the linked question, but to no avail, and in the end had to restart them all instead. After that, all was good again.


glawatscheck almost 13 years ago

POSTMORTEM fix for CPU-eating softirqd threads without rebooting:

stop ntpd, run ntpdate or sntp, start ntpd

/etc/init.d/ntp stop; sntp -s <ntpserver>; /etc/init.d/ntp start

Unfortunately the sntp / ntpdate wrapper is not shipped with squeeze, for example. I've used the binary from SuSE 11.4 just fine on squeeze.


raverbashing almost 13 years ago

Ouch!

My Debian GNU/Linux 6.0 is still standing.

Oh well, reading the issue, the machine date is Sat Jun 30 16:11:31 EDT 2012.

Stopped ntpd just in case.


yaix almost 13 years ago

Two days ago while booting, the BIOS time on my eeepc was suddenly reset, with an error message on boot to adjust the time manually. Was just thinking that it may be related?

sayeed almost 13 years ago

Our Linux instances running on Amazon EC2 had no issues, since we are not running ntpd on these servers and adjtimex returns status as 64 (clock unsynchronized).

I think the Xen host takes care of the synchronization and we need not do it in the guest OS (see http://serverfault.com/questions/100978/do-i-need-to-run-ntpd-in-my-ec2-instance).

Is this fine or should we run ntpd for better accuracy?


arohner almost 13 years ago

Stupid question: Why was this not caught? Seems pretty easy to test. Just set the clock to today (or any day with a leap second), and watch what happens.


cullenking almost 13 years ago

On Debian, I was able to fix the issue (the load issue specifically) with this command:

/etc/init.d/ntp stop; date; date `date +"%m%d%H%M%C%y.%S"`; date;
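The inner `date +"%m%d%H%M%C%y.%S"` just prints the current time in the MMDDhhmmCCYY.ss form that `date` accepts for setting the clock, so the command sets the clock to what it already reads. The comment does not say why that helps; the sketch below only shows what string gets passed. The function name is hypothetical and nothing here changes the system time.

```python
from datetime import datetime


def date_set_argument(now):
    """Build the MMDDhhmmCCYY.ss string that the nested `date` call produces.

    %Y is used in place of %C%y; for four-digit years the digits are the same.
    """
    return now.strftime("%m%d%H%M%Y.%S")


# Five seconds after midnight on 1 July 2012, as an example input.
print(date_set_argument(datetime(2012, 7, 1, 0, 0, 5)))
```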

chmod775 almost 13 years ago

If all Linux systems were really affected, more than half of the Internet would still be down by now. It could be a specific combination of kernel/userspace bugs that exists only on some systems.

What sucks a bit is that my VPN (OpenVPN) was affected too, causing my computer to power off. I replaced the poweroff with

ip route add to 192.168.1.0/24 dev lo

I hope that saves me when the next leap second occurs.

Monotoko almost 13 years ago

Pirate Bay has also been crashed by this: "TPB crashed just after midnight June 30th GMT (5.5 hrs ago) The crash appears to have been caused by the leap second that was issued at midnight."

https://forum.suprbay.org/showthread.php?tid=125071

bifrost almost 13 years ago

No burps from my BSD boxes either, although they're all in UTC so the leap second hasn't happened for them yet.


mkr-hn almost 13 years ago

Is this implementation-specific, or could the Windows equivalent to ntp cause the same problem?


ernestipark almost 13 years ago

My AWS EC2 instances got spun up to 100% CPU and have been like that for a day. I basically saw a step function from 0 to 100 in the CPU graph. Just had to reboot them.

x3c almost 13 years ago

Hey, I'm running Ubuntu 12.04. Could someone guide me through what I can do to detect/prevent this from crippling my server? Thanks.


icefox almost 13 years ago

Oddly, Netflix went down for me at 12:01 last night... I assumed some cronjob or something similar was to blame.


drivebyacct2 almost 13 years ago

My Ubuntu servers seem unaffected thus far.


sohn5 almost 13 years ago

That wouldn't happen if servers were Macs


aidanbrandt almost 13 years ago

Read that as "high rates of cash."
