Linux – Fit-PC bricked due to leap second, how to prevent the second one from failing

leapsecond linux

I've got three Fit-PCs in use as lightweight Linux servers. Unfortunately, on Jun 30, the first of them failed to start due to the leap-second bug. I tried rebooting it a few times, but the screen remained blank after the third boot attempt. This appeared to be hardware-related, so we took it to a repairman. He told us something had overheated and that the motherboard was broken. He was able to recover the data, but the Fit-PC was written off.

The second Fit-PC was unable to reboot a few days later (the first time we actually tried to reboot it). Apparently by sheer luck, it came up on the third attempt, and it is now working fine.

The third Fit-PC had not given any problems. When I found out the other ones failed due to the Leap-Second, I actually thought we were lucky with this third one. Fact is, the recent slowness of the server was most likely due to this same bug, and now that I rebooted this machine (first time after Jun 30), it's giving me the exact same symptoms as the other ones. These symptoms are:

  • Initial reboot attempt fails; OS does not load.
  • I connect a screen to see what is going on. Remains black.
  • I reboot again. I now see the regular loading screen ("Intel Atom…"), but it then freezes.
  • I try to reboot again.
  • The screen now does not activate at all. It does not show any sign of life. The monitor simply acts as if nothing is sending any signal, so I have no way to interact with the machine whatsoever.

I've tried rebooting about 4 times now, but I very much fear the same problem as before. Where I live, Fit-PCs are uncommon, and I am not sure there are qualified techs who actually know how to repair them (I am not even sure the other tech's diagnosis was correct). So I am asking: do you also think the motherboard overheated and yet another Fit-PC is bricked, or is there something else I can do?

EDIT: Using Ubuntu 12.04 on all of the Fit-PCs.

EDIT:

I also considered a power-failure. But there are a few inconsistencies:

  • the servers are on three different sites,
  • no power surge was reported and no other hardware was affected – weather was sunny and calm,
  • the only similarity between the three machines was that they started acting odd ever since Jun 30 (the third one was running under high load, but I failed to recognize this until the first reboot since Jun 30, which I did today).

I also could not find reports of other Fit-PCs affected by the leap second, but I am simply not sure what else could cause this…

Best Answer

Actually, the leap second led to about 1 MW of additional power consumption at Hetzner, because the CPUs went to 100% busy. That can cause hardware damage (overheating), too.

You should check whether the leap-second flag is still set on your machines.
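One possible way to do that check, as a rough sketch: the kernel keeps its pending leap-second state in the NTP status word, where STA_INS (0x0010) and STA_DEL (0x0020) from linux/timex.h mean "insert" and "delete" a leap second. The helper name leap_flag_set is made up for this example, and the usage shown in the comments assumes the ntptime tool from the ntp package is installed and prints a line such as "status 0x2011 (PLL,INS,NANO)".

```shell
# Hypothetical helper: succeeds (exit 0) if the given NTP status word
# has a leap-second bit set.
# STA_INS = 0x0010 (insert leap second), STA_DEL = 0x0020 (delete leap second).
leap_flag_set() {
    status=$(( $1 ))                  # accepts hex such as 0x2011
    [ $(( status & 0x30 )) -ne 0 ]
}

# Example usage (assumption: ntptime is available on the machine):
#   status=$(ntptime | sed -n 's/.*status \(0x[0-9a-fA-F]*\).*/\1/p' | head -n 1)
#   if leap_flag_set "$status"; then echo "leap-second flag still set"; fi
```

If the status word turns out to contain INS or DEL, the flag is still pending and the clock should be reset as described next.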

date -s "$(LC_ALL=C date)" should fix it...

If top does not show busy CPUs, it might be that there is already hardware damage.