Linux – Sudden kernel panics on Linux server

kernel-panic linux

A few days ago a server I manage had a kernel panic after 400+ days of uptime. I rebooted it, and it ran for about two days before hitting an "oops: cpu#n stuck for 61s" for various values of n.
I rebooted again, and today the original kernel panic reappeared. Here is the trace (retyped by hand, so addresses are omitted):

Kernel panic - not syncing: Fatal exception in interrupt
Pid: 0, comm: swapper Tainted: G        D    2.6.32-41-server #89-Ubuntu
Call Trace:
 <IRQ> panic
 oops_end
 die
 do_general_protection
 ? consume_skb
 general_protection
 ? put_page
 skb_release_data
 __kfree_skb
 consume_skb
 dev_kfree_skb_any
 sky2_tx_complete
 sky2_status_intr
 ? __queue_work
 sky2_poll
 net_rx_action
 __do_softirq
 ? handle_IRQ_event
 call_softirq
 do_softirq
 irq_exit
 do_IRQ
 ret_from_intr
 <EOI> ? mwait_idle
 ? atomic_notifier_call_chain
 ? cpu_idle
 ? start_secondary

RIP put_page
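To see whether the "CPU#n stuck for 61s" reports cluster on one CPU or are spread across several, a small Python 3 sketch like the following could tally them from the kernel log. It assumes each lockup line contains the fragment "CPU#<n> stuck for <secs>s" (as in the kernel's "BUG: soft lockup" message); tally_stuck_cpus is a hypothetical helper name, not an existing tool.

```python
import re
from collections import Counter

# Assumed log format: lines such as
# "BUG: soft lockup - CPU#3 stuck for 61s! [swapper:0]".
# Only the "CPU#<n> stuck for <secs>s" fragment is relied on, case-insensitively.
LOCKUP_RE = re.compile(r"CPU#(\d+) stuck for (\d+)s", re.IGNORECASE)


def tally_stuck_cpus(lines):
    """Count soft-lockup reports per CPU number in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        match = LOCKUP_RE.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts
```

For example, passing an open /var/log/kern.log (opened with errors="replace" to tolerate stray bytes) to tally_stuck_cpus returns a Counter mapping each CPU number to its report count. Rotated logs such as kern.log.1 would have to be fed in separately.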

The OS is Ubuntu 10.04.4 x64.
Since it has always worked and nothing was changed before the panics, I suspect a hardware fault. Before the last reboot I ran a full memtest, which passed, and a full fsck just to be sure. Since the panic occurs inside the sky2 driver (Marvell network controller), could it be a NIC problem? Is there something I have overlooked?
Note that between errors everything works perfectly (no errors in the logs, no dropped packets, no slowdowns).
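The NIC's own error counters are one way to back up the "no dropped packets" observation between crashes. The Python 3 sketch below reads the standard per-interface counters from /sys/class/net/<iface>/statistics/; read_nic_counters is a hypothetical helper, the interface name eth0 in the usage note is an assumption, and not every driver necessarily increments every counter.

```python
import os

# Error/drop counters exposed under /sys/class/net/<iface>/statistics/.
# Whether a given driver (here sky2) fills in each one is not guaranteed.
ERROR_COUNTERS = (
    "rx_errors", "tx_errors", "rx_dropped", "tx_dropped",
    "rx_crc_errors", "rx_fifo_errors", "tx_fifo_errors",
)


def read_nic_counters(iface, sysfs_net="/sys/class/net"):
    """Return {counter_name: value} for the error counters present for iface."""
    stats_dir = os.path.join(sysfs_net, iface, "statistics")
    counters = {}
    for name in ERROR_COUNTERS:
        path = os.path.join(stats_dir, name)
        if os.path.isfile(path):  # skip counters this kernel/driver does not expose
            with open(path) as f:
                counters[name] = int(f.read().strip())
    return counters
```

Calling read_nic_counters("eth0") returns a dict such as {"rx_errors": 0, ...}. Because the counters reset at every reboot, logging them periodically (for example from cron) would be needed to see whether errors build up before the next crash.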

Thanks for any pointers.

Best Answer

A hardware problem is likely: clogged or failed fans, excessive temperature, bad RAM, a bad CPU, a misbehaving power supply, or a motherboard nearing the end of its life.
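To check the temperature-related causes, the sensor readings the kernel exposes through the hwmon sysfs interface can be sampled. A Python 3 sketch, assuming a sensor driver for the board is loaded (for example one suggested by lm-sensors' sensors-detect); read_hwmon_temps is a hypothetical helper name:

```python
import glob
import os


def read_hwmon_temps(root="/sys/class/hwmon"):
    """Return {path: degrees_celsius} for every temp*_input file under root.

    hwmon reports temperatures in millidegrees Celsius. Some older drivers
    place the files in hwmonN/device/ instead of hwmonN/, so both are checked.
    """
    temps = {}
    patterns = (
        os.path.join(root, "hwmon*", "temp*_input"),
        os.path.join(root, "hwmon*", "device", "temp*_input"),
    )
    for pattern in patterns:
        for path in sorted(glob.glob(pattern)):
            try:
                with open(path) as f:
                    temps[path] = int(f.read().strip()) / 1000.0
            except (OSError, ValueError):
                # Unreadable sensor or non-numeric value: skip it.
                continue
    return temps
```

An empty result means no hwmon sensor driver is loaded, not that the machine is running cool. Fan speeds appear in the same directories as fan*_input files (in RPM) and could be read the same way.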