Ubuntu – Server freezes completely under unknown conditions


I've recently assembled a server for virtualization. The problem did not show up during installation, but since I deployed my applications (based on OpenStack on Ubuntu 12.04), the server freezes at random times, usually within 10 to 40 hours. It survives an extreme stress test, so overheating does not seem to be the cause.

Interestingly, when the kernel has used almost all of the memory for buffers (I reproduced this with dd), the system becomes nearly frozen and cannot accept any new incoming network connections, although existing connections stay up. According to the manual, applications should always be able to get memory back from the buffers when no other free memory is available.
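To check whether memory is really exhausted or only held in buffers and page cache, one can read /proc/meminfo. The following is a minimal Python sketch, assuming the classic MemFree, Buffers and Cached fields; the function names are hypothetical. The sum is only a rough upper bound, similar in spirit to the "-/+ buffers/cache" line of older versions of free, because dirty pages must be written back before they can be reclaimed and shared-memory pages cannot simply be dropped.

```python
import re

def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of values in kB.

    Lines without a kB unit (e.g. HugePages_Total) and keys containing
    parentheses (e.g. Active(anon)) are skipped in this sketch.
    """
    values = {}
    for line in text.splitlines():
        m = re.match(r"^(\w+):\s+(\d+)\s*kB", line)
        if m:
            values[m.group(1)] = int(m.group(2))
    return values

def reclaimable_estimate_kb(info):
    """Rough estimate: free memory plus buffers plus page cache.

    Not all of this is guaranteed to be reclaimable (dirty pages must be
    written back first), so treat it as an upper bound.
    """
    return info.get("MemFree", 0) + info.get("Buffers", 0) + info.get("Cached", 0)
```

On a live system this would be called as `reclaimable_estimate_kb(parse_meminfo(open("/proc/meminfo").read()))`. Kernels 3.14 and later also export MemAvailable, which is a better estimate, but the 3.2-series kernel that Ubuntu 12.04 ships with does not.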

I also tried to dig something out of syslog, but there is too much kernel output to find anything useful.
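One way to cut through the kernel noise is to keep only the lines that report hardware errors. The following is a minimal Python sketch, assuming the default Ubuntu log file /var/log/syslog and assuming the interesting messages mention EDAC, MCE or "machine check", as the messages quoted later in this question do. The pattern list and the function name are hypothetical and may need extending for other drivers.

```python
import re

# Assumed markers of hardware memory/CPU errors in kernel messages;
# matched case-insensitively as whole words.
HW_ERROR_PATTERN = re.compile(r"\b(?:edac|mce|machine check)\b", re.IGNORECASE)

def hardware_error_lines(lines):
    """Yield kernel log lines that mention EDAC/MCE events, without newlines."""
    for line in lines:
        if "kernel:" in line and HW_ERROR_PATTERN.search(line):
            yield line.rstrip("\n")
```

Usage: `with open("/var/log/syslog") as f: print("\n".join(hardware_error_lines(f)))`.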

UPDATE

After some waiting I got some useful information. After a kernel upgrade the server did not crash, but it left this in the log:

Jan 24 19:38:25 shisoft-vmhost kernel: [ 5083.584670] sbridge: HANDLING MCE MEMORY ERROR
Jan 24 19:38:25 shisoft-vmhost kernel: [ 5083.751554] EDAC MC0: 2 CE memory read error on CPU_SrcID#0_Channel#1_DIMM#0 (channel:1 slot:0 page:0xc8b77d offset:0x40 grain:32 syndrome:0x0 -  OVERFLOW area:DRAM err_code:0001:0091 socket:0 channel_mask:1 rank:1)

Looks like a memory issue. Any ideas?

Best Answer

The RAM in the server is bad: the EDAC message reports errors at channel:1 slot:0, which is probably the first stick in the second channel, since EDAC counts channels and slots from 0.
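The reading of the log above can be made explicit by parsing the EDAC line. The following is a minimal Python sketch, assuming the field layout shown in the question (`EDAC MC<n>: <count> CE|UE ...` followed by `channel:<c> slot:<s>`); other kernel versions or drivers may format the message differently, and the function name is hypothetical. "CE" means a corrected error (ECC fixed it) and "UE" an uncorrected one. The 1-based label is only a guess at the physical position, because motherboard slot labels vary between vendors.

```python
import re

# Field layout taken from the EDAC line in the question; an assumption for other systems.
COUNT = re.compile(r"EDAC MC(\d+): (\d+) (CE|UE) ")
LOCATION = re.compile(r"\bchannel:(\d+)\s+slot:(\d+)")

def describe_edac_error(line):
    """Extract controller, error count/kind and DIMM location from an EDAC line.

    Returns None if the line does not look like an EDAC error report.
    """
    cnt = COUNT.search(line)
    loc = LOCATION.search(line)
    if not cnt or not loc:
        return None
    channel, slot = int(loc.group(1)), int(loc.group(2))
    return {
        "controller": int(cnt.group(1)),
        "count": int(cnt.group(2)),
        "kind": "corrected" if cnt.group(3) == "CE" else "uncorrected",
        "channel": channel,
        "slot": slot,
        # EDAC indices start at 0, so channel:1 slot:0 is the first slot of the second channel.
        "human": f"DIMM {slot + 1} on channel {channel + 1}",
    }
```

For the log line in the question this yields 2 corrected errors on controller 0 at "DIMM 1 on channel 2", matching the answer's interpretation.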
