Linux – Ubuntu 10.10 Maverick Server locks up at random intervals (i7 930; 12GB RAM)

ext4, linux, Ubuntu

Some background:
I have a machine with the following specs:

  • Intel Core i7 930
  • 12GB Corsair RAM
  • 2x Samsung 320GB HDD (no RAID, just partitions)
  • Asus P6TD Deluxe

The machine has been in the datacenter for just a few days. The average load is 0.50, and it has the following partitions:

 /        ext4    noatime,barrier=0,errors=remount-ro 0 1
 /datos   ext4    noatime                             0 2

Now the problem:
At random intervals, the machine locks up and SSH lags heavily. During a lockup, htop shows all cores being hammered by system processes.
http://korrupzion.com/htop.png

iostat output during a freeze (captured when I finally managed to run a command during the freeze):

iostat
Linux 2.6.35-22-server (charizard)         25/10/10        _x86_64_        (8 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           4,33    0,00   10,38    1,21    0,00   84,07

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda               2,68       269,23         0,01     906918         24
sdb              52,30       897,99      1896,08    3024878    6386976

Vmstat output:

vmstat 1
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 0  1      0 5680460 128056 1907340    0    0   115   131 1091 2621  4  9 86  1
 4  0      0 5676360 128064 1909036    0    0  1872    52 4606 18143 10 33 57  0
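
Since a freeze makes it hard to run anything interactively, one option is to log vmstat to a file and pick out the bad samples afterwards. The following is a minimal Python sketch, not a definitive tool: `parse_vmstat`, `busy_samples` and the 30% `sy` threshold are illustrative assumptions, and the sample text is the vmstat output quoted above. Note that the first row vmstat prints is an average since boot, so only the later rows reflect the current interval.

```python
# Sample text copied from the vmstat output in this post.
SAMPLE = """\
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 0  1      0 5680460 128056 1907340    0    0   115   131 1091 2621  4  9 86  1
 4  0      0 5676360 128064 1909036    0    0  1872    52 4606 18143 10 33 57  0
"""


def parse_vmstat(text):
    """Parse vmstat output into a list of dicts keyed by the column names."""
    rows, header = [], None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "r":
            # Column header row: r b swpd free ...
            header = fields
        elif header and len(fields) == len(header) and all(f.isdigit() for f in fields):
            rows.append(dict(zip(header, map(int, fields))))
    return rows


def busy_samples(rows, sy_threshold=30):
    """Return the samples whose system CPU time is at or above the threshold.

    The threshold is an arbitrary assumption; tune it to the machine.
    """
    return [r for r in rows if r["sy"] >= sy_threshold]


for row in busy_samples(parse_vmstat(SAMPLE)):
    print("sy=%d%% cs=%d in=%d" % (row["sy"], row["cs"], row["in"]))
```

To use it on real data, something like `vmstat 1 > vmstat.log` can be left running and the file contents passed to `parse_vmstat` after a freeze.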

I suspect that ext4 is related to these freezes, but I'm not sure. Ubuntu was installed a week ago, before the machine was sent to the datacenter. Before that, I ran Windows 7 on it to test performance and didn't see any freezes.

If you know of another command to track down the source of these freezes, please let me know. I've been thinking about reinstalling with Debian Lenny, which is the OS I installed on another machine without problems.

Thanks.

EDIT 1: I remounted "/datos" with barrier=0; now I'm monitoring to see whether the problem comes back.

EDIT 2: Remounting /datos with barrier=0 didn't work. Still trying to find a solution.

Best Answer

I reported a bug on Launchpad about the very same issue five days ago. That machine also has an Intel Core i7-930, on an Intel DX58SO mainboard:

https://bugs.launchpad.net/ubuntu/+source/linux/+bug/665796

Yours is almost certainly the same problem; your description matches mine very closely. Check whether your system timer interrupt freezes during these lockups. Keep this running in a terminal:

watch -d grep timer /proc/interrupts

You will need an external source of interrupts during the lockup so that the process scheduler keeps running and you can see the command's output update. Pinging your computer from another machine (with a small interval) will help.
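
The same check can be scripted so that the result is logged even if the terminal stops updating. This is a minimal Python sketch, assuming the usual /proc/interrupts layout (a label, one counter per CPU, then a description); the names `timer_counts`, `stalled` and `watch` are made up for illustration. Some timer lines may never move even on a healthy system, so compare against a normal period first.

```python
import time


def timer_counts(text):
    """Map each /proc/interrupts line mentioning 'timer' to its total count."""
    counts = {}
    for line in text.splitlines():
        if "timer" not in line:
            continue
        label, _, rest = line.partition(":")
        total = 0
        for field in rest.split():
            if not field.isdigit():
                break  # per-CPU counters end where the description starts
            total += int(field)
        counts[label.strip()] = total
    return counts


def stalled(before, after):
    """Return the labels whose count did not increase between two samples."""
    return sorted(k for k in before if after.get(k, 0) <= before[k])


def watch(path="/proc/interrupts", interval=1.0, samples=10):
    """Print, once per interval, which timer lines did not advance."""
    with open(path) as f:
        prev = timer_counts(f.read())
    for _ in range(samples):
        time.sleep(interval)
        with open(path) as f:
            cur = timer_counts(f.read())
        print(time.strftime("%H:%M:%S"), "stalled:", stalled(prev, cur) or "none")
        prev = cur
```

Calling `watch()` on the affected machine, with output redirected to a file, leaves a record that can be read once the lockup has passed.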

If the timer interrupt stops incrementing during the lockup, then it is the same problem. Please add your system information to the bug report above so that it gets attention from the Ubuntu developers:

ubuntu-bug -u 665796 -p linux