Linux: How to diagnose / isolate what’s causing “random” hangs and spontaneous reboots

Tags: linux, ubuntu, ubuntu-9.10

So, rather than guessing just what the cause is (though my money's on the nvidia drivers), where do I start looking to pin down some facts?

I've been through /var/log on several occasions but there's a LOT of stuff in there and I can't (yet) spot the important bits.

Background: The Short Version

I moved from WinXP to Ubuntu Karmic just after it became available.

Since then I have had a series of seemingly random crashes that manifest as either:

  • a spontaneous reboot
  • a complete lockup, with my USB keyboard and mouse becoming unresponsive (right down to the LEDs all turning off). I am also typically unable to ssh to the box when this happens.

I've done plenty of searching and Nvidia seems to be the prime suspect but I have no idea where to start looking to work out just what the real cause is.

Suggestions?

Background: The Long Version

At times, I can go an entire week without a crash then have 5 in 2 days.

Motivated by the desire to eliminate possible suspects, I've made a few changes over time to no avail:

  • Originally I used KVM for virtualization; I now use VirtualBox OSE
  • I had NFS running in the kernel but now use Samba
  • I was using Compiz but have since turned that off
  • I've rolled from 64-bit Karmic to 32-bit (for other reasons as well)
  • I've tried Ubuntu, Kubuntu and Xubuntu. Same trouble each time.
  • I rolled the Nvidia driver from version 185 back to version 96 (NVIDIA Linux x86 Kernel Module 96.43.13 Thu Jun 25 18:42:21 PDT 2009). This seems to have reduced the frequency of crashes.

In terms of what's running at the time, this can vary. The following are common but were not necessarily running for every crash:

  • Firefox 3.5
  • VirtualBox OSE with 1 or 2 Windows XP VMs
  • Skype
  • Rhythmbox or Exaile

My hardware is 2 – 3 years old:

  • Core 2 Duo 6300
  • 4GB RAM
  • some breed of Intel motherboard of that vintage
  • an Asus dual-head video card with Nvidia GeForce 7300 GS chipset
  • 2 x SATA HDDs
  • dual monitors (hence I rely on the proprietary nvidia drivers)

I've been keeping current with my system updates.

Hopefully the data above might prompt someone to suggest a specific type of log or config that would be worth investigating.

Updates
RAM seems fine
Per the suggestion below, I will re-post on Super User

Best Answer

Linux and other Unix-like systems are more sensitive to flaky RAM than Windows. I would run memtest86 and check the RAM.
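
Alongside a memtest86 run, it may help to cut /var/log down to the lines that tend to accompany hardware or driver trouble. Here is a minimal Python sketch of that idea. The pattern list is only my guess at useful keywords (machine check reports, kernel oopses and panics, segfaults, and the NVRM prefix the nvidia kernel module uses), and the function names are made up for illustration. A hard lockup often leaves nothing in the logs at all, so an empty result does not clear anything.

```python
import re

# Illustrative keywords only (an assumption, not a complete list).
# Word boundaries keep "BUG:" from matching ordinary "debug:" lines.
SUSPECT = re.compile(
    r"machine check|\bmce:|hardware error|\boops\b|\bbug:"
    r"|kernel panic|segfault|\bnvrm\b",
    re.IGNORECASE,
)


def find_suspect_lines(lines):
    """Return (line_number, text) pairs for lines matching SUSPECT."""
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, start=1)
        if SUSPECT.search(line)
    ]


def scan_logs(paths):
    """Print matching lines from each readable log file in paths."""
    for path in paths:
        try:
            with open(path, errors="replace") as handle:
                for number, text in find_suspect_lines(handle):
                    print(f"{path}:{number}: {text}")
        except OSError as err:
            # Missing or unreadable logs are skipped, not fatal.
            print(f"skipping {path}: {err}")
```

Calling scan_logs(["/var/log/kern.log", "/var/log/syslog"]) would be a reasonable first try; those paths are assumptions about a stock Ubuntu install and may need root to read. The lines logged just before each reboot timestamp are the ones worth reading closely.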