System occasionally hangs during boot with SLES 11

boot sles11

I have several (new) systems on which I had to install SLES 11. However, on some (though not all) reboots, the system hangs during the boot sequence. It only continues after I physically press a key on the keyboard.

This is what I found in the dmesg log from a failed boot:

[   22.170276] sd 0:0:0:0: [sda] Mode Sense: b7 00 00 08
[   22.171155] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[   22.182760]  sda: sda1 sda2 sda3
[   22.383424] sd 0:0:0:0: [sda] Attached SCSI disk
[   22.545372] PM: Marking nosave pages: 000000000009a000 - 0000000000100000
[   22.545377] PM: Marking nosave pages: 00000000bf780000 - 0000000100000000
[   22.546217] PM: Basic memory bitmaps created
[   22.590380] PM: Basic memory bitmaps freed
[   22.596284] PM: Starting manual resume from disk
[   22.602319] PM: Resume from partition 8:1
[   22.602321] PM: Checking hibernation image.
[   22.602479] PM: Error -22 checking image file
[   22.602481] PM: Resume from disk failed.
[   22.718727] kjournald starting.  Commit interval 15 seconds
[   22.718960] EXT3-fs (sda3): using internal journal
[   22.718964] EXT3-fs (sda3): mounted filesystem with ordered data mode
[ 1555.644404] udevd version 128 started
[ 1555.697664] input: Power Button as /devices/LNXSYSTM:00/LNXSYBUS:00/PNP0C0C:00/input/input0
[ 1555.707961] ACPI: Power Button [PWRB]

I've searched the internet for the "PM: Resume from disk failed." message, but it seems to matter only when restoring the system after hibernation, i.e. when resuming from the hard disk.

But this is not my situation. I only get this after a reboot, as I said before.
The jump to the [ 1555.xxxxxx] timestamps only marks the moment I pressed a key on the keyboard.
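
For longer logs, a stall like this can be located without reading every line. The following is a minimal sketch, not a tested diagnostic: `find_boot_gaps` is a hypothetical helper name, and the 60 second threshold is an arbitrary assumption. It reads dmesg-style text on stdin and reports every line that follows a pause longer than the threshold.

```shell
# Print each dmesg line that follows a pause longer than a threshold.
# Reads dmesg-style text on stdin; the threshold (seconds) defaults to 60.
# find_boot_gaps is a hypothetical helper name, not an existing tool.
find_boot_gaps() {
    awk -v limit="${1:-60}" '
        # Only consider lines that start with a "[ seconds.micros]" timestamp.
        match($0, /^\[ *[0-9]+\.[0-9]+\]/) {
            # Drop the surrounding brackets and convert to a number.
            ts = substr($0, RSTART + 1, RLENGTH - 2) + 0
            if (seen && ts - prev > limit)
                printf "gap of %.0f s before: %s\n", ts - prev, $0
            prev = ts
            seen = 1
        }
    '
}

# Usage:
#   dmesg | find_boot_gaps 60
```

Run against the log above, this should flag only the "udevd version 128 started" line, which is where the boot waited for the key press.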

The SLES version I use is 11 SP1. Note that I have also installed the RealTime Extension on all of the systems, so updating to SP2 should not matter, as the RT Extension is the same for SP1 and SP2.

Any suggestions on how to proceed? I am stuck on this issue.


Edit:

I've come to notice that this issue is caused by the RT-kernel from the RealTime Extension. I only seem to get the hang when I boot up with the RT-kernel.


Edit 2:

I decided to take a closer look at what exactly happens during the boot. The result is a screenshot from a serial connection with PuTTY to one of the relevant systems:

Screenshot of boot process

The red square is where I have to press a key to make the booting sequence continue.
It seems that the boot sequence hangs on fsck, or that it runs in some sort of interactive mode?


Edit 3:

It seems that it is not possible to upgrade to SP2, as the RT-kernel is incompatible with it (fails to install, and when forced to install, it won't be bootable).

Best Answer

Although I am still not sure what the real problem is, I have found a workaround that "solves" the issue.

By executing

# tune2fs -c 1 /dev/sdaX

which sets the maximum mount count of the filesystem to 1, I was able to get fsck, which seems to be what suspends the boot, to carry on with its business. However, this forces fsck to run every single time the system is rebooted, which takes extra time during boot. Fortunately, the hard disk in the system is not very large, so the boot time does not increase terribly.
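
To confirm that the setting took effect, the fsck-related fields can be pulled out of the superblock listing that `tune2fs -l` prints. This is only a sketch: `show_fsck_schedule` is a hypothetical helper name, and `/dev/sdaX` again stands for the actual partition.

```shell
# Filter the fsck-scheduling fields out of `tune2fs -l` output (read on stdin).
# show_fsck_schedule is a hypothetical helper name, not part of e2fsprogs.
show_fsck_schedule() {
    grep -E '^(Mount count|Maximum mount count|Last checked|Check interval):'
}

# Usage (as root; /dev/sdaX is a placeholder for the real partition):
#   tune2fs -l /dev/sdaX | show_fsck_schedule
```

After the workaround, "Maximum mount count" should read 1, which is why the check is triggered on every boot.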
