LVM – reiserfsck on LVM

It seems like my filesystems got corrupted somehow during the last reboot of my server; I can't fsck some of the logical volumes anymore. The setup:

root@rescue ~ # cat /mnt/rescue/etc/fstab 
proc /proc proc defaults 0 0
/dev/md0 /boot ext3 defaults 0 2
/dev/md1 / ext3 defaults,errors=remount-ro 0 1

/dev/systemlvm/home /home reiserfs defaults 0 0
/dev/systemlvm/usr /usr reiserfs defaults   0 0
/dev/systemlvm/var /var reiserfs defaults   0 0
/dev/systemlvm/tmp /tmp reiserfs noexec,nosuid 0 2

/dev/sda5 none swap defaults,pri=1 0 0
/dev/sdb5 none swap defaults,pri=1 0 0
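
For reference, the layering here is physical disks → md software RAID → LVM. From the rescue system the stack can be inspected read-only with the usual mdadm/LVM2 tools; this is just a sketch of the commands, output left out:

# sketch: inspect the RAID/LVM stack from the rescue system (read-only)
cat /proc/mdstat   # state of the md arrays
pvs                # which block devices back the LVM physical volumes
vgs                # volume groups (systemlvm in my case)
lvs                # logical volumes (home, usr, var, tmp)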

[UPDATE]
First question: which "layer" should I check for bad blocks: the logical volume, the underlying /dev/md, or the /dev/sdX below that? Is what I am doing the right way to go?
[/UPDATE]
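
To make the question concrete, these are the kinds of read-only checks I mean at each layer. badblocks defaults to a non-destructive read test, and /dev/mdX below is only a placeholder for whichever md device the volume group actually sits on:

# bottom layer: the raw disks
badblocks -sv /dev/sda
badblocks -sv /dev/sdb
# middle layer: the md device backing the systemlvm volume group (placeholder name)
badblocks -sv /dev/mdX
# top layer: the logical volume itself
badblocks -sv /dev/systemlvm/usr
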
The error message when checking /dev/systemlvm/usr:

root@rescue ~ # reiserfsck /dev/systemlvm/usr 
reiserfsck 3.6.19 (2003 www.namesys.com)
[...]
Will read-only check consistency of the filesystem on /dev/systemlvm/usr
Will put log info to 'stdout'

Do you want to run this program?[N/Yes] (note need to type Yes if you do):Yes
###########
reiserfsck --check started at Wed Feb  3 07:10:55 2010
###########
Replaying journal..
Reiserfs journal '/dev/systemlvm/usr' in blocks [18..8211]: 0 transactions replayed
Checking internal tree..

Bad root block 0. (--rebuild-tree did not complete)

Aborted
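
As a side note, the superblock can also be dumped read-only with debugreiserfs (from reiserfsprogs) to see what it makes of the filesystem; this wasn't part of the run above, just a diagnostic sketch:

# read-only dump of the ReiserFS superblock, for diagnosis only
debugreiserfs /dev/systemlvm/usr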

Okay then, let's try --rebuild-tree:

root@rescue ~ # reiserfsck --rebuild-tree /dev/systemlvm/usr 
reiserfsck 3.6.19 (2003 www.namesys.com)

[...]

Will rebuild the filesystem (/dev/systemlvm/usr) tree
Will put log info to 'stdout'

Do you want to run this program?[N/Yes] (note need to type Yes if you do):Yes
Replaying journal..
Reiserfs journal '/dev/systemlvm/usr' in blocks [18..8211]: 0 transactions replayed
###########
reiserfsck --rebuild-tree started at Wed Feb  3 07:12:27 2010
###########
Pass 0:
####### Pass 0 #######
Loading on-disk bitmap .. ok, 269716 blocks marked used
Skipping 8250 blocks (super block, journal, bitmaps) 261466 blocks will be read
0%....20%....40%....60%....80%....100%                       left 0, 11368 /sec
52919 directory entries were hashed with "r5" hash.
        "r5" hash is selected
Flushing..finished
        Read blocks (but not data blocks) 261466
                Leaves among those 13086
                Objectids found 53697

Pass 1 (will try to insert 13086 leaves):
####### Pass 1 #######
Looking for allocable blocks .. finished
0%                                                           left 12675, 0 /sec
The problem has occurred looks like a hardware problem (perhaps
memory). Send us the bug report only if the second run dies at
the same place with the same block number.

mark_block_used: (39508) used already
Aborted

Bad. But let's run it again, as the message suggests:

[...]
Flushing..finished
        Read blocks (but not data blocks) 261466
                Leaves among those 13085
                Objectids found 54305

Pass 1 (will try to insert 13085 leaves):
####### Pass 1 #######
Looking for allocable blocks .. finished
0%...                                                      left 12127, 958 /sec
The problem has occurred looks like a hardware problem (perhaps
memory). Send us the bug report only if the second run dies at
the same place with the same block number.

build_the_tree: Nothing but leaves are expected. Block 196736 - internal

Aborted

The same thing happens every time; only the details of the error change. Sometimes I get mark_block_used: (somenumber) used already, other times a different block number or a different message.
It seems like something is REALLY broken. Is there any chance I can somehow get the partitions working again?
It's a hosted server, so I don't have direct physical access to it.
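
Since it's a remote box, the only hardware checks I can think of running from the rescue system are the drives' SMART data and a userspace memory test, roughly like this (smartctl comes from smartmontools, memtester is a separate package; the test size and pass count are arbitrary examples and neither check is conclusive):

# SMART health summary and full attribute dump per disk
smartctl -H /dev/sda
smartctl -a /dev/sda
smartctl -H /dev/sdb
smartctl -a /dev/sdb
# quick userspace RAM test: 256 MB, one pass (example values only)
memtester 256M 1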

Thanks in advance!

Best Answer

Well, after a few more hours of reiserfsck-ing, it seems that repeating this three-step process

reiserfsck --check ...
reiserfsck --rebuild-sb ...
reiserfsck --rebuild-tree ...

eventually solves the problem. I still don't know what caused it, as there seem to be no bad blocks on any drive, nor do I know how much data was lost, but I am pretty sure this should not happen. One partition is still "replaying its journal", but I will report on the success (or failure) as soon as I can reboot the machine.
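
For reference, on one of the affected volumes the cycle looked roughly like this (the device path is just one of my LVs as an example; each reiserfsck step asks for a literal "Yes" before it runs):

# the repair cycle I repeated per volume (example device path)
reiserfsck --check /dev/systemlvm/usr
reiserfsck --rebuild-sb /dev/systemlvm/usr
reiserfsck --rebuild-tree /dev/systemlvm/usr
reiserfsck --check /dev/systemlvm/usr   # re-check; repeat the cycle until it comes back clean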