MD RAID 1 with external bitmap doesn’t fully resync

mdmdadmraid1

I have an interesting configuration: dual boot system with a RAID 1 that needs to be visible in both Windows and Linux. The Windows install is Win 7 Enterprise, and the Linux install is Kubuntu 10.04. To get the RAID to work, I set it up using Windows's "Dynamic Disks" RAID 1, and brought it up in Linux using MD with no persistent superblock, and a write-intent bitmap on another partition. (Without this bitmap, MD had no way of knowing that the array was in sync, and would do a complete resync every time the array started.) The array is assembled like so:

mdadm --build /dev/md1 -l 1 -n 2 -b /var/local/md1.bitmap /dev/sdb2 /dev/sdc2

I expected that the first time I ran this command, it would resync the array, write out a bitmap with no dirty chunks, and all would be good. This wasn't the case: after completing the resync, the bitmap was mostly clean, but about 5% dirty blocks remained, as revealed by

mdadm -X /var/local/md1.bitmap

I didn't mount the filesystem on /dev/md1 or touch it in any other way.

I then found that stopping and restarting the array:

mdadm --stop /dev/md1
mdadm --build /dev/md1 -l 1 -n 2 -b /var/local/md1.bitmap /dev/sdb2 /dev/sdc2

did indeed read in the bitmap, with an ensuing resync that went quickly because most of the blocks were marked clean. The confusing part is that this resync further reduced the number of dirty blocks, but still did not remove all of them. By repeatedly stopping and restarting I could slowly bring the dirty block count down to around 0.6%, where it seemed to level out.

Any ideas what could be causing this? It smells to me of a race condition somewhere that leads to blocks either being skipped over during synchronization or not properly cleared from the bitmap, but I really have no evidence to prove this. It doesn't look like hardware issues since both drives are new and have zero read errors and reallocated sectors reported by smartctl -a.

Best Answer

I hope I don't offend you with this quick guess, but have you monitored the /proc/mdstat file to ensure --build operation has finished before you've checked mdadm -X?

# cat /proc/mdstat 
Personalities : [raid1] 
md0 : active raid1 nbd0[3] sda10[0]
      53246315 blocks super 1.2 [3/1] [U__]
      [>....................]  recovery =  1.0% (537088/53246315) finish=203.0min speed=4326K/sec
      bitmap: 1/1 pages [4KB], 65536KB chunk

Second thing to check is if /var/local is ext2/ext3, quoting man:

Note: external bitmaps are only known to work on ext2 and ext3. Storing bitmap files on other filesystems may result in serious problems.

Related Topic