Linux – Why do multipath device nodes change after reboot?

When I run multipath -ll, the output looks like this:

ocr3 (149455400000000000000000001000000ca0200000d000000) dm-9 IET,VIRTUAL-DISK
[size=980M][features=0][hwhandler=0]
\_ round-robin 0 [prio=0][active]
 \_ 1:0:0:11 sdo 8:224 [active][ready]
\_ round-robin 0 [prio=0][enabled]
 \_ 1:0:0:10 sdn 8:208 [active][ready]

However, the device nodes behind the same ocr3 map, such as sdo and sdn, change after a reboot.
I think this is a consistency problem.
Why do the device nodes change after a reboot?
How can I make the device nodes persistent across reboots?

Best Answer

To allow for hot-plugging and dynamic reconfiguration, you should not assume that the /dev/sd* device nodes will stay the same from one boot to the next.

On a workstation with just a single AHCI SATA controller, the order will usually be static, as it is mostly determined by the order in which the storage controller drivers are loaded: normally the driver for the root disk is loaded in the initramfs phase of the boot, before usb-storage. The ordering of the disks managed by the AHCI controller is then fixed, as the driver probes the controller ports in port number order.

But on a system connected to SAN storage, things are not so simple. There can be multiple disk controllers (one for internal system disks, then one for each SAN HBA), and at boot time, the HBAs are generally probed in PCI bus order, then LUNs within each HBA are detected in driver-specific order that may depend on SAN configuration too. The order within a single HBA might be based on LUN WWIDs, or some other storage configuration details.

And there are no pre-allocated ranges of /dev/sd* names for each HBA: once each HBA's LUNs are assigned names, the system proceeds to the next HBA without leaving any gaps in the /dev/sd* names.

Once a /dev/sd* name has been assigned to a disk or LUN, it cannot be automatically reassigned to point to another LUN while the system is running, or filesystems or databases might get corrupted. Such reassignment while the system is running must always involve the sysadmin. Only at boot time can the names be automatically reassigned.

As a result, when the SAN administrator presents a new LUN for your system, its WWID should certainly be unique, but the WWID value might be either before, after, or in the middle of your existing LUNs. When it's hot-added, each path to it will get the next free /dev/sd* device name, so they will go after all the existing LUNs. Even that alone practically guarantees that the ordering of the /dev/sd* names will change the next time the system is booted.
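To make that concrete, here is a toy model in Python. It is only a sketch under loud assumptions: the boot-time scan is assumed to discover LUNs in sorted WWID order (real drivers may use a different order, as noted above), each LUN is given a single path, and the names `sd_name`, `boot_scan`, `hot_add` and the `wwid-*` strings are made up for illustration. It is not how the kernel implements naming, only the "next free name, no gaps" rule described above.

```python
def sd_name(index):
    """Return the index-th name in the sequence sda..sdz, sdaa, sdab, ..."""
    letters = ""
    n = index + 1
    while n > 0:
        n, rem = divmod(n - 1, 26)
        letters = chr(ord("a") + rem) + letters
    return "sd" + letters


def boot_scan(wwids):
    """Toy boot-time scan: ASSUMES discovery in sorted WWID order.

    Names are handed out sequentially with no gaps.
    """
    return {wwid: sd_name(i) for i, wwid in enumerate(sorted(wwids))}


def hot_add(names, wwid):
    """A hot-added LUN takes the next free name; existing names stay put."""
    updated = dict(names)
    updated[wwid] = sd_name(len(names))
    return updated


# Hypothetical WWIDs, chosen so the new LUN sorts between the existing two.
before = boot_scan(["wwid-100", "wwid-300"])
running = hot_add(before, "wwid-200")
after_reboot = boot_scan(list(running))

print("after first boot:", before)
print("after hot-add:   ", running)
print("after reboot:    ", after_reboot)
```

In this model the hot-added LUN is named sdc while the system is running, but after the reboot it becomes sdb and the LUN that used to be sdb becomes sdc. Nothing is broken; the names simply followed the discovery order.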

Of course, you could use udev rules to fix the /dev/sd* names to specific HBAs and WWIDs if you wanted... but that is a lot of work for very small gains. All the /dev/sd* devices are supposed to perform exactly the same as far as the kernel is concerned: if you find that's not true, you've found a bug and you should report it. Therefore, there is no real reason why their ordering should matter.

The Linux kernel developers realized this during the 2.5.* development cycle, as they were trying to remove any limitations from on-line configurability. Now there are ways to make your system configuration completely independent of /dev/sd* names:

  • If you use traditional partitions, you can use /dev/disk/by-* device names instead of /dev/sd* devices, or use the UUID= or LABEL= syntax in /etc/fstab.

  • If you use LVM, it does not even store the device names persistently, but looks for LVM signatures on any disk and partition it can see, and then builds up the configuration dynamically from there. This happens automatically at boot, and every time you run vgscan. (And yes, there are safeguards that prevent LVM mappings from being changed while the disks are in use.)

  • If you use multipathing, it presents the multipathed LUNs either by their WWIDs (if user_friendly_names is disabled), by /dev/mapper/mpath* names assigned when each WWID is seen for the first time and then stored persistently in /etc/multipath/bindings (or /var/lib/multipath/bindings in RHEL/CentOS 5 and earlier, which turned out to be a mistake if /var is a separate filesystem...), or by aliases you can assign by WWID yourself.
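As a rough illustration of the persistent names in the list above, the following Python sketch lists the symlinks in a directory such as /dev/disk/by-id and shows which kernel device each one currently resolves to. The function name `persistent_to_kernel` and its `by_dir` parameter are hypothetical, chosen just for this example; it only relies on the entries being symlinks to the real device nodes.

```python
import os


def persistent_to_kernel(by_dir="/dev/disk/by-id"):
    """Map each symlink name in by_dir to the device name it points to now."""
    mapping = {}
    for entry in sorted(os.listdir(by_dir)):
        link = os.path.join(by_dir, entry)
        if os.path.islink(link):
            # realpath follows the symlink; basename leaves e.g. "sdo" or "dm-9"
            mapping[entry] = os.path.basename(os.path.realpath(link))
    return mapping


if __name__ == "__main__":
    # Only prints something on a system that actually has this directory.
    if os.path.isdir("/dev/disk/by-id"):
        for name, dev in persistent_to_kernel().items():
            print(f"{name} -> {dev}")
```

Running this before and after a reboot should show the same names on the left, while the sd* or dm-* names on the right are free to move. That is the point: refer to the left-hand side in your configuration, and let the right-hand side be whatever it happens to be.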

I once had to administer an old RHEL 3 system that had SAN disks attached to it. Initially it had only one HBA; then another HBA was added for redundancy and SAN migration... but it was from a different vendor, so vendor-specific multipath solutions were not available. I had to use the (since abandoned) multipath feature of mdadm. It required keeping track of device names, rather like what you're thinking of. Two words: It sucked. I was very happy when that system was finally retired.