Linux – reliably and automatically determine the physical connection path of an HDD from its /dev/sdX device file

Tags: drive-failure, hardware, linux, mdadm

This is kind of a FAQ, but all the answers I found so far are not suited for full automation, which is what I need. So here it goes again.

In Linux:

Is there a reliable way to resolve the udev device name of an HDD (e.g. "/dev/sdg") into its data path, to identify the physical real-world cable the device is connected through (e.g. "Controller in PCIe-Slot2, SAS-Channel 0, Replicator Port 3")?

I run a server with a dozen SATA disks in hot-swap backplanes. The disks are assembled into a software RAID 6 using mdadm (Linux). For reasons beyond the scope of this question I want and need to run software RAID, not hardware RAID through a dedicated controller.

One of the shortcomings of software raid is that the failure LED on the drive bay doesn't come on when a drive fails in an array, because the enclosure has no way to poll mdadm for drive status. You have to find the position of the faulty drive manually.

I know you could just issue a dd if=/dev/sdg of=/dev/null and see which activity LED comes on, but I'm aiming for the pretty solution here.

To fix that I hacked together a little PCB that talks to the backplane via I2C to switch the failure LEDs of the bays on and off, and I have a little script that talks to this board via RS-232.

mdadm can run a command when a failure event occurs, so I can tell mdadm to run my script and turn on the LED when a drive drops out of the array. The only problem is:
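For reference, here is a minimal Python sketch of such a handler. It relies on the behaviour described in the mdadm man page: in monitor mode the configured program is called with the event name, the array device, and, where relevant, the component device. The function name failed_component and the set of events treated as failures are choices made for this illustration, and the actual LED switching is left out.

```python
#!/usr/bin/env python3
"""Sketch of an mdadm monitor handler (hypothetical, for illustration only)."""
import sys

# Events treated as "a member dropped out"; adjust to taste.
FAIL_EVENTS = {"Fail", "FailSpare"}


def failed_component(args):
    """Return the component device for a failure event, else None.

    args is the argument list mdadm passes to the program:
    [event, md_device] or [event, md_device, component_device].
    """
    if len(args) < 3:
        return None  # no component device was reported
    event, component = args[0], args[2]
    if event in FAIL_EVENTS:
        return component
    return None


def main():
    dev = failed_component(sys.argv[1:])
    if dev is not None:
        # Placeholder: this is where the LED script would be called
        # once dev has been mapped to a drive bay.
        print(f"failed component: {dev}")


if __name__ == "__main__":
    main()
```

It would be hooked in with a PROGRAM line in mdadm.conf or the --program option of mdadm --monitor. The missing piece is still the mapping from the reported device to a bay.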

mdadm tells me "drive /dev/sdg1 has failed". But what I need is "Drive on Controller 1, Channel 2, Port 3 has failed", so I can identify which LED to switch on.

Does anyone know a reliable way to resolve a device name like /dev/sdg back to the path?

I know hdparm -I /dev/sdX will give me the serial number and vendor of the drive so I can manually identify the disk by looking at the label, but the point is to do this automatically. Reliably identifying the controller/port involved will suffice, since the wiring usually won't change when using backplanes and I know what controller port services what drive bay.

My first idea was to run ls -l /dev/disk/by-path | grep sdX and read off the matching path name. However, this proved unreliable, as not even half of the disks currently installed appear in that directory.
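For scripting, the same lookup can be done without parsing ls output. This is a minimal Python sketch, assuming the usual udev layout where each entry in /dev/disk/by-path is a symlink to a device node. The function name by_path_names is made up for this illustration.

```python
import os


def by_path_names(dev, by_path_dir="/dev/disk/by-path"):
    """Return the by-path link names that resolve to the given device node."""
    target = os.path.realpath(dev)
    try:
        entries = sorted(os.listdir(by_path_dir))
    except FileNotFoundError:
        return []  # directory missing, e.g. udev not running
    names = []
    for entry in entries:
        link = os.path.join(by_path_dir, entry)
        # Each entry is a symlink such as ../../sdg; compare resolved paths.
        if os.path.realpath(link) == target:
            names.append(entry)
    return names
```

A call like by_path_names("/dev/sdg") returns an empty list for a disk that has no entry in that directory, so it inherits the same gap described above.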

Just saying "your first controller will have sda-sdh, your second controller will have sdi-sdp" is also unreliable, since there's a race condition at boot: sometimes one controller is initialized first, sometimes the other. Whichever is initialized first gets /dev/sda… Things also get complicated after a hot swap or if not all bays are populated.

lshw -short -c disk seems to generate output similar to what I'm looking for, but I'm having trouble linking the path numbers shown there to the physical cables. Are these assignments stable? What is the exact pattern? (The numbering in the SCSI paths is continuous even though not all my drive bays are populated. That leads me to think these SCSI path nodes are dynamically assigned and don't directly represent physical ports on the controller.)

Does anyone know a linux command which takes a /dev/sdX as argument and reliably outputs the associated PCIe-slot and controller port?

Best Answer

As a previous comment mentioned, check out /sys/block/sdX. If you cd to one of those directories, and then do ls -l, you'll see a symlink for device, which should point you to the path of the device. For example, on my system I see device in /sys/block/sdz linked as follows:
device -> ../../devices/pci0000:00/0000:00:09.0/0000:08:00.0/host3/rport-3:0-5/target3:0:5/3:0:5:1/

This gives me the exact path to the device that backs the block device, from PCI path to host controller to LUN. This may look a little different depending on the type of device you have. For example, mine is a Fibre Channel controller, hence the rport portion of it.
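To automate this, the symlink can be resolved and picked apart in a script. What follows is a minimal Python sketch, not a complete solution. It assumes sdX-style device names and a path that ends in a SCSI host:channel:target:lun address, as in the example above. The function names sysfs_device_path and parse_device_path are invented for this illustration.

```python
import os
import re

# PCI address: domain:bus:device.function, e.g. 0000:08:00.0
PCI_ADDR = re.compile(r"[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]")
# Trailing SCSI address: host:channel:target:lun, e.g. 3:0:5:1
SCSI_ADDR = re.compile(r"/(\d+):(\d+):(\d+):(\d+)/?$")


def sysfs_device_path(dev, sysfs_block="/sys/block"):
    """Resolve 'sdg', '/dev/sdg' or '/dev/sdg1' to its sysfs device path."""
    name = os.path.basename(dev)
    # Drop a trailing partition number (assumes sdX-style names only).
    name = re.sub(r"\d+$", "", name)
    return os.path.realpath(os.path.join(sysfs_block, name, "device"))


def parse_device_path(path):
    """Extract the controller's PCI address and the SCSI address, or None."""
    pci = PCI_ADDR.findall(path)
    scsi = SCSI_ADDR.search(path)
    if not pci or scsi is None:
        return None
    host, channel, target, lun = (int(g) for g in scsi.groups())
    return {
        "pci": pci[-1],  # last PCI hop is the controller itself
        "host": host,
        "channel": channel,
        "target": target,
        "lun": lun,
    }
```

Calling parse_device_path(sysfs_device_path("/dev/sdg1")) would then yield a key to look up in a hand-written table of drive bays. As far as I know, the hostN number is handed out in probe order and may differ between boots, so the PCI address together with channel and target is probably the safer key. Paths for other controller types (SATA, SAS expanders) have different intermediate components and may need extra parsing.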