FreeBSD VMware and CAM status: SCSI Status Error

Tags: freebsd, scsi, storage-area-network, vmware-esxi

I'm running FreeBSD 10.1-RELEASE-p19 on a VPS (VMware).

My ISP is experiencing rapid data growth, and a week ago these messages spontaneously started showing up in our logs:

Sep 25 09:00:50 srv03 kernel: (da0:mpt0:0:0:0): SCSI status: Busy
Sep 25 09:00:50 srv03 kernel: (da0:mpt0:0:0:0): Retrying command
Sep 25 09:00:50 srv03 kernel: (da0:mpt0:0:0:0): WRITE(10). CDB: 2a 00 03 f9 6c 22 00 00 40 00
Sep 25 09:00:50 srv03 kernel: (da0:mpt0:0:0:0): CAM status: SCSI Status Error

Sometimes the server loses contact with the storage entirely, then panics and reboots. This often happens on even hours, presumably triggered by a routine job (migration/backup).

Until my ISP adds more storage systems to reduce the load on the storage, I would really like to try to do something myself.

I found this, but I'm unsure how to apply the patch or use the information:
https://svnweb.freebsd.org/base?view=revision&revision=278111

I also found this (vfs.unmapped_buf_allowed=0), but I'm not sure whether it could be related:
https://www.freebsd.org/releases/10.1R/errata.html#open-issues
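For reference, vfs.unmapped_buf_allowed is a boot-time tunable, so as far as I know it cannot be changed with sysctl on a running system; it would go into /boot/loader.conf and take effect after a reboot. This is only a sketch of how the errata workaround is applied, not a claim that it addresses the mpt(4) errors above:

```
# /boot/loader.conf
# Disable unmapped I/O buffers (workaround listed in the 10.1-RELEASE errata)
vfs.unmapped_buf_allowed="0"
```

After rebooting, `sysctl vfs.unmapped_buf_allowed` should report 0.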

Output of camcontrol tags da0 -v:

(pass1:mpt0:0:0:0): dev_openings  127
(pass1:mpt0:0:0:0): dev_active    0
(pass1:mpt0:0:0:0): devq_openings 127
(pass1:mpt0:0:0:0): devq_queued   0
(pass1:mpt0:0:0:0): held          -1
(pass1:mpt0:0:0:0): mintags       2
(pass1:mpt0:0:0:0): maxtags       255

gstat output when the errors occur:
[screenshot of gstat output not reproduced]

Any thoughts, hints, or ideas would be really appreciated.

Thanks!

Best Answer

Since you are using VMware, mpt(4) is purely virtual, so I would suggest switching the virtual disk controller to something simpler, like ICH10.

Otherwise, I suggest experimenting with camcontrol tags, either increasing or decreasing the queue length.
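A minimal sketch of adjusting the queue length with camcontrol(8), assuming the disk is da0 as in the question. The value 64 is just one of the sizes suggested at the end of this answer; try a few values and watch whether the "SCSI status: Busy" messages change:

```sh
# Show the current tagged-queueing state of da0
camcontrol tags da0 -v

# Limit the number of concurrent tagged commands to 64
camcontrol tags da0 -N 64

# Check the reported values again after the change
camcontrol tags da0 -v
```

As far as I know, this setting is not kept across reboots, so it would have to be reapplied at boot (for example from /etc/rc.local; that location is an assumption about how you manage boot-time commands).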

If you choose to reprovision the disks using another driver, note that changing from a SAS to a SATA controller may change device names: /dev/daX will probably become /dev/adaX. Unless you are using ZFS or mounting your disks via disk labels, you'll have to edit /etc/fstab.
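A minimal sketch of that /etc/fstab edit, assuming every /dev/daN entry becomes /dev/adaN with the same unit number (the answer only says "probably", so confirm the new names with `camcontrol devlist` before relying on it). The function name rewrite_fstab_da_to_ada is hypothetical; it leaves comment lines alone and keeps the original file as a .bak copy:

```sh
#!/bin/sh
# Hypothetical helper: rewrite /dev/daN device nodes at the start of
# fstab lines to /dev/adaN, keeping the original as FILE.bak.
# Assumes unit numbers stay the same after the controller change.
rewrite_fstab_da_to_ada() {
    fstab="${1:-/etc/fstab}"
    # -E: extended regex; -i.bak: edit in place and keep a backup.
    # Anchoring on ^/dev/da means lines already using /dev/ada are untouched.
    sed -i.bak -E 's#^/dev/da([0-9]+)#/dev/ada\1#' "$fstab"
}

# Usage (as root, before switching the controller):
#   rewrite_fstab_da_to_ada /etc/fstab
```

Running it twice is harmless, because a line that already starts with /dev/ada no longer matches the pattern.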

As for your gstat output: there's clearly something wrong with it, probably due to the nature of FreeBSD's support for virtual environments. A 600% load is nonsense. I suggest you report this in the FreeBSD Bugzilla.
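A bug report is easier to act on with text output than with a screenshot. A rough sketch for capturing gstat samples for da0 around the even-hour job, using gstat's batch mode (-b), sampling interval (-I) and provider filter (-f). The log path and the 60-second spacing are arbitrary choices for illustration:

```sh
# Append a timestamped one-second gstat sample for da0 every 60 seconds.
# /var/tmp/gstat-da0.log is a hypothetical path; stop with Ctrl-C.
while :; do
    date
    gstat -b -I 1s -f '^da0$'
    sleep 60
done >> /var/tmp/gstat-da0.log
```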

P.S. The advice to change the disk controller type still stands.
P.P.S. Alternatively, I would try lowering the mpt(4) queue length to 128 or even 64.
