Disk IO Errors when writing / Linux + Windows / HDD is OK


The problem:
I'm getting a lot of disk IO errors on my server, and they are causing multiple failures:

  • VMs are rebooting because of IO errors: "task xyz/sdaX blocked for more than 120 seconds"
  • Backups are not working because VSS takes too long.
  • Writing to the HDDs is either impossible or extremely slow, with massive numbers of retry events.
  • Disks disappear and stay gone until I power cycle the server.

Windows: "The IO operation at logical block address X for Disk (2|5|7|8) was retried"

Linux: "Buffer I/O error on dev sdX1, logical block Y, lost async page write"
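As a sketch only: a small Python helper that counts the two messages quoted above per disk, which helps show whether the errors cluster on particular disks, controllers, or backplane slots. The regular expressions are my assumption based on the message text shown here; real log lines carry extra prefixes (timestamps, event metadata), which `re.search` tolerates. The name `count_io_errors` is hypothetical.

```python
import re
from collections import Counter

# Patterns assumed from the two messages quoted above.
# Windows: captures the disk number after "for Disk".
WINDOWS_RETRY = re.compile(
    r"The IO operation at logical block address \S+ for Disk (\d+)")
# Linux: captures the device/partition name (e.g. sdb1) before the comma.
LINUX_BUFFER_IO = re.compile(
    r"Buffer I/O error on dev ([^,\s]+), logical block (\d+)")

def count_io_errors(lines):
    """Count retry / buffer IO error messages per disk in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        m = WINDOWS_RETRY.search(line)
        if m:
            counts[f"Disk {m.group(1)}"] += 1
            continue
        m = LINUX_BUFFER_IO.search(line)
        if m:
            counts[m.group(1)] += 1
    return counts
```

The input could be the lines of `dmesg` output on Linux or an exported event log on Windows; lines that match neither pattern are ignored.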

My Server:

Mainboard: Supermicro XDRi
CPU: 2x E5-2630v3
RAM: 8x32GB DDR4 (8x Samsung M386A4G40DM0)
Disks:
4x WD Red 3TB
2x WD Red 6TB
2x SM863 2TB
1x Intel SSDSC2BX200G4 200GB
1x Samsung 940 Evo - 256GB
OS: Hyper-V 2012 R2
Controller: Onboard Intel C612 | HighPoint Rocket 2720SGL | HighPoint Rocket 640L
RAID: I'm not using any hardware RAID. I use MS Storage Spaces, but the problem described above also occurs without any software RAID.

What I tried:

  • Replaced all SATA / SAS cables (2x!)
  • Replaced the SATA controller (2x!)
  • Moved the disks to different HDD bay slots
  • Tested every single disk in my workstation: no SMART, write, or read errors
  • Reinstalled the host system
  • Installed older / newer drivers
  • Updated BIOS / firmware
  • Reset BIOS settings / disabled power-saving options
  • Ran CPU / RAM tests

I can reproduce the IO errors by writing data to the disks (only the HDDs; there are no issues with my SSDs), under both Windows and Linux.
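A minimal sketch of such a write test, assuming Python is available on the machine: it writes random data to a file on the disk under test, forces it to the device with `os.fsync`, and reports the throughput in MiB/s. The function `write_test` and its parameters are hypothetical; point `path` at a file on the HDD you want to exercise. It only reproduces the write load; it cannot tell a cable or power problem from a disk problem.

```python
import os
import time

def write_test(path, total_mib=64, chunk_mib=4):
    """Write about total_mib MiB of random data to `path`, fsync it, and return MiB/s.

    An OSError or a very low rate here matches the symptoms described above,
    but does not by itself identify the cause.
    """
    chunk = os.urandom(chunk_mib * 1024 * 1024)  # one reusable random buffer
    written = 0
    start = time.monotonic()
    with open(path, "wb") as f:
        while written < total_mib:
            f.write(chunk)
            written += chunk_mib
        f.flush()              # push Python's buffer to the OS
        os.fsync(f.fileno())   # force the OS to write the data to the disk
    elapsed = time.monotonic() - start
    os.remove(path)            # clean up the probe file
    return written / elapsed if elapsed > 0 else float("inf")
```

Running it in a loop while watching the kernel log or the Windows event log makes it easy to correlate slow writes with the retry messages.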

Does anyone have an idea what could be causing this?

Best Answer

It seems that the power cables were faulty. After I replaced the power cables running from the PSU to the backplane, everything works: I was able to test at 1.5 Gb/s without a single disk I/O error.

I still can't explain how this could happen.
