I have a FreeBSD machine with a RAID array of three hard disks.
I've been asked to monitor this RAID: if a hard disk fails or shows any problem, I need to know.
So the first thing I'm doing is trying to understand how smartctl works.
The commands I've used so far are:
smartctl --scan -j - scans for devices and outputs the list as structured JSON
smartctl -i /dev/device_name - shows information about a single device
smartctl -a /dev/device_name - shows everything, including errors (I think I could grep this somehow for just the error sections)
Are there any other smartctl options that check whether the disks are writable, alive, and reporting an OK health status?
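On the health question, one thing worth knowing is that smartctl's exit status is itself a bitmask, documented in the EXIT STATUS section of the smartctl(8) man page. Bit 3, for example, is set when the drive's own SMART health check (what smartctl -H prints) reports DISK FAILING. That verdict is the drive's own coarse pass/fail, so it complements rather than replaces per-attribute checks. The sketch below decodes the exit status in POSIX sh instead of grepping the -a report. The function name describe_smartctl_status is hypothetical, and the bit descriptions are paraphrased, so compare them with the man page of your installed smartmontools version.

```shell
#!/bin/sh
# Decode a smartctl exit status into human-readable lines.
# Bit meanings are paraphrased from the EXIT STATUS section of smartctl(8);
# describe_smartctl_status is a hypothetical helper name for this sketch.
describe_smartctl_status() {
    rc=$1
    if [ "$rc" -eq 0 ]; then
        echo "OK"
        return 0
    fi
    [ $((rc & 1)) -ne 0 ]   && echo "bit 0: command line did not parse"
    [ $((rc & 2)) -ne 0 ]   && echo "bit 1: device open failed"
    [ $((rc & 4)) -ne 0 ]   && echo "bit 2: a SMART or other ATA command failed"
    [ $((rc & 8)) -ne 0 ]   && echo "bit 3: SMART status check returned DISK FAILING"
    [ $((rc & 16)) -ne 0 ]  && echo "bit 4: prefail attributes at or below threshold"
    [ $((rc & 32)) -ne 0 ]  && echo "bit 5: some attributes were at or below threshold in the past"
    [ $((rc & 64)) -ne 0 ]  && echo "bit 6: device error log contains errors"
    [ $((rc & 128)) -ne 0 ] && echo "bit 7: self-test log contains errors"
    # Always succeed; the decoded lines are the result.
    return 0
}
```

Usage, as root and with /dev/ada0 standing in for one of your disks: smartctl -a /dev/ada0 > /dev/null; describe_smartctl_status $?
Running with -a rather than only -H means the error and self-test logs are read too, so bits 6 and 7 can be reported.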
The main goal is to use smartctl to generate the data for a pfSense RAID monitoring template in Zabbix that uses Low Level Discovery (LLD).
Any help is appreciated.
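For the discovery side, Zabbix LLD expects JSON with one entry per discovered object, keyed by {#MACRO} names, and it accepts the classic {"data":[...]} wrapper. A minimal sketch, assuming the plain smartctl --scan output (one device per line, device path in the first column) rather than the -j form, so that only awk is needed. The macro name {#DISK} and the function name scan_to_lld are hypothetical choices for this example.

```shell
#!/bin/sh
# Convert "smartctl --scan" output (read on stdin) into Zabbix LLD JSON.
# Expected input lines look like: "/dev/ada0 -d atacam # /dev/ada0, ATA device"
# {#DISK} is an arbitrary macro name chosen for this sketch.
scan_to_lld() {
    awk '
        BEGIN { printf "{\"data\":[" }
        # Only lines whose first field is a device path are devices.
        $1 ~ /^\/dev\// {
            # Emit a comma before every entry except the first.
            printf "%s{\"{#DISK}\":\"%s\"}", (n++ ? "," : ""), $1
        }
        END { print "]}" }
    '
}
```

Usage: smartctl --scan | scan_to_lld
An item prototype in the template could then use {#DISK} as its parameter, so each disk found by the scan gets its own item.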
Best Answer
OK, so to answer: as far as I know, there's no one-shot check along the lines of "smartctl --isDiskOK /dev/sda" that you can use.
The closest smartctl can get you is the reallocated sector count. I won't explain it in detail, but essentially, when a disk starts to fail, sectors that are hard to read or write get remapped to spare sectors. A growing reallocation count means the drive is getting closer to failing completely.
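To pull just that number out of smartctl, here is a minimal sketch, assuming an ATA/SATA disk whose smartctl -A attribute table has RAW_VALUE in column 10. The helper name reallocated_sectors is hypothetical. Disks that don't use the ATA attribute table (for example SAS/SCSI drives) produce no output here.

```shell
#!/bin/sh
# Print the raw Reallocated_Sector_Ct value from "smartctl -A" output on stdin.
# Assumes the ATA attribute table layout:
#   ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
# so the raw value is field 10. Prints nothing if the attribute is missing.
reallocated_sectors() {
    awk '$2 == "Reallocated_Sector_Ct" { print $10; exit }'
}
```

Usage, as root and with /dev/ada0 as an example FreeBSD device name: smartctl -A /dev/ada0 | reallocated_sectors
What matters is the trend rather than a single reading, so it makes sense to alert when the value increases, not only when it is non-zero.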
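Create the Zabbix template with an item that reads the reallocated sector count (the user parameter file is linked at the end of this answer).
Allow the zabbix user to run smartctl via sudoers, since smartctl normally needs root privileges to read SMART data.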
Assign the template to the monitored host and restart zabbix_agent so it loads the config. The item should then populate with the reallocated sector count.
Profit. :)
Anyway, the Zabbix template is also available on my GitHub: https://github.com/RipperSK/zabbix-user-params/blob/master/hdd.reallocated.sectors/userparameter_hdd.reallocated.sectors.conf
Enjoy.