My hosting provider installed a hard drive in my server that apparently had some sort of error in the past, but a full offline SMART check showed that everything is (more or less) OK at the moment. The server runs RAID1, so I can live with that situation.
The problem is that, according to the man page, smartctl sets bit 6 of its exit status if there was an error in the past, so even though everything is fine now, the exit code is 64.
The smart plugin is configured by default with a threshold of 0. I know I could raise the threshold to 64, but then I would miss the much more important bit 3, "disk is failing".
Is there a way to configure a threshold so that munin does a bitwise comparison of the value?
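To illustrate why a single numeric threshold cannot work here, below is a minimal sketch in Python. It assumes a simplified model in which an alert fires whenever the reported value is greater than the threshold; `exceeds_threshold` is a hypothetical helper for illustration, not part of munin or the smart plugin.

```python
# Hypothetical, simplified model of a numeric threshold check:
# an alert fires when the reported value is greater than the threshold.
DISK_FAILING = 1 << 3   # bit 3, value 8
PAST_ERROR = 1 << 6     # bit 6, value 64


def exceeds_threshold(status, threshold):
    """Return True if the value would trigger an alert."""
    return status > threshold


if __name__ == "__main__":
    for threshold in (0, 64):
        for status in (PAST_ERROR, DISK_FAILING, PAST_ERROR | DISK_FAILING):
            print(threshold, status, exceeds_threshold(status, threshold))
```

With a threshold of 0 the harmless value 64 already alerts, and with a threshold of 64 a bare "disk is failing" value of 8 no longer does, which is why the comparison has to be bitwise.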
Best Answer
In the end I resorted to patching the smart plugin. Depending on your version, there is some code like this:
Replace it with this:
The most interesting part is the line with the bitwise operation with 191: this is 10111111 in binary (0xBF), so ANDing it with the current value clears bit 6 while leaving all other bits untouched.
Therefore a value of 64 (as in my case) is reported as 0, while a value of 8 stays at 8. Just as importantly, a value of 72 (bit 6 set as always, plus bit 3 set because the disk is failing) is also reported as 8.
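The masking arithmetic can be checked in isolation with a small Python sketch. This is not the plugin's actual code; `mask_exit_status` and the constant names are hypothetical and only demonstrate the AND with 191 described above.

```python
# Standalone sketch of the masking step; names are illustrative assumptions,
# not identifiers from the munin smart plugin.
PAST_ERROR_BIT = 1 << 6                 # bit 6, value 64
IGNORE_MASK = 0xFF & ~PAST_ERROR_BIT    # 0b10111111 == 191


def mask_exit_status(status):
    """Clear bit 6 of a smartctl exit status and keep all other bits."""
    return status & IGNORE_MASK


if __name__ == "__main__":
    for status in (64, 8, 72):
        print(status, "->", mask_exit_status(status))
```

Running it prints 64 -> 0, 8 -> 8 and 72 -> 8, matching the three cases described above.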