Let me start by saying that I assume that the answer is no-way, but here goes …
In monitoring the health of our Linux systems we check various items
periodically. This includes reading /proc/sys/kernel/tainted and alerting
if the value is not zero.
Often it looks like this:
We read the value 16 out of /proc/sys/kernel/tainted, which indicates
that a machine check occurred.
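For reference, the value is a bitmask, so 16 is bit 4. Here is a rough sketch of how a monitoring check could decode it. The function names are my own, and the table is only a partial copy of the flags listed in the kernel's "Tainted kernels" documentation, so treat anything not in the table as unknown:

```python
# Partial table of taint bits, taken from the kernel's tainted-kernels
# documentation. Bits missing here are reported as "unknown".
TAINT_FLAGS = {
    0: "proprietary module was loaded",
    1: "module was force loaded",
    2: "kernel running on an out-of-specification system",
    3: "module was force unloaded",
    4: "processor reported a machine check exception (MCE)",
    5: "bad page referenced or unexpected page flags",
    6: "taint requested by userspace application",
    7: "kernel died recently (OOPS or BUG)",
    8: "ACPI table overridden by user",
    9: "kernel issued a warning",
}


def decode_taint(value):
    """Return a list of (bit, description) for every bit set in value."""
    reasons = []
    bit = 0
    while value >> bit:
        if value & (1 << bit):
            reasons.append((bit, TAINT_FLAGS.get(bit, "unknown (not in this table)")))
        bit += 1
    return reasons


def read_taint(path="/proc/sys/kernel/tainted"):
    """Read the current taint value as an integer."""
    with open(path) as f:
        return int(f.read().strip())
```

On one of the affected machines, `decode_taint(read_taint())` should report only bit 4.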
Additional digging also shows that /var/log/messages contains:

    Disabling lock debugging due to kernel taint
    Machine check events logged

and /var/log/mcelog contains:

    Hardware event. This is not a software error.
    MCE 0
    CPU 6 BANK 10
    MISC 90840800080148c ADDR 89a6a0c40
    TIME 1383328318 Fri Nov 1 13:51:58 2013
    MCG status:
    MCi status:
    Corrected error
    MCi_MISC register valid
    MCi_ADDR register valid
    MCA: MEMORY CONTROLLER MS_CHANNEL2_ERR
    Transaction: Memory scrubbing error
    STATUS 8c00004a000800c2 MCGSTATUS 0
    MCGCAP 1000c12 APICID 20 SOCKETID 1
    CPUID Vendor Intel Family 6 Model 45

This looks to me like a soft error: memory sometimes has errors and
self-corrects, and other than tracking the number of incidents, the
operator shouldn't have to do anything.
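To illustrate what I mean by tracking the number of incidents, here is a minimal sketch that counts corrected errors in the mcelog text. It assumes every record starts with "Hardware event." and carries a "TIME <epoch>" field, as in the excerpt above; the function names and the time window are my own invention, and other mcelog versions may format records differently:

```python
import re

# Matches the epoch timestamp in a record, e.g. "TIME 1383328318 Fri Nov ..."
TIME_RE = re.compile(r"\bTIME (\d+)")


def corrected_event_times(log_text):
    """Return the epoch timestamps of all records marked 'Corrected error'."""
    times = []
    # Everything before the first "Hardware event." is not a record.
    for record in log_text.split("Hardware event.")[1:]:
        if "Corrected error" not in record:
            continue
        match = TIME_RE.search(record)
        if match:
            times.append(int(match.group(1)))
    return times


def count_recent(times, now, window_seconds):
    """Count events that happened within window_seconds before now."""
    return sum(1 for t in times if 0 <= now - t <= window_seconds)
```

A check could then call `count_recent(corrected_event_times(text), time.time(), 86400)` and compare the result against whatever rate is considered acceptable.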
Now the log above notes "Disabling lock debugging due to kernel taint".
I read this to mean that the kernel doesn't want to mislead anyone about
any software component because the hardware is "known to be bad".
This clears if we reboot, but if the number of incidents is low there is
no real reason to reboot, and we would like to avoid it. (Not that a
reboot is the right course of action for a high number of incidents
either: fixing the hardware/environmental issues first and then rebooting
would make sense.) We would still like to use this as a mechanism to
alert about potential issues. So my question is …
Is there a way to clear this bit out of /proc/sys/kernel/tainted?
p.s. Again, I only intend to reset this if the incidents are infrequent (See here)
Best Answer
I hate to answer my own question, and will gladly accept anything that adds valuable info.
As I mentioned in my comment above, judging by the kernel code here, it is not possible.
It seems reasonable considering the history of "tainted", which was initially used to indicate that the kernel is impure due to non-compliant (non-GPL) modules. This, however, is a pain: as I noted in my other comment, certain memory issues self-correct and should probably NOT set that taint bit (or the kernel should at least allow that specific bit to be cleared).
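Since the bit cannot be cleared, one possible workaround is to handle it on the monitoring side: ignore the machine-check bit in the taint value and alert on it only when the recent incident count is high. This is only a sketch of that idea, not something taken from the kernel; the names and the threshold are hypothetical, and the incident count would have to come from elsewhere (for example from parsing /var/log/mcelog):

```python
MCE_BIT = 1 << 4  # value 16: processor reported a machine check


def split_taint(value):
    """Return (mce_seen, other_bits) for a raw taint value."""
    return bool(value & MCE_BIT), value & ~MCE_BIT


def should_alert(value, recent_mce_count, mce_threshold):
    """Alert on any non-MCE taint, or on MCE taint with too many recent events."""
    mce_seen, other_bits = split_taint(value)
    if other_bits:
        # Some taint other than a machine check: always worth a look.
        return True
    # The MCE bit stays set until reboot, so use the recent count to decide.
    return mce_seen and recent_mce_count >= mce_threshold
```

With a threshold of 5, a machine showing 16 and a single corrected error stays quiet, while the same machine with five recent errors, or any other taint bit, raises an alert.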