Linux – Clear machine-check event from /proc/sys/kernel/tainted

linux monitoring

Let me start by saying that I assume the answer is "no way", but here goes …

In monitoring the health of our Linux systems, we check various items
periodically. This includes reading /proc/sys/kernel/tainted and alerting
if the value is not zero.
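As a rough illustration of that check, here is a minimal Python sketch that reads the value and reports which bits are set. The helper names (`read_taint`, `decode_taint`) are made up for this example, and the table is deliberately partial; the kernel's tainted-kernels documentation has the full list.

```python
# Partial map of taint bit positions to meanings (not the full list).
TAINT_FLAGS = {
    0: "proprietary module loaded",
    1: "module force-loaded",
    3: "module force-unloaded",
    4: "machine check exception occurred",
    7: "kernel oops",
    9: "kernel warning issued",
    12: "out-of-tree module loaded",
}


def decode_taint(value):
    """Return (bit, description) pairs for every set bit in value."""
    flags = []
    bit = 0
    while value >> bit:
        if value & (1 << bit):
            flags.append((bit, TAINT_FLAGS.get(bit, "not listed in this sketch")))
        bit += 1
    return flags


def read_taint(path="/proc/sys/kernel/tainted"):
    """Read the current taint value as an integer."""
    with open(path) as f:
        return int(f.read().strip())


if __name__ == "__main__":
    for bit, meaning in decode_taint(read_taint()):
        print("bit %d (value %d): %s" % (bit, 1 << bit, meaning))
```

A value of 16 is bit 4 alone, which is the machine-check flag.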

Often it looks like this:

  • We read the value 16 out of /proc/sys/kernel/tainted, which indicates
    that a machine-check occurred.
  • Additional digging also shows:

    • /var/log/messages contains:

      Disabling lock debugging due to kernel taint
      Machine check events logged

    • and /var/log/mcelog contains:

      Hardware event. This is not a software error.
      MCE 0
      CPU 6 BANK 10 
      MISC 90840800080148c ADDR 89a6a0c40 
      TIME 1383328318 Fri Nov  1 13:51:58 2013
      MCG status:
      MCi status:
      Corrected error
      MCi_MISC register valid
      MCi_ADDR register valid
      MCA: MEMORY CONTROLLER MS_CHANNEL2_ERR
      Transaction: Memory scrubbing error
      STATUS 8c00004a000800c2 MCGSTATUS 0
      MCGCAP 1000c12 APICID 20 SOCKETID 1 
      CPUID Vendor Intel Family 6 Model 45

This looks to me like a soft error: memory sometimes has errors that it
corrects by itself, and other than tracking the number of incidents, the
operator shouldn't have to do anything.
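Tracking the number of incidents could look roughly like the Python sketch below. It assumes the log keeps the layout shown in the excerpt above (each record starting with the "Hardware event" line and containing a `TIME <epoch>` line and a "Corrected error" line); the function names are invented for illustration, and other mcelog versions may format records differently.

```python
import re

# Assumption: every record begins with this line, as in the excerpt above.
RECORD_START = "Hardware event. This is not a software error."
TIME_RE = re.compile(r"^\s*TIME (\d+)", re.MULTILINE)


def corrected_event_times(log_text):
    """Return epoch timestamps of records flagged 'Corrected error'."""
    times = []
    for record in log_text.split(RECORD_START)[1:]:
        lines = [line.strip() for line in record.splitlines()]
        # Exact line match, so 'Uncorrected error' is not counted.
        if "Corrected error" not in lines:
            continue
        match = TIME_RE.search(record)
        if match:
            times.append(int(match.group(1)))
    return times


def count_since(times, cutoff_epoch):
    """Count events at or after cutoff_epoch."""
    return sum(1 for t in times if t >= cutoff_epoch)
```

For example, `count_since(corrected_event_times(open("/var/log/mcelog").read()), cutoff)` would give the number of corrected events since `cutoff`.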

Now the log above notes "Disabling lock debugging due to kernel taint".
I read this to mean that the kernel doesn't want to mislead anyone about
any software component, because the hardware is "known to be bad".

The taint clears if we reboot, but if the number of incidents is low there
is no real reason to reboot, and we would like to avoid reboots. (Not that
a reboot is the right course of action for a high number of incidents
either; fixing the hardware or environmental issues first and then
rebooting would make sense.) We would still like to use this value as a
mechanism to alert about potential issues. So my question is …

Is there a way to clear this bit out of /proc/sys/kernel/tainted?

P.S. Again, I only intend to reset this if the incidents are infrequent (see here).

Best Answer

I hate to answer my own question, and I will accept anything that adds valuable info.

As I mentioned in my comment above, judging by the kernel code here, it is not possible to clear the bit.

This seems reasonable considering the history of "tainted", which was initially used to indicate that the kernel is impure due to non-compliant (non-GPL) modules. It is a pain, however: as I noted in my other comment, certain memory issues self-correct and should probably NOT set that taint bit (or the kernel should at least allow clearing that specific bit).
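Since the bit cannot be cleared, one possible workaround is to handle it on the monitoring side: mask out the machine-check bit when evaluating the taint value, and alert on it only when the corrected-error count is high. The Python sketch below is just one way to express that; the `evaluate` function, its parameters, and the threshold of 5 are hypothetical and not taken from any existing tool.

```python
MACHINE_CHECK = 1 << 4  # value 16, the bit discussed in the question


def evaluate(taint_value, recent_corrected_events, max_events=5):
    """Return 'ok' or an alert string for the monitoring system.

    recent_corrected_events is supplied by the caller (for example a
    count derived from the mcelog); max_events is an arbitrary example
    threshold.
    """
    other_bits = taint_value & ~MACHINE_CHECK
    if other_bits:
        return "alert: taint bits other than machine check set (%d)" % other_bits
    if taint_value & MACHINE_CHECK and recent_corrected_events > max_events:
        return "alert: %d recent machine-check events" % recent_corrected_events
    return "ok"
```

With this, a host that reports 16 and only a handful of corrected errors stays quiet, while any other taint bit still raises an alert immediately.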