Electrical – CPU-Multiplier Overclocking causing (silent) data corruption

cpu · data storage · overclocking · sdram

There have been many threads in the past about CPU overclocking and damage to electrical components. Mostly, however, these questions focused on how increased voltage and the resulting heat can cause component failure and reduced lifetime if the chip is not cooled properly.

FSB overclocking for higher performance is not recommended today when an unlocked CPU multiplier is available, because raising the FSB also raises derived clocks such as the SATA bus rate (which can cause hard-drive issues) and makes the actual memory clock harder to control. I was therefore wondering whether overclocking via the CPU multiplier alone could cause any similar issues.
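To make the distinction concrete, here is a toy calculation (all numbers are hypothetical, not from any specific platform): the core clock is derived as FSB × multiplier, while buses like SATA and memory are derived from the FSB itself.

```python
# Hypothetical example clocks: core = FSB x multiplier, while SATA and
# memory clocks are derived from the FSB.
fsb_mhz = 100.0
multiplier = 40

core_mhz = fsb_mhz * multiplier                 # 4000 MHz core clock

# Raising the FSB by 10% overclocks the core AND every FSB-derived bus:
fsb_oc_mhz = fsb_mhz * 1.10
core_via_fsb = fsb_oc_mhz * multiplier          # 4400 MHz, buses also +10%

# Raising only the multiplier by 10% changes the core clock alone:
core_via_mult = fsb_mhz * (multiplier * 1.10)   # 4400 MHz, buses unchanged
```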

Is there any plausible scenario (or even published research) in which a CPU that was overclocked solely by increasing the multiplier could produce silently corrupted data? I'm not concerned with corrupt data from a sudden thermal shutdown under an extreme overclock with insufficient cooling, but rather with the possibility of long-term silent corruption of files on the drives or in memory due to unnoticed hardware malfunction, even when running the CPU only slightly above specification (e.g. a 10% multiplier increase without raising voltages).

Is this even a realistic danger, given how CPU-multiplier overclocking works, and are there any sources on this topic?

I noticed that it is difficult to find sound information on the potential effects of hardware overclocking, beyond subjective opinions and recommendations ("I wouldn't run hardware outside of manufacturer specifications if reliability is demanded", etc.).

Best Answer

Running any device beyond its rated specification means you have no guarantees of said device performing as intended.

Whether or not it fails (death, data corruption, dodgy calculations, brewing coffee instead of tea, etc.) depends as much on the direction of the wind as anything else. You are the proverbial guinea pig.

If the manufacturer hasn't guaranteed the specifications beyond a certain point, it is usually because either they haven't tested the device under those conditions, or because the mean time between failures (MTBF) is too low for those conditions. MTBF is essentially the average time before a failure is expected - it doesn't mean every device will survive to that point or die right after it, just that they do on average.
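As an illustration of why MTBF is only an average, here is a small simulation under the common modelling assumption of exponentially distributed lifetimes (the MTBF figure is made up): even a unit that meets its rated MTBF has only about a 37% chance of reaching that point.

```python
import math
import random

MTBF = 50_000.0  # hypothetical rated hours, for illustration only

def survival_probability(hours, mtbf=MTBF):
    """P(a unit is still working after `hours`) under an exponential model."""
    return math.exp(-hours / mtbf)

random.seed(0)
# Simulate 100,000 unit lifetimes with mean MTBF.
lifetimes = [random.expovariate(1 / MTBF) for _ in range(100_000)]

mean_life = sum(lifetimes) / len(lifetimes)  # close to MTBF, as promised
frac_alive_at_mtbf = sum(t > MTBF for t in lifetimes) / len(lifetimes)

print(round(survival_probability(MTBF), 3))  # 0.368 -> only ~37% survive to MTBF
```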


In terms of what effects you might see when overclocking, well, pretty much anything.

When chips run hotter, or are clocked faster, propagation delays through the chip take up a larger portion of the clock period. If these delays become too long you end up with setup violations - basically, data signals arriving at registers after the clock edge that was supposed to capture them.
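The arithmetic behind this is simple. A sketch with hypothetical numbers (real path delays and setup times come from the silicon, not from any datasheet quoted here): the timing "slack" on a register path is the clock period minus the path delay minus the register's setup time, and a faster clock shrinks the period until the slack goes negative.

```python
def setup_slack(clock_mhz, path_delay_ns, setup_time_ns):
    """Timing slack in ns: positive means the data arrives in time,
    negative means a setup violation (bits may be misread)."""
    period_ns = 1_000.0 / clock_mhz
    return period_ns - path_delay_ns - setup_time_ns

# A path that just barely fits at a rated 4.0 GHz (hypothetical numbers):
print(setup_slack(4000, 0.20, 0.04))  # small positive margin

# The same path after a 10% multiplier bump to 4.4 GHz:
print(setup_slack(4400, 0.20, 0.04))  # negative -> setup violation
```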

Setup violations cause all sorts of random effects. Calculations start returning incorrect values as bits in the data paths get misread. Instruction execution might be incorrect if there are violations in the instruction pipeline. Your comparison instruction that determines whether or not to launch the world's nuclear arsenal might be miscalculated. Your precious homework report might get corrupted as data is passed to storage controllers incorrectly.
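To see how a single misread bit flips a decision, here is a toy demonstration (the values and the "threshold" scenario are invented for illustration; a real violation happens in silicon, not in software):

```python
def flip_bit(value, bit):
    """Simulate one misread bit in a data path."""
    return value ^ (1 << bit)

threshold = 100
reading = 90                       # correct value: below threshold

corrupted = flip_bit(reading, 5)   # one bit misread -> 122

print(reading < threshold)              # True  -> correct decision
print(corrupted, corrupted < threshold) # 122 False -> decision silently flipped
```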

There is really no telling what will happen. And, more problematically, it might happen on one chip at a given frequency but not on another, due to process variation.