Electronics – Do modern processors have redundancy in their logic units to compensate for production faults?


Modern processors consist of billions of transistors, and new production technologies often have yield problems, at least in the first months. I would guess that even after years there are still faulty chips every now and then.

I know that in large blocks (e.g. the cache) it is possible to simply disable part of the block, reducing the available amount of memory (so the chip can at least be sold at a lower price instead of being thrown away).
But is there something similar for the logic units? I am aware that there are multiple ALUs for dispatching, but is it common to simply disable one of them if it has a production fault? Or are there even additional spare ALUs? I find it hard to believe that fabs just dispose of every chip with a faulty transistor in the logic, yet disabling a complete ALU would probably reduce the achievable processing power significantly.

Best Answer

As others have said, it is difficult to see redundant ALU logic within a core.

A core is designed to optimize throughput. Any additional logic for a redundant ALU would impact performance, and the increased area would slow down the whole core. As technology evolved, feature sizes shrank, making cores faster while essentially reusing the same intellectual property. Why have redundant ALUs when space is available for redundant cores to increase production yields?
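To see why spare cores help yield, here is a rough sketch under a simple assumption: each core is independently defect-free with probability `p_core_good`, and a chip is sellable if at least `active` of its `active + spares` cores work. The function name and the value `p = 0.95` are hypothetical illustration choices, not figures from any vendor, and the model ignores defects in shared (uncore) logic.

```python
from math import comb


def chip_yield(active: int, spares: int, p_core_good: float) -> float:
    """Probability that at least `active` of the `active + spares` cores are good.

    Assumes (hypothetically) that cores fail independently, each being
    defect-free with probability `p_core_good`. This is a binomial tail sum.
    """
    total = active + spares
    return sum(
        comb(total, k) * p_core_good**k * (1 - p_core_good) ** (total - k)
        for k in range(active, total + 1)
    )


if __name__ == "__main__":
    p = 0.95  # hypothetical per-core probability of being defect-free
    for spares in (0, 1, 2, 4):
        print(f"16 active + {spares} spare: yield = {chip_yield(16, spares, p):.3f}")
```

With no spares the whole chip fails if any one core is bad, so yield is simply `p**16`; each spare core lets the chip tolerate one more bad core, which is why spending area on whole spare cores can pay off more than hardening individual ALUs.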

In 2011, Intel filed a patent describing a processor with at least 32 cores, 16 active and 16 spare. According to the patent, a failing core would run at a higher temperature, allowing a spare core to be switched in. Essentially, this is dynamic core allocation as required.
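The temperature-triggered swap described above can be sketched as follows. This is only an illustration of the idea, not the patent's actual mechanism: the `CorePool` class, the `check` method, and the 95 °C threshold are all hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical threshold; the patent summary gives no specific value.
TEMP_LIMIT_C = 95.0


@dataclass
class CorePool:
    active: list[int]   # core IDs currently in use
    spares: list[int]   # core IDs held in reserve
    retired: list[int] = field(default_factory=list)  # cores switched out

    def check(self, temps: dict[int, float]) -> list[tuple[int, int]]:
        """Replace any active core hotter than TEMP_LIMIT_C with a spare.

        Returns a list of (failed_core, replacement_core) pairs.
        If no spares remain, a hot core simply stays active.
        """
        swaps = []
        for i, core in enumerate(self.active):
            if temps.get(core, 0.0) > TEMP_LIMIT_C and self.spares:
                replacement = self.spares.pop(0)
                self.active[i] = replacement
                self.retired.append(core)
                swaps.append((core, replacement))
        return swaps


if __name__ == "__main__":
    # 16 active + 16 spare, mirroring the core counts in the patent summary.
    pool = CorePool(active=list(range(16)), spares=list(range(16, 32)))
    print(pool.check({3: 101.0, 7: 60.0}))
```

Here core 3 exceeds the threshold and is replaced by the first spare, core 16, while core 7 stays in service.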

Cores could be allocated as tasks require, for example high-power or low-power cores. A bad core, detected by its higher temperature, could be switched out. The cores could also be operated in a checkerboard pattern to reduce heat.
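A checkerboard allocation can be illustrated with a small sketch: on a grid of cores, only cells where `(row + col) % 2` matches a chosen phase are powered, so no two active cores share an edge and heat is spread across the die. The 4 x 8 grid (32 cores) and the `checkerboard` function are hypothetical; real die layouts are not necessarily regular grids.

```python
def checkerboard(rows: int, cols: int, phase: int = 0) -> list[tuple[int, int]]:
    """Return (row, col) positions of cores to run in a checkerboard pattern.

    `phase` (0 or 1) selects which half of the board is active, so no two
    selected cores are horizontally or vertically adjacent.
    """
    return [(r, c) for r in range(rows) for c in range(cols) if (r + c) % 2 == phase]


if __name__ == "__main__":
    # Hypothetical 4 x 8 layout of 32 cores; each phase activates 16 of them.
    for phase in (0, 1):
        print(phase, checkerboard(4, 8, phase))
```

Alternating `phase` over time would let the two halves of the die take turns cooling, which is one plausible way to read "operate the cores in a checkerboard manner".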

Intel Patent: Enhancing Reliability of a Many-Core Processor