Electronic – At the microscopic level, what exactly forces us into the “dark silicon” (i.e., the mismatch between transistor scaling and voltage scaling) problem?

device, power-dissipation, transistors

I've read that if you fix the TDP of a chip, you cannot use all of the transistors simultaneously any longer. This leads me to believe that each transistor requires the same power as in previous nodes even though they are smaller; i.e., we cannot use less power to switch a smaller transistor on or off.

Why is this the case? What happens at the transistor scale that causes this phenomenon where we require a fixed power level for switching at the transistor gate (even as the transistor becomes smaller)?

Best Answer

I'm going to start my answer by going back to what I was taught at Uni: how each of the transistor's parameters scales, using an approach called "Constant Electric Field Scaling".

Let's say we have a transistor, and want to scale its length \$L\$ and width \$W\$ down by a factor \$\alpha\$ (both are scaled to keep the aspect ratio the same). \$\alpha\$ could be \$2\$, \$4\$, \$1.23\$, anything really. What happens?

The material isn't changing, so to avoid breakdown, we want to keep the electric field across the transistor the same.

$$E=\frac{V_{ds}}{L}=\frac{V_{ds}'}{(L/\alpha)} \Rightarrow V_{ds}'=\frac{V_{ds}}{\alpha}$$

The drain-source voltage of the transistor must scale, hence voltages go down. The same applies to the threshold voltage of the transistor (\$V_t\$) and the gate-source voltage (\$V_{gs}\$).

The same argument applies to the electric field across the gate oxide, which must also stay at the same strength. Here \$E=\frac{V_{gs}}{T_{ox}}\$, and \$V_{gs}\$ scales by \$1/\alpha\$, so:

$$T_{ox}'=\frac{T_{ox}}{\alpha}$$

So the gate oxide gets thinner! This in turn changes the capacitance of the oxide:

$$\begin{align} C_{ox}&=\epsilon \frac{WL}{T_{ox}} \\ C_{ox}'&=C_{ox}\times\frac{\alpha}{\alpha^2}=\frac{C_{ox}}{\alpha}\\ \end{align} $$

Why does the capacitance matter? Well, we can approximate the saturation current \$I_{d(sat)}\$ in terms of it, where \$V_{sat}\$ is the carrier saturation velocity:

$$I_{d(sat)}\approx(\frac{V_{sat}C_{ox}}{L})(V_{gs}-V_t)$$

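\$C_{ox}\$ and \$L\$ both scale by \$1/\alpha\$, so their ratio is unchanged, while \$(V_{gs}-V_t)\$ scales by \$1/\alpha\$. This means we can reasonably assume that: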

$$I_{d(sat)}'=I_{d(sat)}\frac{\alpha}{\alpha}\frac{1}{\alpha}=\frac{I_{d(sat)}}{\alpha}$$

I'm not going to go into it, but you can also work out that frequency \$f'=\alpha f\$, hence things can speed up as we scale down.
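For anyone who wants a rough feel for where that comes from, here is a minimal sketch. It assumes the common first-order approximation that gate delay goes as \$\tau \approx C V / I\$ (the time to charge a capacitance \$C\$ to voltage \$V\$ with current \$I\$). That approximation and the function name `delay_factor` are my own additions for illustration, not part of the derivation above.

```python
from fractions import Fraction


def delay_factor(c_factor, v_factor, i_factor):
    """Scale factor of the gate delay, assuming tau ~ C * V / I."""
    return c_factor * v_factor / i_factor


alpha = Fraction(2)  # example shrink factor

# Under constant-field scaling, C_ox, V and I_d(sat) each scale by 1/alpha.
tau = delay_factor(1 / alpha, 1 / alpha, 1 / alpha)
freq = 1 / tau  # frequency scales as the inverse of the delay

print(tau, freq)
```

With each of the three quantities scaling by \$1/\alpha\$, the delay itself scales by \$1/\alpha\$, which is the same thing as \$f'=\alpha f\$.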

Now, the dissipated power of each transistor can be approximated as:

$$P=IV=I_d V_{ds}$$

So as the transistor scales:

$$P'=I_{d(sat)}'\times V_{ds}'=\frac{I_{d(sat)}}{\alpha}\times \frac{V_{ds}}{\alpha} = \frac{P}{\alpha^2}$$

Notice how the power dissipation has gone down by the square of \$\alpha\$!

Since the area \$A=WL\$ also shrinks by \$\alpha^2\$, the power density \$U=P/A\$ remains constant:

$$U'=U\frac{\alpha^2}{\alpha^2}=U$$
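If you want to check the bookkeeping, the chain of scale factors above can be sketched in a few lines of Python. This is only an illustration of the first-order relations used in this answer; the function name `constant_field_scaling` and the dictionary keys are made up for the example.

```python
from fractions import Fraction


def constant_field_scaling(alpha):
    """Multiplicative factor applied to each quantity when L, W and T_ox
    shrink by alpha and the electric fields are held constant."""
    alpha = Fraction(alpha)
    length = 1 / alpha                    # L, W, T_ox
    voltage = 1 / alpha                   # V_ds, V_gs, V_t (keeps E = V/L fixed)
    area = length * length                # A = W * L
    c_ox = area / length                  # C_ox = eps * W * L / T_ox
    current = (c_ox / length) * voltage   # I_d(sat) ~ V_sat * (C_ox / L) * (V_gs - V_t)
    power = current * voltage             # P = I * V
    density = power / area                # U = P / A
    return {"V": voltage, "C_ox": c_ox, "I_dsat": current,
            "P": power, "A": area, "U": density}


factors = constant_field_scaling(2)
print(factors["P"], factors["U"])  # prints: 1/4 1
```

Using `Fraction` keeps the factors exact, so the power density comes out as exactly 1 rather than a floating-point value close to it.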


This all looks great: it means we can keep scaling and increase the number of transistors for the same amount of power, whilst getting faster and faster. Or does it?

There is another important consideration, though. To interface with the outside world, and for noise immunity, we can't keep reducing the voltage of the process. Notice how, in the above, all of the electric fields are kept the same by scaling the voltages along with the dimensions. In practice this isn't what happens: the voltages are being scaled much more slowly than the size of the transistors. If they weren't, then by now CPUs would probably be running at 0.1V logic instead of 0.65V or so, and the slightest amount of noise on either the signals or the power rails would be catastrophic.

In practice, two different scale factors are used, one for size (\$\alpha\$) and one for voltages (\$\kappa\$). The scaling is something like this:

$$ \begin{array}{c|c} \text{Quantity} & \text{Scale factor} \\ \hline L, W, T_{ox} & 1/\alpha\\ A & 1/\alpha^2\\ V_{ds}, V_{gs} & 1/\kappa\\ E_{ds}, E_{ox} & \alpha/\kappa\\ C_{ds}, C_{ox} & 1/\alpha\\ I_{d(sat)} & 1/\kappa\\ P & 1/\kappa^2\\ U & \alpha^2/\kappa^2\\ f & \alpha\\ \end{array} $$

From this we can see that because of the two different scale factors, the power density, \$U\$, will go up if \$\alpha\$ is scaled faster than \$\kappa\$ is, which is what is happening in practice.
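To put rough numbers on that, here is a small sketch of the two-factor scaling. The values \$\alpha=2\$ and \$\kappa=5/4\$ are made up for illustration and don't correspond to any real process node. The `usable_fraction` helper is my own assumption, not part of the table: it supposes a fixed die area and a fixed power budget, so the share of transistors that can be on at once is the inverse of the power density factor.

```python
from fractions import Fraction


def general_scaling(alpha, kappa):
    """Scale factors when dimensions shrink by alpha and voltages by kappa."""
    alpha, kappa = Fraction(alpha), Fraction(kappa)
    length = 1 / alpha                    # L, W, T_ox
    voltage = 1 / kappa                   # V_ds, V_gs
    area = length * length                # A = W * L
    c_ox = area / length                  # C_ox = eps * W * L / T_ox
    current = (c_ox / length) * voltage   # I_d(sat)
    power = current * voltage             # P = I * V
    return {"E": voltage / length, "I_dsat": current,
            "P": power, "U": power / area}


def usable_fraction(alpha, kappa):
    """Hypothetical: share of transistors that can be active at once if
    die area and total power budget are both held fixed."""
    return min(Fraction(1), 1 / general_scaling(alpha, kappa)["U"])


scaled = general_scaling(2, Fraction(5, 4))
print(scaled["U"])                          # prints: 64/25
print(usable_fraction(2, Fraction(5, 4)))   # prints: 25/64
```

In this made-up case the power density goes up by a factor of 2.56, so under the fixed-budget assumption only about 39% of the transistors could be switched on together. The rest would have to stay dark, which is the effect the question is asking about.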


Furthermore, this is a very simplified overview. It holds quite well for very large transistors, but as they get smaller and smaller, it doesn't hold as well as you might hope. Notice how two key dimensions, \$T_{ox}\$ and \$L\$, get smaller?

This means the barrier between the channel and the gate is getting thinner and thinner, as is the distance between drain and source. The gate oxide is now so thin that you can comfortably measure its thickness in number of atoms! The shrinking distance between drain and source also means that, when the transistor is off, the electric field between drain and source starts to interact with the barrier created by the electric field of the gate.

Both of these factors mean that the amount of leakage in the transistor (unwanted currents flowing from drain to source, or through the gate oxide) increases. If leakage goes up, power dissipation goes up (and at some point the transistors stop working properly). This leakage is not factored into the above derivations.
