DRAM, as you said, basically consists of a storage capacitor and a transistor to access the voltage stored on that capacitor. Ideally, the charge stored on that capacitor would never decrease, but there are leakage components that allow the charge to bleed off. If enough charge bleeds off the capacitor, then the data cannot be recovered. In normal operation, this loss of data is avoided by periodically refreshing the charge in the capacitor. This is why it is called Dynamic RAM.
Decreasing the temperature does a few things:
- It increases the threshold voltages of MOSFETs and the forward voltage drop of diodes.
- It decreases the leakage currents of MOSFETs and diodes.
- It improves the on-state performance of the MOSFETs.
Considering that the first two points directly reduce the leakage currents seen by the storage capacitor, it should be less of a surprise that the charge stored in a DRAM bit can last long enough for a careful reboot process. Once power is reapplied, the DRAM's internal refresh circuitry will maintain the stored values.
These basic premises can be applied to many different circuits, such as microcontrollers or even discrete circuits, as long as there isn't an initialization on start-up. Many microcontrollers, for example, will reset several registers on start-up, whether the previous contents were preserved or not. Large memory arrays are not likely to be initialized, but control registers are much more likely to have a reset on start-up function.
If you heat the die enough, you can create the opposite effect: the charge decays so rapidly that the data is erased before the refresh cycle can maintain it. However, this should not happen over the specified temperature range. Heating the memory enough for the data to decay faster than the refresh cycle could also slow the circuit down to the point where it couldn't meet the specified memory timings, which would appear as a different error.
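As a rough illustration of why cooling buys retention time, here is a back-of-the-envelope sketch. The doubling-per-10 °C rule of thumb and all the numbers below are assumptions for illustration, not datasheet values:

```python
# Rough sketch of how DRAM retention scales with temperature, assuming
# junction leakage roughly doubles every 10 degC (a common rule of
# thumb; the exact factor varies by process). The 64 ms reference is
# the standard refresh interval, used here as a stand-in for retention.

def retention_seconds(t_celsius, t_ref=25.0, retention_ref=0.064,
                      doubling_degc=10.0):
    """Estimated retention time at t_celsius (all parameters assumed)."""
    # Leakage doubles every `doubling_degc`, so retention halves.
    return retention_ref * 2.0 ** ((t_ref - t_celsius) / doubling_degc)

print(retention_seconds(25))    # reference: 0.064 s
print(retention_seconds(-40))   # cooled die: roughly 5.8 s
print(retention_seconds(85))    # hot die: 0.001 s
```

Even with these crude assumptions, the trend is clear: a cooled die holds its charge orders of magnitude longer, while a hot die erodes the refresh margin.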
This is not related to bit-rot. Bit-rot is either the physical degradation of storage media (CD, magnetic tapes, punch cards) or an event causing the memory to become corrupted, such as an ion impact.
Yes, it is correct. I did not check the numbers, but the procedure is sound.
To help you understand what you are doing, here is an explanation of the formula. When you perform a read operation, you turn on the NMOS. This effectively short-circuits \$C_C\$ to the bit line. These are two capacitors charged to different voltages, so when you connect them a redistribution of charge takes place. The formula you are using is actually about charge conservation, not energy conservation. The charge Q stored in a capacitor equals its capacitance times the voltage across its leads:
$$ Q\ =\ C \cdot V$$
So what you are doing is simply equating the charge present before the connection with the charge present after, keeping in mind that after the short circuit we can model \$C_C\$ and \$C_{BL}\$ as a single capacitor equal to the parallel combination of the two.
$$Q_i = Q_f \\
C_C\cdot V_{C_C}+C_{BL}\cdot V_{C_{BL}} = C_{tot}\cdot V_f$$
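As a quick numeric sanity check of this charge-conservation step, here is a small sketch; none of these component values come from the original question, they are made up for illustration:

```python
# Charge sharing between the cell and the bit line, per the formula
# above. All component values are assumed for illustration.
C_C   = 30e-15   # cell capacitance, 30 fF (assumed)
C_BL  = 300e-15  # bit-line capacitance, 300 fF (assumed)
V_CC  = 1.2      # cell voltage for a stored '1' (assumed)
V_CBL = 0.6      # bit-line precharge voltage (assumed)

# Q_i = Q_f  =>  C_C*V_CC + C_BL*V_CBL = (C_C + C_BL)*V_f
V_f = (C_C * V_CC + C_BL * V_CBL) / (C_C + C_BL)
print(V_f)  # about 0.6545 V: the bit line moves up by only ~55 mV
```

With a cell an order of magnitude smaller than the bit line, reading a '1' only nudges the bit line up by a few tens of millivolts, which is exactly why a minimum \$\Delta V\$ margin matters.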
The final voltage must be greater than \$V_{C_{BL}}+\Delta V\$ if we had a high level, and less than \$V_{C_{BL}}-\Delta V\$ if we had a low level. These conditions lead to two separate inequalities that are easily solved:
$$
C_{C(1)} \ge \frac{C_{BL}\cdot \Delta V}{V_{C_C(1)}-V_{C_{BL}}- \Delta V} \\
C_{C(0)} \ge \frac{-C_{BL}\cdot \Delta V}{V_{C_C(0)}-V_{C_{BL}}+ \Delta V}
$$
The larger of the two capacitance values is the minimum cell capacitance you are looking for.
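The two inequalities can be evaluated numerically; again, all the values below are made-up illustration numbers, not from the original exercise:

```python
# Minimum cell capacitance from the two inequalities above.
# All numeric values are assumed for illustration.
C_BL  = 300e-15  # bit-line capacitance
dV    = 0.05     # required sense margin, 50 mV
V_CBL = 0.6      # bit-line precharge voltage
V_CC1 = 1.2      # stored '1' level
V_CC0 = 0.0      # stored '0' level

C_C1 = (C_BL * dV) / (V_CC1 - V_CBL - dV)    # '1' condition
C_C0 = (-C_BL * dV) / (V_CC0 - V_CBL + dV)   # '0' condition
C_min = max(C_C1, C_C0)
print(C_min)  # about 27.3 fF
```

With the precharge voltage sitting exactly midway between the '0' and '1' levels, the two conditions come out equal; an asymmetric precharge would make one of them the binding constraint.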
The most significant factor is the physical (die) size of the transistor geometries (smaller means less SEU energy is required to trigger an upset), followed by the number of them (more devices per area = higher susceptibility). So reliability per bit really comes down to how many bits are packed into a given silicon area.
If reliability is a concern, always include ECC and design the system & software for good error checking & graceful error handling.
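Just to make the ECC suggestion concrete, here is a minimal single-error-correcting Hamming(7,4) sketch. Real memory controllers use wider SECDED codes (e.g. 64 data + 8 check bits), but the mechanism is the same:

```python
# Minimal Hamming(7,4) single-error correction, to illustrate what
# "include ECC" means at the bit level.

def hamming74_encode(d):
    """d: four data bits -> seven code bits [p1, p2, d1, p3, d2, d3, d4]."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4   # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4   # covers positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4   # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]

def hamming74_correct(c):
    """Fix at most one flipped bit in place; return the four data bits."""
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 + 2 * s2 + 4 * s3   # 1-based position of the bad bit
    if syndrome:
        c[syndrome - 1] ^= 1
    return [c[2], c[4], c[5], c[6]]

code = hamming74_encode([1, 0, 1, 1])
code[3] ^= 1                      # simulate an SEU flipping one bit
print(hamming74_correct(code))    # recovers [1, 0, 1, 1]
```

Any single flipped bit, whether from an ion strike or a decayed cell, is located by the syndrome and corrected transparently; the software-level error handling then only has to deal with the rarer multi-bit cases.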