Electronic – Does cooling the NAND chips on an SSD negatively affect its reliability?

heatsink | nand-flash | ssd | thermal

The problem of heat dissipation in high-performance, small-form-factor SSDs is well known; for example, the paper Transient Thermal Analysis for M.2 SSD Thermal Throttling, published in the 2018 17th IEEE Intersociety Conference on Thermal and Thermomechanical Phenomena in Electronic Systems, states:

Solid State Drive (SSD) technology continues to advance toward smaller footprints with higher bandwidth and adoption of new I/O interfaces in the PC market segment. Power performance requirements are tightening in the design process to address specific requirement along with the development of SSD technology. To meet this aggressive requirement of performance, one major issue is thermal throttling. As the NAND and ASIC junction temperatures approach their safe operating limits, performance throttling is triggered and thus power consumption would drop accordingly.
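To make the throttling mechanism in that quote concrete, here is a minimal sketch of such a feedback policy in Python. The thresholds, rates, and function names are all illustrative assumptions on my part, not taken from any real SSD firmware:

# Minimal sketch of a thermal-throttling policy as described in the quoted
# paper: when the junction temperature approaches its safe limit, the
# controller reduces performance (and therefore power). All values here
# are illustrative assumptions, not from any real firmware.

THROTTLE_TEMP_C = 80.0  # hypothetical junction temperature that triggers throttling
RESUME_TEMP_C = 70.0    # hypothetical temperature at which full speed resumes (hysteresis)

def next_transfer_rate(junction_temp_c: float, current_rate: float,
                       max_rate: float, throttled_rate: float) -> float:
    """Pick the transfer rate for the next interval from the junction temperature."""
    if junction_temp_c >= THROTTLE_TEMP_C:
        return throttled_rate   # back off: less work -> less power -> cooler die
    if junction_temp_c <= RESUME_TEMP_C:
        return max_rate         # cool enough: run at full speed again
    return current_rate         # inside the hysteresis band: keep the current rate

# Example: a hot die gets throttled, a cool one runs at full speed.
print(next_transfer_rate(85.0, 3500.0, 3500.0, 800.0))  # -> 800.0 (MB/s)
print(next_transfer_rate(60.0, 800.0, 3500.0, 800.0))   # -> 3500.0 (MB/s)

Real controllers are of course more elaborate (multiple sensors, gradual rate steps), but the hysteresis band above is the essential idea: power tracks workload, so capping the workload caps the temperature.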

Naturally, if space allows, adding a huge heatsink is a possible solution to this problem; there are many such products on the PC gaming market. I also see that many passive M.2-to-PCI-E adapters on the market have a built-in heatsink of sorts: a large copper pour connected to the ground plane under the M.2 connector.

But one can find many unsourced posts on random computer hardware forums claiming that the NAND chips should never be cooled. It is claimed that they are actually designed to heat themselves up to an optimum operating temperature, and that adding a heatsink to the NAND chips adversely affects their reliability. Here are some examples.

One claim reads,

Don't cool the NAND dies themselves!

They heat themselves up to operating temperature by design, cooling them means they just continually dump out power trying to hit temperature, and will be operating with a lower endurance (simplified: higher operating temperature = lower energy input to set/erase cells = less degradation of each cell per write/erase cycle).

Another claim reads,

Cooling the NAND is bad. You want the NAND to run warm and stay warm. As its temperature fluctuates, and as it cools down, if you suddenly transfer a large file (read or write, I can't remember) while the NAND hasn't had time to warm back up first, it can significantly reduce the life of the NAND.

This doesn't sound right to me. It suggests that the NAND chips depend on a self-heating effect to reach an optimum operating temperature, which is something I've never heard of before. The only chips I know of that deliberately use self-heating are National's LM199/299/399 "Super Zener" voltage references and Linear Technology's LT1088 Thermal RMS-DC Converter, and I don't believe NAND chips have anything to do with self-heating.

I tried to fact-check and/or debunk these statements, starting by looking for the datasheet of a NAND chip found in some recent SSDs. I went to Digikey and Mouser, set the filter to the highest storage density, and sorted the results by price. Unfortunately, it seems that datasheets are not available (all under NDA? Am I looking in the wrong place?).

Do these strange statements have any factual basis?

Best Answer

The paper Influence of temperature of storage, write and read operations on multiple level cells NAND flash memories from 2018 shows the graph below, which suggests that writing to flash cells at a temperature of 25°C or lower leads to read problems appearing earlier than writing at 85°C does.

In their discussion, they offer the following reasoning:

Most NAND Flash memories implement the Fowler-Nordheim tunneling effect in order to inject charges through the floating gate [7] during write operation. During write cycles, the programming circuit controls the charge of cells to ensure a sufficient margin of voltage threshold. It is assumed that the writing management circuit probably drifts with low temperatures. Indeed, transistor parameters (threshold voltage and gain) vary with temperature which in turn induces drain current shifts.
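For reference, the Fowler-Nordheim mechanism the authors refer to is usually written in the standard textbook form below (this is general device physics, not an equation reproduced from the paper):

$$ J_{FN} = A \, E_{ox}^{2} \, \exp\!\left(-\frac{B}{E_{ox}}\right) $$

where $E_{ox}$ is the electric field across the tunnel oxide and $A$, $B$ are material constants. The tunneling current itself is only weakly temperature dependent; the sensitivity the authors describe enters through the peripheral MOSFETs of the programming circuit, whose threshold voltage drifts roughly linearly with temperature (typically on the order of -2 mV/K), shifting the voltages and currents actually applied to the cells.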

And in the conclusion they summarize:

Write operations at low temperatures lead to a decrease in data retention time, probably not due to a degradation of the cell but due to parametric drifts of the die embedded electronics dedicated to write operations.

This suggests why the comments cited in the question might say what they do.
But in practice I would assume that this effect is not relevant, because better cooling of the flash simply gives the flash controller more headroom for higher performance while staying at the same temperature (assuming cooling with a traditional heatsink). After seeing the above measurements, though, I would NOT cool my SSD with LN2.
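To put a rough number on that headroom argument: with the simple thermal model T_junction = T_ambient + P · θ_JA, halving the junction-to-ambient thermal resistance roughly doubles the power (and thus performance) budget before the throttle temperature is reached. A back-of-envelope sketch in Python, where every value is a made-up assumption rather than a measured figure:

# Back-of-envelope sketch of the "cooling buys performance headroom" argument.
# T_junction = T_ambient + P * theta_ja, so a lower junction-to-ambient thermal
# resistance (theta_ja) allows more power dissipation before the throttle
# temperature is reached. All numbers are illustrative assumptions.

T_AMBIENT_C = 35.0   # hypothetical air temperature inside the case
T_THROTTLE_C = 80.0  # hypothetical junction temperature that triggers throttling

def max_power_before_throttle(theta_ja_c_per_w: float) -> float:
    """Maximum sustained power (W) before the junction hits the throttle point."""
    return (T_THROTTLE_C - T_AMBIENT_C) / theta_ja_c_per_w

print(max_power_before_throttle(15.0))  # bare M.2 drive:  3.0 W budget
print(max_power_before_throttle(7.0))   # with a heatsink: ~6.4 W budget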

[Figure: data retention of flash cells as a function of write temperature, from the cited paper]
https://doi.org/10.1016/j.microrel.2018.06.088