Electronic – Heatsink or IC: how to determine root cause of overtemp

Tags: heatsink, temperature, thermal

I have a manufacturing situation where we perform a functional test on a board and we are getting frequent overtemperature failures from a BGA package with a heatsink on it. I would like to be able to determine if the cause of overtemperature is because of a bad thermal contact with the heatsink OR if the cause is from the IC itself generating more heat than we expect.

Here are the details:

  • Large BGA package that dissipates A LOT of power. Very sensitive to heat sink seating
  • BGA package is a part that is picked by our supplier to meet our specified voltage/power requirements.
  • There is variation in power dissipation across devices. It is unknown whether this variation comes from heat-sink application or from differences between individual ICs. The device shows characteristics of thermal runaway: higher temperature and higher current consumption go hand in hand (the voltage rails are steady).
  • The heat sink is a finned copper vapour chamber. The TIM is a high-performance thermal grease. The board sits in a chassis with a controlled environment and fans forcing air at a constant RPM.
  • I have a way to measure the die temperature of the device to a resolution of 1 °C, and I can heat up the device "at will" by running an automated test.

What I would like to do is to perform a test that checks the efficacy of the heat-sink to rule out the heat sink (or TIM or seating) as a problem. One way to do this is to re-apply another "known-good" heat sink and retest, but that is dependent on operator skill for repeatability, and has other manufacturing workflow problems.

Here's an idea for measuring the effectiveness of the heat sink. I'd like input on whether it is a good idea and/or what a better test would be.

  • The device has a "textbook" heat-up/cool-down curve that fits an RC time constant nicely. In the plot below, the device starts at "idle", then I make it "do its job" in a functional test, and after 5 minutes I turn the function off.
    (plot: typical RC time-constant heat-up/cool-down curve)

  • I am most interested in the cooling curve, because once cooling starts I know the core of the IC is no longer generating heat. The cooling curve is just the package cooling down through the heatsink and PCB, and I assume the heatsink dominates the heat transfer, especially early on. In other words, the cooling curve is a measure of the cooling performance of the heat sink and not much else. Moreover, the other variables across tests (e.g. cooling through the PCB) vary less than the heatsink does.

  • When I normalize the curves to range between zero and one, set the time origin to onset of cooling and look only at the first 80 seconds of cooling, I get nice straight lines in a log plot. Time constant in a cool-running device is 36s with standard deviation <5% over a dozen runs. Time constant in a device where the heat sink has been deliberately impaired to run a few degrees hot was 39s with similar standard deviation.
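The normalize-and-fit procedure described above can be sketched as follows. This is an illustrative script, not the OP's actual tooling; the 1 s sample interval and function names are assumptions.

```python
import numpy as np

def cooling_time_constant(temps, t_ambient, window_s=80, dt_s=1.0):
    """Fit T(t) = T_amb + dT*exp(-t/tau) over the first window_s seconds.

    temps: die-temperature samples starting at the onset of cooling,
    taken every dt_s seconds (assumed 1 s here).
    """
    t = np.arange(0, window_s, dt_s)
    y = np.asarray(temps[: len(t)], dtype=float) - t_ambient
    y /= y[0]                               # normalize to 1.0 at cooling onset
    slope, _ = np.polyfit(t, np.log(y), 1)  # straight line in the log plot
    return -1.0 / slope                     # tau in seconds

# Synthetic sanity check: a clean 36 s exponential should come back as ~36 s.
tau_true = 36.0
samples = 25.0 + 40.0 * np.exp(-np.arange(0, 80) / tau_true)
print(round(cooling_time_constant(samples, 25.0), 1))  # ~36.0
```

On real data it may help to restrict the fit window to start a few seconds after power-off, so any fast initial transient through the TIM doesn't skew the slope.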


Now the question: if I get a hot-running device and measure a time constant that matches a cool-running device's, can I rule out the heat sink and its application as the problem?

I should clarify that this is in a manufacturing context, not design (DVT). The focus is to be able to determine the cause of failures.

Best Answer

Maybe, maybe not. But I'd ask why you aren't correlating hot chips with power-supply currents, and why you aren't putting a temperature sensor on the heatsink. If the thermal path from the die to the heatsink is impaired, you'll see a different temperature differential between the die and the heatsink. Likewise, if the chip is drawing more current, you should be able to predict the final die temperature from normal thermal behavior. Measuring the heatsink temperature doesn't require a dedicated contact sensor: a temporary one will do, and a non-contact IR unit should also work, since the emissivity of the heat sinks should be pretty uniform.
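The triage above boils down to two thermal-resistance calculations from die temperature, heatsink temperature, and measured power. A minimal sketch, where all threshold values are made-up placeholders (real limits would come from characterizing known-good boards):

```python
def diagnose(t_die_c, t_hs_c, t_amb_c, power_w,
             theta_jhs_max=0.15,   # K/W die-to-heatsink limit (assumed)
             theta_hsa_max=0.40,   # K/W heatsink-to-ambient limit (assumed)
             power_max=95.0):      # W worst-case expected dissipation (assumed)
    """Classify an overtemp failure from one steady-state measurement."""
    theta_jhs = (t_die_c - t_hs_c) / power_w    # die-to-heatsink resistance
    theta_hsa = (t_hs_c - t_amb_c) / power_w    # heatsink-to-ambient resistance
    causes = []
    if theta_jhs > theta_jhs_max:
        causes.append("poor heatsink contact (TIM/seating)")
    if theta_hsa > theta_hsa_max:
        causes.append("insufficient airflow or heatsink performance")
    if power_w > power_max:
        causes.append("IC dissipating more than expected")
    return causes or ["within expected limits"]

# theta_jhs = 25/90 ~ 0.28 K/W, over the assumed limit -> flags TIM/seating
print(diagnose(t_die_c=95, t_hs_c=70, t_amb_c=35, power_w=90))
```

The attraction for a manufacturing test is that this separates the two failure causes directly, instead of inferring them from the shape of the cooling curve.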

As to why the maybes, consider the following model:

(schematic: lumped thermal RC model of die and heatsink – created using CircuitLab)

If the thermal resistance from the die to the heatsink is much larger than the thermal resistance from the heatsink to ambient, and the thermal capacity of the die is much less than that of the heat sink (and I would guess both to be true), then the heatsink's capacity and its resistance to ambient dominate the thermal time constant of the heatsink, and thus of the die. In that case, an increase in the die-to-heatsink thermal resistance has only a small effect on the die's time constant, but it makes the die run hotter. You'll have to work out the values for your board to see whether this is the case.
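That behavior can be checked numerically with a two-node version of the model. All component values below are illustrative guesses (chosen so the slow time constant lands near the OP's ~36 s), not measured data:

```python
import math

def simulate(r_die_hs, r_hs_amb=0.3, c_die=2.0, c_hs=120.0,
             p_watts=80.0, t_amb=35.0, dt=0.01):
    """Euler-integrate a two-node thermal RC network (die + heatsink).

    Returns (steady-state die temp in C, tail time constant of the
    cooldown in s). All parameter values are illustrative.
    """
    # Steady-state operating point (closed form for the DC solution).
    t_hs = t_amb + p_watts * r_hs_amb
    t_die = t_hs + p_watts * r_die_hs
    t_die_ss = t_die
    # Power off and record the die temperature once per second for 80 s.
    temps = []
    steps_per_s = int(round(1.0 / dt))
    for step in range(80 * steps_per_s):
        if step % steps_per_s == 0:
            temps.append(t_die)
        q_die_hs = (t_die - t_hs) / r_die_hs     # W, die -> heatsink
        q_hs_amb = (t_hs - t_amb) / r_hs_amb     # W, heatsink -> ambient
        t_die += dt * (-q_die_hs) / c_die
        t_hs += dt * (q_die_hs - q_hs_amb) / c_hs
    # Log-slope of the tail (20 s..79 s), past the fast die-to-HS transient.
    y0 = math.log(temps[20] - t_amb)
    y1 = math.log(temps[79] - t_amb)
    tau = -(79 - 20) / (y1 - y0)
    return t_die_ss, tau

good = simulate(r_die_hs=0.10)   # healthy interface (assumed value)
bad = simulate(r_die_hs=0.25)    # impaired TIM/seating (assumed value)
# The impaired die runs ~12 C hotter, but the tail time constants are
# nearly identical -- which is the answer's point.
print(good, bad)
```

With these numbers the steady-state die temperature moves from 67 °C to 79 °C while the cooldown time constant barely changes, matching the OP's observation that a deliberately impaired heat sink shifted the time constant only from 36 s to 39 s.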