Addressing TIM Degradation in Lifetime Prediction of Data Center CPU/GPUs: Testing Methodology, Failure Definition and Performance of TIMs

#Reliability #Failure Modes #Testing #Electronics #Photonics #SiPho

(32:15 + Q&A) David Huitink, Department of Mechanical Engineering, University of Arkansas
From the 2024 IEEE Symposium on Reliability for Electronics and Photonics Packaging
Summary: Thermal interface materials (TIMs) are crucial for removing the heat generated by electronic devices. In power electronics, an assortment of TIMs (thermal greases, gels, thermal pads, phase change materials, and others) can be inserted between heat-generating components and cooling layers to eliminate poorly conducting air gaps and improve heat transfer. As high-power electronics continue to grow in power density while shrinking in size, proper thermal management is imperative to mitigating performance degradation, reliability concerns, and device failure. Data centers are among the most critical pieces of modern infrastructure, storing, processing, and sending tremendous amounts of data for companies and organizations. They consist of server racks, each containing multiple central processing units (CPUs). Each CPU generates a substantial amount of heat that must be extracted from the server rack so that the components do not overheat and begin to degrade. TIMs play an integral role in removing this heat from the CPU, but they degrade significantly under the elevated temperatures and frequent power cycling of the server environment. Currently, there is no unified methodology for characterizing TIM degradation in compute platforms, and such information is urgently needed to determine lifetime and maintainability for data centers.
To address this gap, a refined approach was developed to characterize how TIM performance degradation affects the lifetime of the CPU. By iteratively calculating the increase in TIM thermal resistance caused by degradation and monitoring the resulting rise in processor temperature, critical reliability metrics of the CPU can be extracted to determine when the TIM needs replacement to protect the processor from performance deterioration and permanent damage. To demonstrate this strategy, the study tests a copper nanowire (CuNW)-infused liquid gallium TIM under power cycling and thermal aging to assess its degradation behavior, which appears as voids and cracks within the TIM. The degradation of the CuNW-infused liquid metal TIM is then characterized by a degradation model and applied to the server CPU reliability methodology to demonstrate the accuracy of the strategy.
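
The iterative analysis described in the summary can be illustrated with a minimal sketch. It is not the speaker's model: it assumes a simple series thermal-resistance network, T_junction = T_ambient + P * (R_fixed + R_TIM), and a hypothetical compound growth of the TIM resistance per power cycle. The function name, parameters, growth law, and all numbers are assumptions chosen for illustration only.

```python
def cycles_until_tim_replacement(
    power_w,            # CPU power dissipation in watts (assumed constant)
    ambient_c,          # coolant or ambient temperature in deg C
    r_fixed_k_per_w,    # all non-TIM thermal resistance in K/W (assumed constant)
    r_tim0_k_per_w,     # initial TIM thermal resistance in K/W
    growth_per_cycle,   # hypothetical fractional growth of TIM resistance per cycle
    t_limit_c,          # junction temperature limit in deg C
    max_cycles=1_000_000,
):
    """Return the first cycle at which the junction temperature exceeds
    the limit, or None if the limit is not exceeded within max_cycles."""
    r_tim = r_tim0_k_per_w
    for cycle in range(max_cycles + 1):
        # Series thermal network: temperature rise is power times total resistance.
        t_junction = ambient_c + power_w * (r_fixed_k_per_w + r_tim)
        if t_junction > t_limit_c:
            return cycle
        # Assumed degradation law: resistance compounds by a fixed fraction
        # each cycle. A measured degradation model would replace this line.
        r_tim *= 1.0 + growth_per_cycle
    return None


if __name__ == "__main__":
    # Illustrative values only, not data from the talk.
    n = cycles_until_tim_replacement(
        power_w=200.0,
        ambient_c=35.0,
        r_fixed_k_per_w=0.20,
        r_tim0_k_per_w=0.02,
        growth_per_cycle=0.001,
        t_limit_c=95.0,
    )
    print("TIM replacement needed after cycle:", n)
```

In practice the per-cycle update would come from a degradation model fitted to power cycling and thermal aging data, and the temperature limit would follow from the processor's own reliability criteria.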

For additional talks from this REPP, or earlier ones, please visit https://attend.ieee.org/repp
