High-End Thermal Solutions for Network Switches
This technology brief explores the challenges, the development, and the trends in high-end thermal solutions responsible for cooling modern network switches.
Data centers and cloud providers consume a significant amount of electricity, and not all of it is for computational power. Cooling represents around a third of the total consumption and is one of the main issues for electronics. The reliability of electronic components drops by 10% for each increase of 2°C in normal operating temperature.
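As a rough illustration of the reliability figure above, the sketch below assumes the 10% drop compounds with every 2°C step. The brief does not say whether the loss is compounding or linear, so the model, the function name, and its parameters are assumptions made for illustration only.

```python
def relative_reliability(temp_rise_c: float,
                         drop_per_step: float = 0.10,
                         step_c: float = 2.0) -> float:
    """Fraction of baseline reliability left after a temperature rise.

    Assumption: every `step_c` degrees multiplies reliability by
    (1 - drop_per_step), i.e. the 10% loss compounds.
    """
    if temp_rise_c < 0:
        raise ValueError("temp_rise_c must be non-negative")
    return (1.0 - drop_per_step) ** (temp_rise_c / step_c)


if __name__ == "__main__":
    for rise in (0, 2, 10, 20):
        share = relative_reliability(rise)
        print(f"+{rise} C above normal: {share:.0%} of baseline reliability")
```

Under this compounding assumption, a 10°C rise would leave roughly 59% of the baseline reliability, which shows why even small temperature margins matter.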
At the same time, enhanced switching capability is the only answer to the need for network bandwidth expansion. The best merchant silicon available today, offering larger bandwidth and larger buffer sizes, has a TDP (Thermal Design Power) rating of over 400 W, and that figure is expected to rise in future generations. Not all cooling solutions can offset the amount of heat generated by the main switching chips. The chosen thermal solution must remove the heat from one or more components and transport it to the designated cooling area as fast as possible. The concept may be simple, but putting it into practice is a very challenging engineering feat.
It is possible to categorize the solutions based on the mechanism used to move the heat. Natural and forced convection are still widely used inside the chassis, but the main chips require dedicated, highly efficient solutions such as two-phase change and liquid cooling.
Main Challenges in Thermal Solutions
In high-end thermal solutions for modern network switches, there is no such thing as one size fits all. Different designs and different hardware requirements demand different thermal solutions. What are the major challenges?
System reliability, ambient temperature, strict acoustics and energy standards are the macro challenges that condition the final thermal solution. These are not controlled at the manufacturer level but still must be taken into consideration during the design process.
At the device level, thermal challenges demand the absolute best from the engineering teams due to the limited internal space available and the known correlation between thermal and switching performance. With the current high density of components per device and the increasing density of transistors per component, the efficacy of heat removal solutions must keep pace.
Adding to that challenge, every component affects cooling performance: some system components aid it and others hinder it. At the system level, a proper thermal solution must consider the impact of board placement, cable routing, size and position of venting holes, and heat conduction to the enclosure.
The fans alone have a critical role inside switches, but the way they affect the power supply unit (PSU) really is one of the core design challenges. Powerful fans are a necessity, but they can generate such air displacement that it depletes the PSU of cool air because of high backpressure. Thorough product research and advanced fan control algorithms are essential to keep this delicate balance under control.
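The balance described above can be sketched as a simple control rule. This is a hypothetical, minimal example and not a production fan algorithm: the function name, the temperature thresholds, the gain, and the idea of capping system fan duty when the PSU runs hot are all assumptions chosen for illustration.

```python
def fan_duty(chip_temp_c: float,
             psu_temp_c: float,
             *,
             chip_target_c: float = 85.0,   # assumed chip temperature target
             psu_limit_c: float = 55.0,     # assumed PSU temperature limit
             min_duty: float = 0.3,
             max_duty: float = 1.0,
             gain: float = 0.05,            # duty added per degree above target
             psu_backoff: float = 0.1) -> float:
    """Return a system fan duty cycle between min_duty and max_duty."""
    # Proportional term: spin up as the chip exceeds its target temperature.
    overshoot = max(0.0, chip_temp_c - chip_target_c)
    duty = min(max_duty, min_duty + gain * overshoot)

    # Assumption: a hot PSU means the system fans are starving it of cool
    # air, so the system fan duty is capped to relieve backpressure.
    if psu_temp_c > psu_limit_c:
        duty = min(duty, max_duty - psu_backoff)

    return max(min_duty, duty)


if __name__ == "__main__":
    print(fan_duty(80.0, 40.0))    # chip below target, PSU cool
    print(fan_duty(120.0, 40.0))   # chip hot, PSU cool
    print(fan_duty(120.0, 60.0))   # chip hot, PSU hot: duty is capped
```

A real controller would also need hysteresis, sensor filtering, and a safeguard for the case where capping the fans lets the switching chip overheat; the sketch only shows the shape of the trade-off.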
Chips with a bigger footprint often display flatness issues that complicate the choice of the thermal interface material (TIM), a crucial element that facilitates the contact between the chip and the heatsink. Moreover, paired with these high-performance switching chips (with an estimated power of 400 W) are high-power pluggable optics that generate significant levels of heat.
Creative solutions engineers can solve all of the above; the ultimate challenge is often that of meeting the specified cost. Now, what is the best way to address these challenges?
Current High-End Thermal Solutions
The heatpipe is a thermal device extensively used in electronics as the primary way to move heat away from critical areas and components to larger heatsinks through continuous cycles of evaporation and condensation. At its simplest, a heatpipe is a tubular container with a wick structure lining the inside walls and a volatile fluid (typically water) hermetically sealed inside. When heat is applied at the evaporator, the fluid evaporates and migrates to the other end of the tube (the condenser), where it cools down, turns back into liquid, and returns to the evaporator through the wick. The wick’s capillary action allows the heatpipe to perform independently of gravity.
Too much heat causes the liquid to reach boiling temperature in the evaporator and the wick, increasing the thermal resistance and undermining the system’s performance. The power handling capacity (Qmax) of a single heatpipe is limited by its tube diameter and wick thickness, so cooling today’s high-TDP chips requires multiple heatpipes. In modern network switches, heatpipes are a standard thermal solution up to the 300 W mark, which corresponds to switching solutions operating at 6.4 Tbps and below. What about the 400 W and above chips mentioned before?
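The need for multiple heatpipes can be illustrated with a back-of-the-envelope count. The sketch below is an assumption-laden example: the per-pipe Qmax of 60 W and the 0.75 derating factor are hypothetical placeholders, not figures from this brief, and a real design would use the manufacturer's Qmax curves.

```python
import math


def heatpipes_needed(chip_tdp_w: float,
                     qmax_per_pipe_w: float,
                     derating: float = 0.75) -> int:
    """Estimate how many heatpipes are needed to carry a chip's TDP.

    `derating` is a hypothetical safety factor that keeps each pipe
    below its rated Qmax.
    """
    if chip_tdp_w <= 0 or qmax_per_pipe_w <= 0:
        raise ValueError("power values must be positive")
    if not 0.0 < derating <= 1.0:
        raise ValueError("derating must be in (0, 1]")
    usable_w = qmax_per_pipe_w * derating
    return math.ceil(chip_tdp_w / usable_w)


if __name__ == "__main__":
    # Hypothetical 60 W pipes cooling a 300 W chip.
    print(heatpipes_needed(300.0, 60.0))
    print(heatpipes_needed(300.0, 60.0, derating=1.0))
```

With these placeholder numbers, a 300 W chip needs seven derated pipes (five with no margin), which hints at why space runs out as TDP climbs.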
Vapor Chamber (2D and 3D)
Vapor chambers are industry-favorite, high-performance two-phase cooling solutions with a high thermal conductivity that excel in transferring heat horizontally. This technology is ideal for applications with high heat flux, limited headroom, and a need for a uniform temperature profile.
The working principle is also based on the vaporization of water. The vapor then flows throughout the chamber, creating an isothermal heat spreader. After cooling down and reaching the condensation point, the water returns to the hotter side of the chamber.
In current application scenarios, 2D vapor chambers boast a Qmax of 400 W and play a key role in the new world of 400 Gbps Ethernet.
By adding a third dimension and combining the benefits of heatpipes to transfer heat in the axial direction, 3D vapor chambers offer 50% more capacity, reaching a Qmax of 600 W. However, this benefit is very limited in 1U-high switches.
When vapor chambers are not adequate because of performance or dimensions, what is the go-to high-end thermal solution for network switches?
Thermosiphon (or thermosyphon)
The thermosiphon is a tried and tested technology not so different from the heatpipe described above. It consists of the same two main sections: the evaporator, where the working fluid (coolant) absorbs heat, and the condenser, where the working fluid rejects heat. However, a slight nuance made thermosiphons a popular thermal device used in very heat-intensive sectors, such as the automotive industry. What is the difference?
Thermosiphons do not have the internal wick structure, and the working fluid returns from the condenser to the evaporator by gravity. This difference requires the thermosiphon’s evaporator (boiler) to be below its condenser, but it allows a greater distance between the two sections as well as a bigger tube diameter for increased capacity.
5 main benefits of thermosiphon technology
- Cooling capacity of over 600 W
- High heat flux point
- Flexible design and scalable dimensions
- A single system can cover multiple heat sources
- Built using environmentally friendly materials
Thermosiphons are the current high-end thermal solution for network switches with 12.8 Tbps switching engines that start above 400 W TDP.
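The capacity figures quoted so far can be collected into a small lookup. The sketch below is only a summary aid under stated assumptions: the function and table names are hypothetical, the limits are the round numbers from this brief (300 W, 400 W, 600 W), and the thermosiphon is treated as having no upper bound because the brief only says "over 600 W".

```python
# (quoted capacity in watts, solution) pairs using the figures in this brief.
# None means no upper limit is stated (assumption: treated as unbounded).
PASSIVE_SOLUTIONS = [
    (300.0, "heatpipes"),
    (400.0, "2D vapor chamber"),
    (600.0, "3D vapor chamber"),
    (None, "thermosiphon"),
]


def passive_candidates(chip_tdp_w: float) -> list[str]:
    """List passive solutions whose quoted capacity covers the chip TDP."""
    if chip_tdp_w <= 0:
        raise ValueError("chip_tdp_w must be positive")
    return [
        name
        for limit_w, name in PASSIVE_SOLUTIONS
        if limit_w is None or chip_tdp_w <= limit_w
    ]


if __name__ == "__main__":
    for tdp in (250.0, 450.0, 700.0):
        print(tdp, passive_candidates(tdp))
```

The lookup ignores chassis height, cost, and orientation, all of which the brief names as real constraints (for example, the limited benefit of 3D vapor chambers in 1U switches).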
Thermosiphons, heatpipes, and vapor chambers are passive cooling devices and are only effective around the designed phase-change temperature. Above that temperature, all the fluid turns into gas and can no longer condense, so the device behaves like an excellent insulator. What solutions can be adopted when switching chips produce more heat than passive solutions can handle?
Next-Gen Thermal Solutions
Integrated, closed-loop, or open-loop liquid cooling systems are the next solutions in line. A moving fluid is more efficient at extracting thermal energy, and liquids, in general, are capable of transporting more heat than gases. Liquid cooling is expected to be adopted as the industry moves to powerful switching engines operating at 25.6 Tbps and even 51.2 Tbps, with heat generation expected to surpass the thresholds of 500 W and 600 W, respectively.
For the same level of performance, liquid cooling systems have the potential for lower power consumption and lower noise emissions, as they theoretically require fewer fans or lower fan speeds.
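The claim that liquids transport more heat than gases follows from the sensible-heat relation Q = ρ · V̇ · c_p · ΔT. The sketch below compares the volumetric flow needed to carry 500 W at a 10 K coolant temperature rise. The choice of water as the liquid, the approximate room-temperature property values, and the 10 K rise are assumptions for illustration.

```python
# Approximate room-temperature properties (assumed values).
AIR = {"density_kg_m3": 1.2, "cp_j_kg_k": 1005.0}
WATER = {"density_kg_m3": 998.0, "cp_j_kg_k": 4182.0}


def volumetric_flow_m3_s(heat_w: float, delta_t_k: float, fluid: dict) -> float:
    """Volumetric flow needed to carry heat_w with a coolant rise of delta_t_k.

    Rearranged from Q = density * flow * cp * delta_T.
    """
    if heat_w <= 0 or delta_t_k <= 0:
        raise ValueError("heat_w and delta_t_k must be positive")
    return heat_w / (fluid["density_kg_m3"] * fluid["cp_j_kg_k"] * delta_t_k)


if __name__ == "__main__":
    heat_w, delta_t_k = 500.0, 10.0
    air_flow = volumetric_flow_m3_s(heat_w, delta_t_k, AIR)
    water_flow = volumetric_flow_m3_s(heat_w, delta_t_k, WATER)
    print(f"air:   {air_flow:.4f} m^3/s")
    print(f"water: {water_flow * 60_000:.2f} L/min")
    print(f"air needs about {air_flow / water_flow:.0f}x the volume flow")
```

With these values, air needs about 0.041 m³/s while water needs roughly 0.72 L/min, a difference of more than three orders of magnitude in volume flow.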
The reasons why liquid cooling systems have not yet taken over the entire field of thermal solutions for modern network switches (and possibly entire data centers) include the higher upfront manufacturing cost, reduced internal airflow, the reliability of the pump, and the risk associated with leaks.
The most innovative data centers are starting to explore the use of total immersion in a dielectric liquid or coolant as a thermal solution. It drastically reduces cooling costs and the number of moving parts, and allows for higher density data centers than any other method. When cooled in this manner, network switches do not require fans, as the heat exchange occurs throughout the entire pool of liquid and at the surface.
The industry standards for immersion cooling were documented at the OCP Summit for the first time in 2019.