WhyChips

Thermal Wall Solutions: CoWoS/SoIC with Direct Liquid Cooling

Introduction: AI Computing’s Heat Challenge

Advanced packaging technologies must rapidly evolve as AI accelerators exceed 1000W per chip, pushing traditional air-cooling beyond its limits. The integration of CoWoS (Chip-on-Wafer-on-Substrate) and SoIC (System-on-Integrated-Chips) with direct liquid cooling marks a fundamental shift in semiconductor thermal management.

Understanding the Thermal Wall

The “thermal wall” describes physical limitations in dissipating heat from densely packaged devices. CoWoS and SoIC enable higher integration and performance but concentrate heat in smaller volumes, creating severe thermal challenges.

Key thermal wall factors:

  • Power Density Escalation: AI accelerators draw 700W to over 1000W per package, with higher power levels expected
  • Vertical Integration: 3D stacking via SoIC creates difficult-to-manage vertical thermal pathways
  • Thermal Resistance Accumulation: Each interface from die to cooling solution adds resistance
  • Hotspot Formation: Heterogeneous integration creates localized hotspots exceeding average chip temperatures
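
To put the power figures above in terms of heat flux, here is a minimal sketch. The die size is an assumption chosen for illustration (a single die at the 26 mm x 33 mm lithography reticle limit) and does not come from the article; real accelerators spread power across several dies, and local hotspots sit above the average this computes.

```python
# Illustrative only: the die dimensions below are an assumption, not article data.
DIE_WIDTH_MM = 26.0   # assumed: reticle-limit width
DIE_HEIGHT_MM = 33.0  # assumed: reticle-limit height


def average_heat_flux_w_per_cm2(power_w: float, area_mm2: float) -> float:
    """Average heat flux in W/cm^2 for a given power and area."""
    if area_mm2 <= 0:
        raise ValueError("area must be positive")
    return power_w / (area_mm2 / 100.0)  # 100 mm^2 per cm^2


if __name__ == "__main__":
    area = DIE_WIDTH_MM * DIE_HEIGHT_MM  # 858 mm^2
    for power in (700.0, 1000.0):  # power levels quoted in the list above
        flux = average_heat_flux_w_per_cm2(power, area)
        print(f"{power:.0f} W over {area:.0f} mm^2 -> {flux:.1f} W/cm^2")
```

Under these assumptions the average flux lands around 80 to 120 W/cm².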

CoWoS Technology and Thermal Challenges

CoWoS dominates advanced packaging for high-performance computing and AI. TSMC’s technology integrates multiple chiplets on silicon interposers, delivering superior electrical performance but introducing thermal challenges.

CoWoS thermal characteristics:

  • Interposer Thermal Properties: Silicon conducts heat well (~150 W/mK), so interposers spread heat laterally, but they add another layer to the vertical heat path
  • Multi-die Configuration: HBM stacks alongside logic dies create complex thermal interactions
  • Large Package Size: Packages exceeding 2000mm² require uniform cooling to prevent thermal gradients
  • Underfill Materials: Low thermal conductivity (0.3-1.0 W/mK) creates bottlenecks
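
The gap between silicon and underfill conductivity can be made concrete with the one-dimensional conduction formula R = t / (k·A). The sketch below is a rough illustration only: the 100 µm layer thickness and 1 cm² area are assumed values, the underfill conductivity is a mid-range pick from the figures above, and the model ignores the parallel heat path through the bumps.

```python
def conduction_resistance(thickness_m: float, conductivity_w_mk: float,
                          area_m2: float) -> float:
    """1-D conduction resistance of a uniform slab, in K/W: R = t / (k * A)."""
    return thickness_m / (conductivity_w_mk * area_m2)


if __name__ == "__main__":
    thickness = 100e-6  # assumed layer thickness: 100 micrometres
    area = 1e-4         # assumed heat-flow area: 1 cm^2

    r_silicon = conduction_resistance(thickness, 150.0, area)  # ~150 W/mK
    r_underfill = conduction_resistance(thickness, 0.5, area)  # 0.3-1.0 W/mK range

    print(f"silicon:   {r_silicon:.4f} K/W")
    print(f"underfill: {r_underfill:.4f} K/W")
    print(f"ratio:     {r_underfill / r_silicon:.0f}x")
```

For equal thickness and area the ratio is simply the ratio of conductivities, which is why even a thin underfill layer can dominate the stack.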

TSMC is aggressively expanding CoWoS capacity to meet AI chip demand, with next-generation CoWoS-L supporting interposer areas larger than a single reticle.

SoIC: Next Frontier in Advanced Packaging

SoIC enables chip-on-chip stacking with fine pitch bonding. Unlike CoWoS’s interposer approach, SoIC directly bonds chips via hybrid bonding, achieving connection pitches of 10μm or less.

SoIC thermal considerations:

  • Direct Bonding Interface: Hybrid bonding creates intimate thermal connections with lower interface resistance
  • Vertical Heat Flow: 3D stacking concentrates heat, requiring efficient vertical extraction
  • Power Distribution: Active dies must manage power delivery while maintaining thermal budget
  • Backside Heat Extraction: Double-sided cooling becomes critical for extracting heat from stacked dies

Combining SoIC with other technologies creates new possibilities but compounds thermal challenges. Future products may integrate CoWoS and SoIC, requiring sophisticated thermal strategies.

Thermal Resistance Metrics

Thermal resistance (°C/W or K/W) quantifies how strongly a material or interface resists heat flow. The temperature rise across a path equals power multiplied by resistance, so keeping die temperatures acceptable means minimizing the total resistance from junction to ambient.

Typical thermal resistance values:

  • Die to TIM interface: 0.05-0.20 °C/W
  • TIM layer: 0.05-0.15 °C/W
  • Heat spreader/cold plate: 0.10-0.30 °C/W
  • Cold plate to coolant: 0.05-0.15 °C/W

For a 1000W chip with 0.10 °C/W total junction-to-coolant resistance, the temperature rise from junction to coolant is 1000 W × 0.10 °C/W = 100°C. At 30°C coolant temperature, the junction reaches 130°C. Reducing resistance by 0.02 °C/W lowers the junction temperature by 20°C, significantly improving reliability and performance.
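
The arithmetic above can be sketched in a few lines. This is a steady-state, series-resistance model only; the function name is made up for illustration, and the last call simply sums the lower ends of the four typical ranges listed earlier to show how quickly the budget is consumed at 1000W.

```python
def junction_temperature(power_w, resistances_c_per_w, coolant_c):
    """Steady-state junction temperature for thermal resistances in series."""
    total_resistance = sum(resistances_c_per_w)  # series resistances add
    return coolant_c + power_w * total_resistance


if __name__ == "__main__":
    # Figures quoted in the text: 1000 W, 0.10 C/W total, 30 C coolant.
    print(junction_temperature(1000.0, [0.10], 30.0))
    # Same chip after trimming 0.02 C/W from the thermal path.
    print(junction_temperature(1000.0, [0.08], 30.0))
    # Lower ends of the four typical ranges listed above (sum 0.25 C/W).
    print(junction_temperature(1000.0, [0.05, 0.05, 0.10, 0.05], 30.0))
```

The first two calls reproduce the 130°C and 110°C figures. The third shows that even the low ends of the typical ranges add up to far more than a kilowatt-class device can tolerate, so every element in the path has to come in well below those values.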

Evolution of Liquid Cooling

As air cooling reaches limits, liquid cooling has become essential for high-power AI accelerators. Superior thermal properties of liquids make them necessary for next-generation systems.

Liquid cooling advantages:

  • Higher Heat Capacity: Water has roughly 3,500 times air’s volumetric heat capacity
  • Better Heat Transfer: Liquid cold plates can reach heat transfer coefficients of 10,000-100,000 W/m²K, versus roughly 25-250 W/m²K for typical forced-air convection
  • Reduced Noise: Operates more quietly than high-speed fans
  • Compact Design: Enables higher rack density with less space for air circulation
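
The volumetric heat capacity comparison follows directly from density times specific heat. The sketch below uses textbook property values near room temperature, which are assumptions added here and not figures from the article.

```python
# Approximate room-temperature properties (textbook values, assumed here).
WATER_DENSITY = 997.0   # kg/m^3
WATER_CP = 4181.0       # J/(kg*K)
AIR_DENSITY = 1.2       # kg/m^3 at about 20 C, sea level
AIR_CP = 1005.0         # J/(kg*K)


def volumetric_heat_capacity(density: float, cp: float) -> float:
    """Heat stored per unit volume per kelvin, in J/(m^3*K)."""
    return density * cp


water = volumetric_heat_capacity(WATER_DENSITY, WATER_CP)
air = volumetric_heat_capacity(AIR_DENSITY, AIR_CP)
ratio = water / air

if __name__ == "__main__":
    print(f"water: {water / 1e6:.2f} MJ/(m^3*K)")
    print(f"air:   {air / 1e3:.2f} kJ/(m^3*K)")
    print(f"ratio: {ratio:.0f}x")
```

With these values the ratio comes out in the low thousands, which is why a thin coolant line can carry heat that would otherwise need a large volume of moving air.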

Direct-to-Chip Liquid Cooling: The Game Changer

Direct-to-chip liquid cooling (DLC) mounts a liquid-cooled cold plate directly on the chip package, replacing the air heat sink and shortening the path from die to coolant. This minimizes thermal resistance for the chips with the highest power density.

Key features:

  • Cold Plate Design: Microchannel cold plates optimize heat transfer while minimizing pressure drop
  • TIM Selection: High-performance TIMs (>5 W/mK) minimize package-to-cold plate resistance
  • Coolant Selection: Water-based coolants with additives are most common; some use dielectric fluids
  • Flow Rate Optimization: Balances flow rate, pressure drop, and pump power for efficiency
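
The flow rate trade-off can be sketched with two standard relations: the coolant temperature rise ΔT = P / (ṁ·c_p) and the ideal hydraulic pump power Q·Δp. The example assumes plain water properties (not a glycol mix or dielectric fluid), and the 1.5 L/min flow rate and 30 kPa pressure drop are illustrative values, not cold plate specifications.

```python
def coolant_temperature_rise(power_w: float, flow_lpm: float,
                             density: float = 997.0,
                             cp: float = 4181.0) -> float:
    """Inlet-to-outlet coolant temperature rise in K (defaults: plain water)."""
    flow_m3_s = flow_lpm / 1000.0 / 60.0  # L/min -> m^3/s
    mass_flow = density * flow_m3_s       # kg/s
    return power_w / (mass_flow * cp)


def hydraulic_pump_power(flow_lpm: float, pressure_drop_pa: float) -> float:
    """Ideal hydraulic power in W (pump efficiency is not included)."""
    return (flow_lpm / 1000.0 / 60.0) * pressure_drop_pa


if __name__ == "__main__":
    power = 1000.0  # W, per-chip power level quoted in the article
    for flow in (0.5, 1.0, 1.5, 3.0):  # assumed flow rates in L/min
        rise = coolant_temperature_rise(power, flow)
        pump = hydraulic_pump_power(flow, 30_000.0)  # assumed fixed 30 kPa drop
        print(f"{flow:.1f} L/min: coolant rise {rise:.1f} K, pump {pump:.2f} W")
```

Holding the pressure drop fixed is a simplification: in a real cold plate the pressure drop grows with flow rate, so pump power climbs faster than this sketch suggests while the temperature benefit flattens out.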

Leading GPU manufacturers have adopted direct liquid cooling for their highest-power products, a shift that affects suppliers across the server and cooling supply chain.

Material Innovations Supporting Advanced Thermal Management

Thermal management effectiveness depends on materials throughout the thermal path. Recent innovations in TIMs, substrates, and packaging materials enable better performance.

Critical materials:

  • Advanced TIMs: Indium solders, graphite, and metal matrix composites offer 5-50+ W/mK conductivity
  • High-Conductivity Substrates: Silicon interposers, copper substrates, and advanced organics reduce package thermal resistance
  • Thermal Spreaders: Vapor chambers and copper spreaders distribute heat laterally
  • Underfill Materials: Thermally enhanced underfills reduce die-to-interposer bottlenecks

Material selection balances thermal performance with mechanical reliability, CTE matching, and manufacturing compatibility.

Capacity Constraints and Supply Chain Implications

Rapid adoption of advanced packaging has created capacity constraints across the supply chain. Limited CoWoS capacity constrains AI accelerator production, and TSMC is expanding capacity in response.

Supply chain considerations:

  • Long Lead Times: Advanced packaging equipment has 12-18+ month lead times, slowing expansion
  • Material Availability: High-performance TIMs and large silicon interposers face supply constraints
  • Testing and Qualification: Known good die testing requires sophisticated equipment with limited capacity
  • Integration Complexity: Combining CoWoS + SoIC + liquid cooling needs new processes and equipment

These constraints influence product roadmaps, with companies securing capacity commitments and building internal capabilities.

System-Level Thermal Design for AI Infrastructure

Thermal challenges extend beyond chip and package to system and facility levels. Data centers with thousands of AI accelerators need comprehensive thermal strategies.

System-level considerations:

  • Rack-Level Distribution: Manifolds must deliver coolant efficiently to multiple devices
  • Heat Rejection: Facilities must handle megawatts of thermal load
  • Redundancy: Cooling failures cause thermal shutdown, requiring backup systems and monitoring
  • Energy Efficiency: Cooling consumes 20-40% of facility power, impacting operational costs
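
To see what a 20-40% cooling share means in absolute terms, here is a simplified sketch. It assumes facility power consists only of IT load plus cooling, ignoring power distribution losses and lighting, and the 1 MW IT load is an illustrative figure.

```python
def cooling_power_mw(it_power_mw: float, cooling_fraction: float) -> float:
    """Cooling power when cooling is a given fraction of total facility power.

    Simplifying assumption: facility power = IT power + cooling power.
    """
    if not 0.0 <= cooling_fraction < 1.0:
        raise ValueError("cooling_fraction must be in [0, 1)")
    total_mw = it_power_mw / (1.0 - cooling_fraction)
    return total_mw * cooling_fraction


if __name__ == "__main__":
    it_load = 1.0  # MW, assumed IT load for illustration
    for fraction in (0.20, 0.40):  # range quoted in the list above
        cooling = cooling_power_mw(it_load, fraction)
        print(f"cooling share {fraction:.0%}: {cooling:.2f} MW of cooling "
              f"per {it_load:.0f} MW of IT load")
```

Under this model, moving from the top to the bottom of the range cuts cooling power from roughly two thirds to one quarter of the IT load.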

Future Trends and Innovations

The industry continues developing next-generation thermal solutions for increasing power densities.

Emerging technologies:

  • Embedded Cooling: Microfluidic channels in silicon interposers or substrates for ultra-low resistance
  • Two-Phase Cooling: Phase change heat transfer for higher heat flux
  • Advanced Cold Plates: Jet impingement and enhanced microchannels improve heat transfer
  • Backside Power Delivery: Reduces frontside congestion, enables efficient thermal extraction
  • AI-Optimized Design: Machine learning optimizes thermal management by workload
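
The appeal of two-phase cooling comes from latent heat. The sketch below compares the heat absorbed per kilogram of coolant by boiling against a single-phase temperature rise. It uses water purely because its properties are well known; practical two-phase systems often use dielectric fluids with much lower latent heat, and the 10 K single-phase rise is an assumed value.

```python
WATER_CP = 4181.0              # J/(kg*K), liquid water near room temperature
WATER_LATENT_HEAT = 2_257_000  # J/kg, vaporization at atmospheric pressure


def sensible_heat_per_kg(cp: float, delta_t_k: float) -> float:
    """Heat absorbed per kg by warming the liquid, without phase change."""
    return cp * delta_t_k


sensible = sensible_heat_per_kg(WATER_CP, 10.0)  # assumed 10 K coolant rise
ratio = WATER_LATENT_HEAT / sensible

if __name__ == "__main__":
    print(f"single-phase, 10 K rise: {sensible / 1e3:.1f} kJ/kg")
    print(f"boiling (latent heat):   {WATER_LATENT_HEAT / 1e3:.0f} kJ/kg")
    print(f"ratio: {ratio:.0f}x")
```

The same mass flow absorbs far more heat when it changes phase, and it does so at a nearly constant temperature, which helps with hotspots.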

Industry Collaboration and Standardization

Addressing thermal challenges requires ecosystem collaboration. Industry organizations are establishing standards for liquid cooling interfaces, thermal testing, and reliability qualification.

Key initiatives include standard cold plate interfaces, coolant specifications, and thermal testing protocols. These efforts reduce complexity and enable broader liquid cooling adoption.

Conclusion: A Holistic Approach to Breaking the Thermal Wall

Breaking the thermal wall requires integrating packaging innovation, cooling solutions, materials, and system optimization. Expanding CoWoS and SoIC alongside direct liquid cooling represents a critical evolution in thermal management.

Success depends on collaboration between chip designers, packaging providers, materials suppliers, cooling vendors, and system integrators. As AI computing advances, thermal management remains a key differentiator.

The industry has shown remarkable innovation in addressing thermal challenges. Companies that effectively integrate advanced packaging with cutting-edge thermal management will be best positioned to deliver high-performance systems for AI applications.
