
1. Introduction: The 1.6T Inflection Point
The insatiable appetite of Large Language Models (LLMs) for bandwidth is pushing data center networking into uncharted territory. While 800G deployments are currently ramping up, the industry is already racing toward 1.6T Ethernet. However, this transition is not merely a speed upgrade—it is a collision with physics.
The primary adversary in the 1.6T era is not bandwidth itself but energy per bit and thermal density. Traditional pluggable optical modules (such as OSFP and QSFP-DD) have served us well, but as we approach 200G-per-lane SerDes speeds, the power consumption of the Digital Signal Processors (DSPs) inside these modules is becoming unsustainable.
Why is this critical now?
- Power Budget: Optical interconnects are consuming an increasing slice of the total cluster power budget, threatening to crowd out compute resources.
- Thermal Wall: Cooling 51.2T and future 102.4T switches with traditional air-cooled pluggables is running up against the practical limits of forced-air cooling.
This article dissects the “Heat + Power” bottleneck and evaluates the competing architectures—CPO (Co-Packaged Optics), LPO (Linear Pluggable Optics), and next-gen Pluggables—while mapping the broader interconnect landscape including PCIe 7.0 and CXL 3.x.
2. The Physics of the Bottleneck: Why 1.6T is Different
2.1 The DSP Power Tax
In a conventional 800G optical module, the DSP is responsible for retiming, equalization, and Forward Error Correction (FEC). While essential for signal integrity, the DSP is a power hog, consuming up to 50% of the module’s power budget.
- At 1.6T: Scaling traditional DSP-based pluggables to 1.6T (e.g., OSFP-XD) is projected to push module power to 25-30W.
- System Impact: A switch with 64 such 1.6T ports (102.4 Tb/s of front-panel capacity) would burn nearly 2 kW on optics alone (64 × ~30 W). Adding the switch ASIC and fans pushes the total thermal load to levels that strain air cooling, as the sketch below illustrates.
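As a rough sanity check, here is a minimal back-of-the-envelope sketch in Python. The per-module wattage and the ~6 pJ/bit co-packaged figure (the CPO target range discussed in Section 3.1) are assumptions taken from this article, not vendor-measured numbers.

```python
# Back-of-the-envelope optics power for a 64-port 1.6T switch.
# All figures below are assumptions from the article, not measurements.

PORTS = 64
PORT_RATE_TBPS = 1.6            # line rate per port, Tb/s
MODULE_POWER_W = 30.0           # assumed DSP-based 1.6T pluggable (~18.75 pJ/bit)
CPO_PJ_PER_BIT = 6.0            # assumed CPO target (middle of ~5-7 pJ/bit)

def pluggable_power_w(ports: int = PORTS, watts: float = MODULE_POWER_W) -> float:
    """Total faceplate optics power, in watts."""
    return ports * watts

def cpo_power_w(ports: int = PORTS, tbps: float = PORT_RATE_TBPS,
                pj_per_bit: float = CPO_PJ_PER_BIT) -> float:
    """Total optical-engine power for the same throughput, in watts."""
    bits_per_second = ports * tbps * 1e12
    return bits_per_second * pj_per_bit * 1e-12   # pJ/bit x bit/s -> W

print(f"DSP pluggables: {pluggable_power_w():.0f} W")   # ~1920 W
print(f"CPO engines:    {cpo_power_w():.0f} W")         # ~614 W
```

With these assumed numbers, the DSP-based faceplate burns roughly three times the power of co-packaged engines for the same 102.4 Tb/s of throughput.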
2.2 Signal Integrity at 200G/Lane and 128 GT/s
Whether it’s Ethernet (224 Gbps PAM4) or PCIe 7.0 (128 GT/s), moving data at these speeds over copper traces on a PCB is fraught with signal loss. The “skin effect” and dielectric loss mean that signals degrade over mere inches.
- The Reach Limitation: This forces retimers to be placed ever closer to the ASIC, or pushes the optics inside the switch chassis itself, which is the fundamental premise of CPO. The rough reach estimate below shows why.
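To make “mere inches” concrete, the sketch below estimates how much stripline fits into an electrical channel budget. The loss-per-inch values, the connector/via allowance, and the budget itself are illustrative assumptions, not figures from any IEEE or OIF specification.

```python
# Illustrative PCB reach estimate for 200G/lane-class signaling.
# Loss is evaluated near the ~56 GHz Nyquist frequency of 224 Gbps PAM4.
# All dB values below are assumed for illustration only.

CHANNEL_BUDGET_DB = 28.0     # assumed end-to-end electrical loss budget
CONNECTOR_VIA_DB = 4.0       # assumed allowance for connectors and vias

# Assumed insertion loss at Nyquist, in dB per inch of stripline
LOSS_DB_PER_INCH = {
    "standard FR-4":  2.5,
    "mid-loss":       1.8,
    "ultra-low-loss": 1.1,
}

def max_reach_inches(loss_db_per_inch: float) -> float:
    """Trace length that still fits the budget after fixed losses."""
    return (CHANNEL_BUDGET_DB - CONNECTOR_VIA_DB) / loss_db_per_inch

for material, loss in LOSS_DB_PER_INCH.items():
    print(f"{material:>15}: ~{max_reach_inches(loss):.0f} in of trace")
```

Even with optimistic assumptions, the usable copper reach is measured in inches rather than meters, which is exactly why retimers, and ultimately the optics themselves, keep creeping toward the ASIC.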
3. The Solutions: CPO, LPO, and the Battle for the Faceplate
Three primary architectures are competing to solve the 1.6T thermal puzzle.
3.1 Co-Packaged Optics (CPO): The Ultimate Integration
CPO represents a paradigm shift. Instead of plugging a module into the front panel, the optical engine is mounted on the same substrate (package) as the switch ASIC.
- Power Savings: By eliminating the long electrical trace across the PCB and the retimers/DSPs required to drive it, CPO can reduce power consumption by 30% to 50% compared to pluggables (targeting ~5-7 pJ/bit).
- Thermal Benefits: Bringing optics closer to the ASIC allows for a shared, high-efficiency cooling solution (often liquid cooling), managing the “hot spots” more effectively.
- Challenges:
- Serviceability: A failed optical engine or laser cannot simply be unplugged at the faceplate; depending on the design, repair may mean reworking or replacing the entire switch. Many CPO designs move the lasers into pluggable external light-source modules to soften, though not eliminate, this risk.
- Ecosystem: It requires a tighter supply chain integration between the ASIC vendor (e.g., Broadcom, Nvidia) and the photonics foundry (e.g., TSMC, Intel).
3.2 Linear Pluggable Optics (LPO): The Pragmatic Middle Ground
LPO removes the DSP from the optical module but keeps the pluggable form factor. It relies on the robust SerDes of the switch ASIC to handle signal equalization (Linear Drive).
- Pros: Significantly lower power and latency than DSP-based pluggables; retains the familiar pluggable serviceability model.
- Cons: Requires excellent channel linearity and limits the achievable reach. It shifts the burden of signal integrity back onto the switch ASIC’s SerDes and the PCB design, as the toy sketch below illustrates.
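The “linear drive” idea is easiest to see with a toy equalization example: with no DSP in the module, the host SerDes has to flatten the channel itself. The channel pulse response and the transmit FFE tap weights below are made-up values chosen purely to show the effect, not settings from any real SerDes.

```python
# Toy illustration of host-side linear equalization (TX FFE).
# The channel and tap values are invented for illustration.
import numpy as np

# Assumed channel pulse response: each symbol leaks into its neighbours (ISI).
channel = np.array([0.20, 1.00, 0.35])     # pre-cursor, main, post-cursor

# Assumed host-side TX FFE taps (in practice these are adapted per channel).
tx_ffe = np.array([-0.15, 1.00, -0.25])

def isi_ratio(pulse: np.ndarray) -> float:
    """Sum of off-cursor magnitudes divided by the main-cursor magnitude."""
    main = int(np.argmax(np.abs(pulse)))
    return float(np.sum(np.abs(np.delete(pulse, main))) / abs(pulse[main]))

# Channel as seen after the host SerDes pre-distorts the waveform.
equalized = np.convolve(channel, tx_ffe)

print(f"ISI ratio, raw channel:   {isi_ratio(channel):.2f}")    # ~0.55
print(f"ISI ratio, with host FFE: {isi_ratio(equalized):.2f}")  # ~0.30
```

The module itself only amplifies and converts; all of the cleanup shown above now has to come from the switch SerDes, which is why LPO is so sensitive to channel and package quality.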
3.3 Next-Gen Pluggables (OSFP-XD): The Incumbent
The industry is hesitant to abandon the flexibility of pluggables. OSFP-XD (Extra Density) aims to support 1.6T using advanced DSPs (3nm process) and enhanced thermal designs (heat sinks, liquid cooling cold plates). While power-hungry, it offers the lowest risk path for early 1.6T deployments in 2025-2026.
4. Expanding the Fabric: PCIe 7.0, CXL 3.x, and Memory Pooling
The 1.6T switch bottleneck doesn’t exist in a vacuum. It is the spine of a larger beast: the disaggregated AI cluster. Inside the rack, the connectivity revolution is driven by PCIe 7.0 and CXL 3.x.
4.1 PCIe 7.0: The 128 GT/s Backbone
With the specification targeting release in 2025, PCIe 7.0 doubles the raw data rate to 128 GT/s.
- Relevance to AI: It aligns with 1.6T Ethernet speeds, ensuring that host-to-NIC bandwidth doesn’t become a bottleneck for the NIC-to-switch fabric (the quick check after this list shows the headroom).
- Material Science: Achieving 128 GT/s requires Ultra-Low Loss (ULL) PCB materials and potentially cabled backplanes (flyover cables) to bypass the PCB loss entirely—mirroring the CPO trend of “bypassing the board.”
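A quick arithmetic check shows why the two roadmaps line up. Raw rates only; FLIT framing, FEC, and protocol overheads (a few percent) are ignored for simplicity.

```python
# Bandwidth alignment: PCIe 7.0 x16 host link vs a 1.6T Ethernet NIC port.
# Raw rates only; encoding and protocol overheads are ignored.

PCIE7_GT_PER_LANE = 128        # GT/s per lane (PAM4)
LANES = 16
ETHERNET_TBPS = 1.6

pcie_gbyte_per_s = PCIE7_GT_PER_LANE * LANES / 8   # ~256 GB/s per direction
nic_gbyte_per_s = ETHERNET_TBPS * 1000 / 8         # ~200 GB/s

print(f"PCIe 7.0 x16:  ~{pcie_gbyte_per_s:.0f} GB/s per direction")
print(f"1.6T Ethernet: ~{nic_gbyte_per_s:.0f} GB/s")
print(f"Headroom:      ~{(pcie_gbyte_per_s / nic_gbyte_per_s - 1) * 100:.0f}% "
      "before protocol overhead")
```

A PCIe 7.0 x16 slot therefore feeds a 1.6T NIC with modest headroom, just as PCIe 6.0 x16 does for today’s 800G NICs.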
4.2 CXL 3.x and Memory Pooling
Compute Express Link (CXL) 3.x runs over the PCIe 6.x physical layer to enable cache-coherent memory sharing and pooling.
- The Memory Wall: AI training is increasingly memory-bound; GPUs often sit idle waiting for data to arrive.
- Pooling: CXL 3.x allows a “pool” of DRAM to be shared dynamically across multiple servers in a rack; a toy model follows this list.
- The Fabric Connection: As CXL evolves into a fabric (CXL 3.1+), it will increasingly rely on optical interconnects to scale beyond the rack. The technologies developed for 1.6T CPO (silicon photonics, low-latency DSP-free links) are directly applicable to future optical CXL fabrics.
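Conceptually, pooling behaves like the toy model below: a rack-level pool of DRAM that hosts borrow from and hand back as jobs come and go. This is a sketch of the idea only; the class and method names are invented and do not represent the actual CXL 3.x fabric-management interface.

```python
# Toy model of rack-level memory pooling (conceptual, not a real CXL API).

class MemoryPool:
    """A shared pool of DRAM that hosts borrow from and return to."""

    def __init__(self, capacity_gb: int) -> None:
        self.capacity_gb = capacity_gb
        self.allocations: dict[str, int] = {}   # host -> GB currently borrowed

    def free_gb(self) -> int:
        return self.capacity_gb - sum(self.allocations.values())

    def allocate(self, host: str, gb: int) -> bool:
        """Grant `gb` of pooled memory to `host` if the pool has room."""
        if self.free_gb() < gb:
            return False
        self.allocations[host] = self.allocations.get(host, 0) + gb
        return True

    def release(self, host: str, gb: int) -> None:
        """Return `gb` of memory from `host` back to the pool."""
        self.allocations[host] = max(0, self.allocations.get(host, 0) - gb)

pool = MemoryPool(capacity_gb=4096)      # a 4 TB pool shared by the rack
pool.allocate("server-01", 512)          # burst demand on one host
pool.allocate("server-07", 1024)
pool.release("server-01", 512)           # handed back when the job finishes
print(f"Free pool capacity: {pool.free_gb()} GB")   # 3072 GB
```

The hardware point is that the same gigabyte can serve server-01 in one hour and server-07 in the next, instead of sitting stranded behind a single motherboard.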
5. Breaking the ‘Heat + Power’ Bottleneck: A Holistic Approach
Solving the 1.6T challenge requires a convergence of technologies:
- Silicon Photonics (SiPh): Essential for CPO. Fabricating optical components in CMOS-compatible processes delivers the scale and cost reduction needed for mass adoption.
- Liquid Cooling: Whether direct-to-chip (DtC) or immersion, liquid cooling is becoming mandatory. CPO modules are designed to interface directly with cold plates, removing the thermal resistance of air gaps; the first-order calculation after this list shows the difference.
- Fabric Innovation: Moving from rigid leaf-spine architectures to flexible, optical-circuit-switched fabrics or CXL-based memory fabrics to improve utilization.
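A first-order thermal calculation makes the cold-plate argument concrete. The thermal resistance values below are assumed, illustrative numbers, not measurements of any real module or heat sink.

```python
# First-order case-temperature estimate for a ~30 W optical engine.
# Thermal resistances (K/W) are assumed for illustration only.

POWER_W = 30.0            # assumed 1.6T optical engine dissipation
AIR_INLET_C = 35.0        # hot-aisle air temperature
COOLANT_C = 45.0          # warm-water facility loop supply

R_AIR_HEATSINK = 1.5      # K/W: riding heat sink plus airflow
R_COLD_PLATE = 0.3        # K/W: TIM plus cold plate to coolant

t_air = AIR_INLET_C + POWER_W * R_AIR_HEATSINK      # ~80 C
t_liquid = COOLANT_C + POWER_W * R_COLD_PLATE       # ~54 C

print(f"Air-cooled case temperature:    ~{t_air:.0f} C")
print(f"Liquid-cooled case temperature: ~{t_liquid:.0f} C")
```

With these assumptions the air-cooled case already sits near typical component limits, while a cold plate fed by warm facility water keeps a comfortable margin even though the coolant is hotter than the air.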
6. Competitive Landscape and Market Gaps
- Nvidia: Pushing proprietary NVLink/NVSwitch but also adopting standard Ethernet for scale-out (Spectrum-X). Their move to LPO in recent designs signals a focus on power efficiency.
- Broadcom: Leading the CPO charge with the Bailly system, integrating optics with Tomahawk ASICs.
- Marvell: Strong contender in DSP technology, advocating for both optimized pluggables and CPO.
- Market Gap: There is a lack of standardized, multi-vendor CPO solutions. Most current implementations are proprietary. A true “plug-and-play” CPO ecosystem is the “Holy Grail” that the OIF (Optical Internetworking Forum) is striving toward.
7. Conclusion: The Road to 2027
The transition to 1.6T is inevitable, but the path—CPO vs. Pluggable—is a battle of economics vs. physics.
- Short Term (2025-2026): Pluggable optics (OSFP-XD) and LPO will dominate due to supply chain inertia and risk aversion.
- Long Term (2027+): As we aim for 3.2T and beyond, the physics of copper traces and the power density of DSPs will make CPO the only viable option for high-radix AI switches.
For whychip.com readers, the takeaway is clear: The “Heat + Power” bottleneck is the primary driver of innovation in the next decade of networking. Whether it’s CPO, PCIe 7.0, or CXL 3.x, the goal is the same—moving data faster, cooler, and more efficiently to feed the AI brain.