WhyChips

A professional platform focused on electronic component information and knowledge sharing.

PMIC for AI Servers: Multiphase Buck, Telemetry & Health

[Figure: AI server room with a multiphase PMIC power board, telemetry streams, health monitoring interfaces, and rack-mounted AI compute nodes.]

In the rapidly evolving landscape of AI infrastructure, a quiet revolution is unfolding inside every server rack. Power Management Integrated Circuits (PMICs) — once regarded as mundane support components — have ascended to mission-critical status. As AI accelerators like NVIDIA’s Blackwell GPUs push per-chip power consumption beyond 1,000 watts, the PMIC is no longer a backstage player. It is the gatekeeper of system reliability, efficiency, and uptime.

This article explores the multiphase buck converter architectures, real-time telemetry, and health monitoring capabilities that define the modern PMIC. We examine why these features matter for AI servers, which vendors are leading the charge, and how isolated ADC/DAC subsystems and PMBus telemetry are becoming indispensable in next-generation data centers.


Why PMICs Matter More Than Ever in AI Servers

The Power Density Problem

AI workloads consume up to six times more power than traditional computing workloads per rack. A single NVIDIA HGX B200 board can draw over 2,700 watts, and a fully loaded AI rack may exceed 100 kW. This density creates enormous challenges for voltage regulation: sub-1V core rails must be delivered with millivolt-level accuracy under extreme transient loads.

Traditional discrete voltage regulators simply cannot keep pace. Modern PMICs integrate multiple regulated output rails, power sequencing logic, fault protection, and digital telemetry into a single package — dramatically reducing board area while improving transient response and thermal management.

From Supporting Role to Center Stage

Historically, PMICs were selected late in the design cycle and treated as commodity parts. Today, system architects at hyperscale data centers evaluate PMIC capabilities before finalizing their processor and accelerator choices. The reason is straightforward: a PMIC failure can take down an entire GPU node worth millions of dollars in compute capacity. Real-time health monitoring and predictive telemetry are no longer optional — they are requirements driven by the economics of AI infrastructure.


Multiphase Buck Converters: The Engine Behind High-Current Delivery

What Is a Multiphase Buck Converter?

A multiphase buck converter splits the power conversion task across multiple parallel switching phases. Instead of a single power stage handling the full load current, two, four, eight, or even sixteen phases share the burden. Each phase switches at a fixed offset from its neighbors (360°/N for N evenly interleaved phases), producing interleaved waveforms that reduce output ripple, lower thermal stress per component, and improve transient response.

Why Multiphase Is Essential for AI Power Rails

Modern AI processors demand core currents exceeding 1,000 amps at voltages below 0.8V. Delivering this level of current from a single phase would require impossibly large inductors and MOSFETs. Multiphase architectures solve this by distributing current across many smaller, faster switching stages.

Key benefits include:

  • Reduced output ripple: Interleaved phases cancel ripple current, enabling tighter voltage regulation without excessive output capacitance.
  • Improved transient response: When a GPU transitions from idle to full load in microseconds, multiple phases can respond simultaneously, preventing dangerous voltage droops.
  • Thermal distribution: Spreading power dissipation across multiple phases and smart power stages (SPS) prevents hotspot formation on densely packed server boards.
  • Phase shedding for efficiency: At light loads, the controller can disable unnecessary phases to reduce switching losses, a feature critical for light-load conversion efficiency and for lowering the total energy a data center consumes.
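
The interleaving, current-sharing, and phase-shedding points above can be checked with a few lines of arithmetic. The sketch below is a minimal illustration that assumes ideal, identical phases with evenly spaced offsets. The function names, the 12 V input, and the 60 A per-phase target are hypothetical values chosen for the example, not figures from any vendor. The ripple expression is the standard ideal-phase result for interleaved buck converters.

```python
import math


def phase_offsets_deg(n_phases: int) -> list[float]:
    """Evenly spaced switching offsets (degrees) for N interleaved phases."""
    return [360.0 * k / n_phases for k in range(n_phases)]


def ripple_ratio(n_phases: int, duty: float) -> float:
    """Summed inductor ripple current relative to a single phase's ripple.

    Ideal-phase result: N * (D - m/N) * ((m + 1)/N - D) / (D * (1 - D)),
    where m = floor(N * D).
    """
    if not 0.0 < duty < 1.0:
        raise ValueError("duty cycle must be between 0 and 1")
    m = math.floor(n_phases * duty)
    numerator = n_phases * (duty - m / n_phases) * ((m + 1) / n_phases - duty)
    # Clamp tiny negative values caused by floating-point rounding.
    return max(0.0, numerator / (duty * (1.0 - duty)))


def active_phases(load_a: float, n_phases: int, per_phase_target_a: float) -> int:
    """Hypothetical shedding policy: keep just enough phases for the load."""
    needed = math.ceil(load_a / per_phase_target_a)
    return max(1, min(n_phases, needed))


if __name__ == "__main__":
    duty = 0.75 / 12.0  # assumed 12 V input stepping down to 0.75 V
    print("offsets, 4 phases:", phase_offsets_deg(4))
    print("per-phase share of 1,000 A over 16 phases:", 1000.0 / 16)
    print("ripple ratio, 8 phases:", ripple_ratio(8, duty))
    print("phases kept at 120 A load:", active_phases(120.0, 16, 60.0))
```

A real controller adds hysteresis and transient overrides to its shedding logic, so the last function should be read as the basic idea only.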

Leading Multiphase Solutions in the Market

Renesas offers a comprehensive portfolio of digital multiphase controllers and smart power stages. Their controllers support dual-loop, 3-loop, and 4-loop architectures, with native compliance for Intel VR14.Cloud and AMD SVI3 protocols. Renesas smart power stages feature high-accuracy per-phase current and temperature monitoring, auto phase add/drop, and switching frequencies up to 2 MHz.

Monolithic Power Systems (MPS) has gained significant traction with its MP29xx series, deployed extensively in GPU clusters powering NVIDIA’s Blackwell architecture. MPS PMICs integrate I2C and PMBus digital interfaces alongside multiphase parallel capability, enabling design scalability across different power levels.

Infineon Technologies provides PMBus-compliant SupIRBuck voltage regulators that offer programmable switching frequency, current limits, and comprehensive telemetry via PMBus commands. Their IRPS series supports runtime control, fault status reporting, and configuration storage in internal memory.

Microchip Technology announced the MCP16701 PMIC in April 2025, integrating eight 1.5 A buck converters that can be paralleled, four 300 mA LDOs, and a controller for external MOSFETs, targeting high-performance MPU and FPGA applications in AI and data center environments.


Telemetry: The Eyes and Ears of Modern PMICs

What Is PMIC Telemetry?

Telemetry in the context of PMICs refers to the real-time reporting of electrical and thermal parameters from the power management subsystem to the host processor or baseboard management controller (BMC). This data stream typically includes:

  • Output voltage (per rail)
  • Output current (per phase and total)
  • Input voltage and current
  • Junction temperature of the PMIC and power stages
  • Switching frequency
  • Fault and warning status flags

PMBus: The Industry-Standard Telemetry Protocol

The PMBus (Power Management Bus) protocol, built on the SMBus/I2C physical layer, has become the de facto standard for PMIC telemetry in servers. PMBus defines a comprehensive command set for reading telemetry data, configuring operating parameters, and managing fault responses.

PMBus-enabled PMICs allow the BMC to:

  1. Monitor real-time voltage, current, and temperature across all power rails.
  2. Configure output voltage levels, switching frequency, current limits, and fault thresholds without hardware changes.
  3. Log fault events and environmental data for post-mortem analysis.
  4. Control power sequencing, phase management, and dynamic voltage scaling.
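
To make the monitoring step concrete, here is a minimal sketch of how a BMC-side script might turn raw PMBus words into engineering units. The command codes and the LINEAR11 and linear VOUT formats come from the PMBus specification. The helper names are hypothetical, and the code assumes the device reports in linear formats (some parts use the DIRECT format instead). The raw words would come from an SMBus word read on the target device, which is left out here so the example stays self-contained.

```python
# Standard PMBus command codes (PMBus specification, Part II).
VOUT_MODE = 0x20
STATUS_WORD = 0x79
READ_VIN = 0x88
READ_VOUT = 0x8B
READ_IOUT = 0x8C
READ_TEMPERATURE_1 = 0x8D


def _twos_complement(value: int, bits: int) -> int:
    """Interpret an unsigned field of the given width as a signed integer."""
    if value & (1 << (bits - 1)):
        value -= 1 << bits
    return value


def decode_linear11(word: int) -> float:
    """LINEAR11: 5-bit signed exponent (bits 15:11), 11-bit signed mantissa."""
    exponent = _twos_complement((word >> 11) & 0x1F, 5)
    mantissa = _twos_complement(word & 0x7FF, 11)
    return mantissa * 2.0 ** exponent


def decode_linear16(word: int, vout_mode: int) -> float:
    """Linear VOUT format: 16-bit unsigned mantissa, exponent from VOUT_MODE."""
    if (vout_mode >> 5) & 0x3 != 0:
        raise ValueError("VOUT_MODE does not select the linear format")
    exponent = _twos_complement(vout_mode & 0x1F, 5)
    return (word & 0xFFFF) * 2.0 ** exponent


if __name__ == "__main__":
    # Illustrative raw values, not captured from real hardware.
    print("IOUT [A]:", decode_linear11(0xF3E8))
    print("VOUT [V]:", decode_linear16(0x0180, 0x17))
```

READ_VIN, READ_IOUT, and READ_TEMPERATURE_1 typically use LINEAR11, while READ_VOUT is interpreted according to VOUT_MODE. The device datasheet is the final authority on which format a given part uses.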

Telemetry Resolution: Why 12-Bit ADCs Matter

The usefulness of PMIC telemetry depends in part on the resolution of the internal analog-to-digital converters (ADCs). A 12-bit telemetry ADC, as featured in devices like the Renesas ISL91302B, provides 4,096 discrete measurement levels. The size of each step depends on the full-scale range being measured: on a low-voltage rail it works out to well under a millivolt, while on a current channel it scales with the maximum current the channel must cover. That granularity is essential for detecting subtle degradation trends before they become critical failures.

Higher telemetry resolution enables:

  • Precision current balancing across phases, preventing individual phase overload.
  • Early detection of component aging, such as capacitor ESR drift or inductor saturation.
  • Accurate power accounting for billing and efficiency optimization in multi-tenant cloud environments.
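
A quick calculation shows what 12 bits buys in practice. The sketch below models an ideal ADC. The 2.0 V and 512 A full-scale ranges are assumptions chosen for round numbers, not datasheet values, and the function names are hypothetical.

```python
def lsb_size(full_scale: float, bits: int = 12) -> float:
    """Smallest step an ideal ADC can resolve over the given full-scale range."""
    return full_scale / (2 ** bits)


def quantize(value: float, full_scale: float, bits: int = 12) -> int:
    """Ideal ADC output code for a reading, clamped to the valid code range."""
    code = int(value / lsb_size(full_scale, bits))
    return max(0, min(2 ** bits - 1, code))


if __name__ == "__main__":
    # Assumed full-scale ranges for illustration only.
    print("voltage step [mV]:", lsb_size(2.0) * 1000.0)
    print("current step [A]:", lsb_size(512.0))
    print("code for 0.75 V:", quantize(0.75, 2.0))
```

Under these assumptions the voltage channel resolves steps of roughly half a millivolt, while the current channel resolves steps of about an eighth of an amp. Noise, offset, and gain error reduce the usable accuracy below this ideal figure.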

Health Monitoring: Predictive Maintenance for Power Subsystems

Beyond Simple Fault Detection

Traditional PMICs offered basic fault protection: overvoltage (OVP), undervoltage (UVP), overcurrent (OCP), and over-temperature (OTP). When a fault occurred, the PMIC would shut down the affected rail and assert a fault flag. This reactive approach is no longer sufficient for AI infrastructure where unplanned downtime costs thousands of dollars per minute.

Modern PMIC health monitoring goes far beyond binary fault detection. It encompasses:

  • Trending analysis: Continuous telemetry data allows system software to track parameter drift over time. A gradual increase in junction temperature or a slow decrease in output voltage margin may indicate impending failure weeks before a hard fault occurs.
  • Predictive maintenance integration: Telemetry data feeds into data center infrastructure management (DCIM) platforms that use machine learning models to predict component failures and schedule proactive replacements during planned maintenance windows.
  • Per-phase diagnostics: Advanced multiphase controllers report individual phase health metrics, enabling identification of a degraded phase without shutting down the entire rail.
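
Trending analysis can start with something as simple as a straight-line fit. The following sketch fits a least-squares line to a series of junction temperature samples and estimates when the trend would cross a warning threshold. It is a simplified stand-in for the machine learning models a DCIM platform might use, and the function names, sample data, and 100 °C threshold are all hypothetical.

```python
from typing import Optional, Sequence, Tuple


def linear_trend(times: Sequence[float], values: Sequence[float]) -> Tuple[float, float]:
    """Least-squares slope and intercept for values sampled at the given times."""
    n = len(times)
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    slope = sxy / sxx
    return slope, mean_v - slope * mean_t


def time_until_threshold(
    threshold: float, times: Sequence[float], values: Sequence[float]
) -> Optional[float]:
    """Time from the last sample until the fitted line reaches the threshold.

    Returns None when the trend is flat or falling.
    """
    slope, intercept = linear_trend(times, values)
    if slope <= 0:
        return None
    crossing = (threshold - intercept) / slope
    return max(0.0, crossing - times[-1])


if __name__ == "__main__":
    days = [0, 1, 2, 3, 4]                      # sample times in days
    temps_c = [80.0, 80.5, 81.0, 81.5, 82.0]    # illustrative readings
    print("days until 100 C:", time_until_threshold(100.0, days, temps_c))
```

In practice the input would be a smoothed daily aggregate rather than raw one-second samples, since workload swings dominate short-term temperature changes.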

Real-Time Monitoring in Practice

Consider a typical AI server with eight GPU modules, each powered by a multiphase PMIC delivering 500A at 0.75V. The BMC polls each PMIC via PMBus at one-second intervals, collecting voltage, current, and temperature readings. This data is aggregated and transmitted to a centralized monitoring platform.

If one phase in a 16-phase converter begins drawing disproportionately high current — perhaps due to a degraded MOSFET or solder joint — the telemetry system flags the anomaly. The phase can be shed automatically while the server continues operating at reduced but stable power. A maintenance ticket is generated, and the affected board is replaced during the next scheduled window — without any unplanned downtime.
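
The anomaly check in this scenario can be sketched as a comparison of each phase against the median phase current. The median is used rather than the mean so that a single misbehaving phase does not shift the reference. The 10% tolerance, the function name, and the sample readings are hypothetical, and a production system would also filter for transients before raising a ticket.

```python
from statistics import median
from typing import List, Sequence


def find_outlier_phases(phase_currents_a: Sequence[float], tolerance: float = 0.10) -> List[int]:
    """Indices of phases whose current deviates from the median by more than tolerance."""
    reference = median(phase_currents_a)
    if reference <= 0:
        return []
    return [
        index
        for index, current in enumerate(phase_currents_a)
        if abs(current - reference) / reference > tolerance
    ]


if __name__ == "__main__":
    # Illustrative per-phase readings for a 16-phase rail; phase 5 runs hot.
    readings = [30.0] * 16
    readings[5] = 40.0
    print("suspect phases:", find_outlier_phases(readings))
```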


The Role of Isolation and ADC/DAC in Power Telemetry

Why Isolation Matters

In high-power AI server environments, the power delivery network operates at voltage levels ranging from 48V bus distribution to sub-1V core rails. Some next-generation architectures are moving to 800V DC distribution within the data center, as highlighted by Analog Devices’ work on 800V hot swap protection and telemetry systems.

Galvanic isolation between the high-voltage power domain and the low-voltage digital telemetry domain is essential for:

  • Safety: Protecting sensitive digital circuits and human operators from high-voltage faults.
  • Signal integrity: Preventing ground loops and common-mode noise from corrupting telemetry measurements.
  • System reliability: Ensuring that a fault in the power domain does not propagate to the management controller.

Isolated ADCs and DACs serve as the bridge between these domains. An isolated ADC digitizes high-side current and voltage measurements and transmits the data across the isolation barrier using capacitive or magnetic coupling. An isolated DAC can be used to set reference voltages or trim points on the power stage from the digital control domain.

ADC/DAC Integration Trends

The trend in modern PMICs is toward tighter integration of high-resolution ADCs within the power management IC itself, eliminating the need for external measurement circuits. This integration reduces board space, improves measurement accuracy by shortening analog signal paths, and lowers system cost.

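For applications requiring galvanic isolation, dedicated isolated sigma-delta modulators from vendors like Analog Devices (AD7400 series) and Texas Instruments (AMC1306 series) are commonly paired with PMICs to provide isolated current and voltage sensing on high-voltage rails. Their one-bit output stream crosses the isolation barrier and is then decimated by a digital filter on the low-voltage side to produce the measurement.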


How Does PMIC Telemetry Support AI Workload Optimization?

Beyond hardware protection, PMIC telemetry data is increasingly used for workload-aware power management. Modern AI orchestration platforms can use real-time power consumption data to:

  • Schedule GPU workloads to balance power draw across server nodes, preventing circuit breaker trips and maximizing rack utilization.
  • Implement dynamic voltage and frequency scaling (DVFS) at the GPU level, informed by actual measured power rather than estimated thermal design power (TDP).
  • Optimize cooling system response by correlating PMIC-reported power dissipation with thermal sensor data, reducing fan speeds when actual power consumption is below peak levels.

This closed-loop approach — where telemetry feeds back into workload scheduling — transforms the PMIC from a passive voltage regulator into an active participant in data center efficiency management.
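
As a rough illustration of that feedback loop, the sketch below places a new job on the node with the lowest measured power, but only if the rack stays within its power budget. It is a deliberately simple greedy policy under stated assumptions: the node names, wattages, and budget are made up, and real orchestration platforms weigh many more factors than power alone.

```python
from typing import Dict, Optional


def place_job(
    node_power_w: Dict[str, float], job_power_w: float, rack_budget_w: float
) -> Optional[str]:
    """Pick the least-loaded node, or None if the rack budget would be exceeded."""
    if not node_power_w:
        return None
    projected_total = sum(node_power_w.values()) + job_power_w
    if projected_total > rack_budget_w:
        return None
    return min(node_power_w, key=lambda name: node_power_w[name])


if __name__ == "__main__":
    # Illustrative telemetry snapshot in watts, as aggregated by the BMCs.
    measured = {"node-1": 6000.0, "node-2": 4500.0, "node-3": 5200.0}
    print("place on:", place_job(measured, 1000.0, 20000.0))
```

Driving the decision from measured power rather than nameplate TDP is what lets the scheduler use headroom that a worst-case estimate would leave idle.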


What Are the Key Specifications to Evaluate in a Modern PMIC?

When selecting a PMIC for AI server applications, engineers should evaluate the following critical specifications:

Specification | Why It Matters | Typical Target
Number of phases | Determines maximum deliverable current | 8–16 phases per rail
Peak efficiency | Directly impacts data center operating cost | Greater than 95% at full load
Transient response | Prevents voltage droop during load steps | Less than 10 mV undershoot for 500 A step
Telemetry ADC resolution | Determines measurement precision | 12-bit minimum
PMBus compliance | Ensures interoperability with BMC platforms | PMBus 1.3+
Operating temperature range | Supports dense server thermal environments | -40°C to +125°C junction
Phase current balancing accuracy | Prevents individual phase overload | Within ±2% across all phases
Integrated protection features | Reduces external component count | OVP, UVP, OCP, OTP, SCP
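
To put the efficiency row in perspective, the short sketch below estimates the heat dissipated in the converter for the 500 A at 0.75 V rail used earlier in this article. The function names are hypothetical, and the calculation assumes constant full load all year, which overstates a real duty cycle.

```python
def conversion_loss_w(p_out_w: float, efficiency: float) -> float:
    """Power lost in the converter for a given output power and efficiency."""
    if not 0.0 < efficiency <= 1.0:
        raise ValueError("efficiency must be in (0, 1]")
    return p_out_w / efficiency - p_out_w


def annual_loss_kwh(loss_w: float, hours: float = 8760.0) -> float:
    """Energy lost over a year of continuous operation, in kWh."""
    return loss_w * hours / 1000.0


if __name__ == "__main__":
    p_out = 500.0 * 0.75  # 375 W delivered to one GPU core rail
    for eff in (0.93, 0.95):
        loss = conversion_loss_w(p_out, eff)
        print(f"efficiency {eff:.0%}: loss {loss:.1f} W, {annual_loss_kwh(loss):.0f} kWh/year")
```

Multiplied across eight GPUs per server and many servers per rack, a difference of a couple of efficiency points becomes a meaningful share of the cooling and energy budget.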

The Competitive Landscape: Who Is Winning the PMIC Battle?

The PMIC market for AI servers is a fierce battleground among a handful of major semiconductor vendors:

  • Monolithic Power Systems (MPS): Rapidly gaining share with highly integrated, easy-to-design-in solutions. The MP29xx series has become a reference design choice for multiple GPU OEMs.
  • Renesas Electronics: Leveraging its acquisition of Intersil to offer best-in-class digital multiphase controllers with advanced telemetry. Their PowerNavigator GUI provides RailScope and PhaseScope tools for real-time per-phase telemetry visualization.
  • Infineon Technologies: Strong in the PMBus ecosystem with broad SupIRBuck and DrMOS portfolios. Infineon’s strength lies in high-current smart power stages with integrated current sensing.
  • Analog Devices (ADI): Focused on the high-voltage side of the power chain with 800V hot swap controllers featuring integrated telemetry. ADI’s µModule regulators combine power conversion and telemetry in a compact system-in-package.
  • Microchip Technology: Targeting the mid-power segment with highly integrated PMICs like the MCP16701, offering eight parallelable buck converters with digital interface support.

The global PMIC market was valued at approximately US$29.25 billion in 2025 and is projected to reach US$69.54 billion by 2035, growing at a CAGR of 10.1%. The AI data center segment is one of the fastest-growing contributors to this expansion.


What Does the Future Hold for PMIC Technology?

Several trends are shaping the next generation of PMICs for AI infrastructure:

  1. Higher phase counts and current density: As GPU power continues to climb, PMICs supporting 20+ phases with integrated GaN or SiC power stages will emerge to deliver currents exceeding 2,000A per rail.
  2. On-chip machine learning for predictive health: Future PMICs may incorporate lightweight ML inference engines that analyze telemetry data locally, predicting failures without relying on external software.
  3. Standardized telemetry APIs: Industry initiatives are pushing for standardized telemetry data formats beyond PMBus, enabling seamless integration with cloud-native monitoring platforms like Prometheus and Grafana.
  4. Co-packaged power and compute: Advanced packaging technologies may place PMIC dies directly adjacent to GPU dies within the same package substrate, minimizing power delivery network impedance and enabling unprecedented transient response.
  5. 800V and beyond: The shift from 48V to 800V DC distribution in data centers will drive demand for PMICs with integrated high-voltage conversion, isolation, and telemetry capabilities in a single device.

Conclusion

The PMIC has undergone a fundamental transformation — from a commodity voltage regulator to a sophisticated, telemetry-rich system component that is essential to the reliability and efficiency of AI infrastructure. Multiphase buck converters deliver the extreme currents that modern GPUs demand. Real-time telemetry via PMBus provides the visibility that data center operators need to manage power at scale. And health monitoring capabilities enable the predictive maintenance strategies that keep billion-dollar AI clusters running.

For power electronics engineers and system architects designing the next generation of AI servers, the PMIC is no longer a component you select last. It is a strategic decision that shapes the performance, reliability, and total cost of ownership of the entire system.
