Frontier AI training and inference now operate at unprecedented scale. Training clusters have grown from thousands or tens of thousands of NVIDIA GPUs just a few years ago to 100,000+ and, soon, millions of contiguous, interconnected GPUs as model sizes and training data sets continue to scale. Even inference, broadly considered a single-node or single-GPU workload, delivers much higher per-GPU token throughput when served in a distributed, disaggregated manner across hundreds of GPUs. Test-time scaling via reasoning models, agentic workloads, and post-training with inference-heavy reinforcement learning continue to ramp up the volume of cluster-based inference.
For Lambda, building GPU clusters at this scale now means designing the GPU interconnect network as a core part of the compute architecture rather than a supporting layer. Redesign of the compute interconnect network, also known as the compute fabric, East-West network, back-end network, or scale-out network, has accelerated to match the annual pace of GPU refresh cycles. Leading-edge AI clusters require denser, more efficient networks capable of connecting hundreds of thousands of GPUs while improving energy efficiency, reliability, performance, and scalability.
Traditional network switches with pluggable optical transceivers rely on long, high-speed electrical traces. The data signal travels from the switch application-specific integrated circuit (ASIC) across the printed circuit board (PCB), through connectors, and into the separate transceiver module, where it is converted into an optical signal. Inside the transceiver, the electro-optical conversion requires a digital signal processor (DSP) or retimer for signal correction, along with a laser subsystem. Each transition, from switch ASIC to trace, connectors, module electronics, and then to fiber-optic cable, introduces signal loss. The additional active components required to compensate for these transitions and signal loss increase power consumption.
Co-packaged optics with silicon photonics simplify the data path by placing optical components such as laser transmitters, modulators, and photodetectors directly on or next to the same package as the switch ASIC. In practice, this means much shorter trace lengths, fewer connections, and the removal of multiple active components, reducing signal loss, latency, and power consumption compared with conventional pluggable optics.
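To make the component-count argument concrete, the sketch below tallies a hypothetical per-port power budget for the two data paths. The component lists follow the descriptions above, but every wattage value is an illustrative placeholder chosen for the comparison, not a measured or NVIDIA-published figure.

```python
# Illustrative per-port power comparison: pluggable optics vs. co-packaged optics.
# Component lists mirror the data paths described above; all wattage values are
# hypothetical placeholders, not measured or vendor-published numbers.

PLUGGABLE_PATH_W = {
    "switch ASIC SerDes driving long PCB trace": 2.0,   # assumed
    "transceiver DSP / retimer": 4.0,                    # assumed
    "transceiver laser + modulator": 3.0,                # assumed
}

CPO_PATH_W = {
    "switch ASIC SerDes driving short in-package link": 1.0,  # assumed
    "co-packaged modulator + photodetector": 1.5,              # assumed
    "external laser source (per-port share)": 1.0,             # assumed
}

def port_power(path: dict[str, float]) -> float:
    """Sum the per-component power draw for one optical port."""
    return sum(path.values())

if __name__ == "__main__":
    pluggable = port_power(PLUGGABLE_PATH_W)
    cpo = port_power(CPO_PATH_W)
    print(f"Pluggable optics port:   {pluggable:.1f} W (illustrative)")
    print(f"Co-packaged optics port: {cpo:.1f} W (illustrative)")
    print(f"Relative reduction:      {100 * (1 - cpo / pluggable):.0f}%")
```

The point of the sketch is structural rather than numerical: removing the DSP/retimer stage and shortening the electrical path is where the per-port savings come from, whatever the exact component figures turn out to be.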
NVIDIA Quantum-X Photonics InfiniBand and NVIDIA Spectrum-X Photonics Ethernet switches use co-packaged optics (CPO) with integrated silicon photonics to provide the most advanced networking solution for massive-scale AI infrastructure. CPO addresses the demands and constraints of GPU clusters across multiple vectors, including power efficiency, signal integrity, latency, reliability, and scalability.
Lambda is preparing its next-generation GPU clusters to integrate CPO networking using NVIDIA Quantum-X Photonics InfiniBand and Spectrum-X Photonics Ethernet switches. These advances in silicon-photonics switching are critical as we design massive-scale training and inference systems. For Lambda’s NVIDIA GB300 NVL72 and NVIDIA Vera Rubin NVL144 clusters, we are adopting CPO-based networks to deliver higher reliability and performance for customers while simplifying large-scale deployment operations and improving power efficiency.
Two-layer, 1,152-GPU NVIDIA GB300 NVL72 cluster with Quantum-X800 Photonics switches
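As a rough sizing sketch for a topology like the one pictured, the snippet below estimates leaf and spine counts for a non-blocking two-layer fabric connecting 1,152 GPUs. The 144-port switch radix and the one-network-port-per-GPU simplification are assumptions for illustration; NIC-per-GPU ratios and rail-optimized cabling vary by deployment, so actual switch counts may differ.

```python
# Sketch: leaf/spine counts for a two-layer, non-blocking fat-tree of 1,152 GPUs.
# SWITCH_RADIX is an assumed port count for a Quantum-X800 Photonics switch;
# one network port per GPU is a simplification for this estimate.

SWITCH_RADIX = 144                        # assumed ports per switch
GPUS = 1_152                              # endpoints in the pictured cluster
DOWNLINKS_PER_LEAF = SWITCH_RADIX // 2    # non-blocking: half down, half up
UPLINKS_PER_LEAF = SWITCH_RADIX - DOWNLINKS_PER_LEAF

leaf_switches = -(-GPUS // DOWNLINKS_PER_LEAF)       # ceiling division -> 16
total_uplinks = leaf_switches * UPLINKS_PER_LEAF     # 1,152 uplinks
spine_switches = -(-total_uplinks // SWITCH_RADIX)   # ceiling division -> 8

print(f"Leaf switches:  {leaf_switches}")
print(f"Spine switches: {spine_switches}")
print(f"Total switches: {leaf_switches + spine_switches}")
```

Under these assumptions the fabric resolves to 16 leaf and 8 spine switches, which is why a cluster of this size fits comfortably within two switching layers.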
By using CPO-based networking as a core element of our compute architecture, we enable key infrastructure advantages for Lambda and our customers, including higher reliability and performance, simplified large-scale deployment operations, and improved power efficiency.
As AI workloads continue to demand ever-greater scale and throughput, CPO networking will be a foundational enabler of Lambda’s mission to deliver high-performance, scalable, and efficient GPU compute infrastructure.
From frontier model training to distributed inference, Lambda's CPO-enabled GPU clusters are engineered to handle the most demanding AI workloads. Whether you're planning your next large training run or designing for distributed inference, we can help you architect the right solution.