MLPerf Inference v5.0: Lambda’s Clusters Prove Ready for Today’s and Tomorrow’s AI Inference Demands

Ensuring our customers’ success is a core value at Lambda, and MLPerf Inference v5.0 is part of our commitment to providing the best compute platform for AI innovation. Today, we’re thrilled to share our first public results on the NVIDIA HGX B200 and NVIDIA HGX H200 platforms, showcasing how our cloud infrastructure is setting new standards for AI inference performance.

Raising the Bar with NVIDIA Collaboration

Our long-standing collaboration with NVIDIA continues to power our success. By harnessing the performance of NVIDIA HGX B200 systems and NVIDIA Hopper GPUs, we’re delivering a platform where world-class support, optimized software, and scalable AI infrastructure converge to make state-of-the-art AI computing accessible to everyone. Our recent tests reaffirm that when top-tier accelerators meet Lambda’s cloud solutions, the results speak for themselves.

MLPerf Inference v5.0: Our Testbed

MLPerf Inference v5.0 is an industry-standard benchmark that pushes AI systems to their limits. For this round, our Lambda Cloud 1-Click Clusters™ were put through their paces using a variety of models—from DLRMv2 and GPT-J to Llama 2 70B and Mixtral 8x7B. The outcome? State-of-the-art results and significant throughput improvements that prove our clusters are primed and ready for even the most demanding inference workloads. 

Our internal tests complemented the official results, revealing:

  • Scalability: Our 8-GPU node sustained high performance, completing 99.9% of samples faster than in previous rounds.
  • Throughput: Our optimized NVIDIA H200 configurations delivered up to 13% more tokens per second than MLPerf’s previous best NVIDIA H200 runs, and our NVIDIA HGX B200 results improved on the previous best NVIDIA HGX B200 runs by 21%. Against the previous best NVIDIA H100 results, our NVIDIA H200 runs provided up to 50% higher throughput, and our NVIDIA HGX B200 runs delivered up to 300% higher throughput.
  • Reliability: Our infrastructure maintained 99.99% uptime during extensive stress tests, ensuring seamless AI deployments across workloads. 

For detailed information on our benchmark runs, check out the MLCommons GitHub page.
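
To make the mechanics behind these numbers concrete: MLPerf Inference drives each system under test (SUT) through MLCommons’ LoadGen, which issues queries according to the chosen scenario and measures the completions. Below is a minimal, illustrative sketch of a LoadGen harness using the mlperf_loadgen Python bindings; the no-op SUT and the sample count are placeholders, not our actual submission harness.

```python
# Illustrative MLPerf LoadGen harness, not our submission code.
# Assumes the Python bindings are installed (pip install mlcommons-loadgen,
# or built from https://github.com/mlcommons/inference).
import array

import mlperf_loadgen as lg

QSL_SIZE = 1024  # hypothetical query sample library size


def issue_queries(query_samples):
    # LoadGen calls this with a batch of queries. A real harness would
    # run inference on the accelerator here; this toy SUT answers every
    # sample immediately with a one-byte placeholder result.
    responses = []
    buffers = []  # keep result buffers alive until QuerySamplesComplete
    for sample in query_samples:
        buf = array.array("B", b"\x00")
        buffers.append(buf)
        addr, _ = buf.buffer_info()
        responses.append(lg.QuerySampleResponse(sample.id, addr, len(buf)))
    lg.QuerySamplesComplete(responses)


def flush_queries():
    pass  # nothing is buffered in this toy SUT


def load_samples(indices):
    pass  # a real QSL stages these samples in host/GPU memory


def unload_samples(indices):
    pass


settings = lg.TestSettings()
settings.scenario = lg.TestScenario.Offline  # Offline = max throughput
settings.mode = lg.TestMode.PerformanceOnly

sut = lg.ConstructSUT(issue_queries, flush_queries)
qsl = lg.ConstructQSL(QSL_SIZE, QSL_SIZE, load_samples, unload_samples)
lg.StartTest(sut, qsl, settings)  # writes mlperf_log_* files to the CWD
lg.DestroyQSL(qsl)
lg.DestroySUT(sut)
```

In a real submission, the no-op callbacks are replaced by an inference engine and a query sample library backed by the actual benchmark dataset; LoadGen’s contract, and therefore how latency and throughput are measured, stays the same for every submitter.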

Our Findings & Forward-Thinking Vision

Beyond the numbers, our benchmark journey has provided invaluable insights. We’ve discovered new optimizations that not only boost inference performance but also enhance user experience by reducing latency and improving overall system efficiency. This is more than just a benchmark — it’s a testament to our commitment to empowering AI research and production environments with technology that scales as fast as our users’ ideas.

Our collaborative efforts with NVIDIA continue to drive innovation, ensuring that our customers have the best possible tools to accelerate their AI development lifecycle. Whether you’re training massive language models or deploying real-time inference applications, Lambda’s cloud solutions are engineered to support your ambitions.

Looking Ahead

Every benchmark pushes us closer to a world where AI computation is as seamless as flipping a switch—no bottlenecks, no headaches, just raw performance on demand. We’re refining, optimizing, and scaling, because the future of AI doesn’t wait. 

Excited to build your own? Our 1-Click Clusters are available on demand, accelerated by up to 512 NVIDIA Blackwell GPUs.

Prefer a serverless route? Lambda’s inference API gives you access to the best open-source models—no rate limits, no nonsense.
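
If the serverless route fits your workload, the request shape is a familiar OpenAI-style chat completions call. Here is a minimal sketch assuming an OpenAI-compatible endpoint; the base URL, model ID, and environment-variable name below are illustrative, so check the Lambda Inference API documentation for the current values.

```python
# Illustrative call to an OpenAI-compatible chat completions endpoint.
# The base URL, model ID, and env var are assumptions; see Lambda's
# Inference API docs for the current values.
import os

import requests

BASE_URL = "https://api.lambda.ai/v1"  # assumed endpoint
API_KEY = os.environ["LAMBDA_API_KEY"]  # hypothetical variable name

resp = requests.post(
    f"{BASE_URL}/chat/completions",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "model": "llama-3.1-70b-instruct",  # illustrative model ID
        "messages": [
            {
                "role": "user",
                "content": "Summarize MLPerf Inference v5.0 in one sentence.",
            }
        ],
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```

Because the interface follows the chat completions convention, existing OpenAI-compatible client libraries can typically be pointed at the API by swapping the base URL and key.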