From bigger models to better intelligence: what NeurIPS 2025 tells us about progress
NeurIPS has always been a mirror: it doesn’t just reflect what the community is building, it reveals what the community is starting to believe. In 2025, that ...
Published by Chuan Li
This blog explores the synergy of DeepSpeed’s ZeRO-Inference, a technology designed to make large AI model inference more accessible and cost-effective, with ...
Published by Chuan Li
In this blog, Lambda showcases the capabilities of NVIDIA’s Transformer Engine, a cutting-edge library that accelerates the performance of transformer models ...
Published by Chuan Li
GPU benchmarks on Lambda’s offering of the NVIDIA H100 SXM5 vs the NVIDIA A100 SXM4 using DeepSpeed-Chat’s 3-step training example.
Published by Chuan Li
This blog post walks you through how to use FlashAttention-2 on Lambda Cloud and outlines NVIDIA H100 vs NVIDIA A100 benchmark results for training GPT-3-style ...
Published by Chuan Li
Lambda is thrilled to team up with Hugging Face, a community platform that enables users to build, train, and deploy ML models based on open source code, for a ...
Published by Chuan Li
Available October 2022, the NVIDIA® GeForce RTX 4090 is the newest GPU for gamers, creators, students, and researchers. In this post, we benchmark the RTX 4090 to ...
Published by Chuan Li
We have seen groundbreaking progress in machine learning over the last couple of years. At the same time, massive usage of GPU infrastructure has become key to ...
Published by Chuan Li
If you're interested in training the next large transformer like DALL-E, Imagen, or BERT, a single GPU (or even a single 8x GPU instance!) might not be enough ...
Published by Chuan Li
NVIDIA® A40 GPUs are now available on Lambda Scalar servers. In this post, we benchmark the A40 with 48 GB of GDDR6 VRAM to assess its training performance ...
Published by Chuan Li
This post compares the Total Cost of Ownership (TCO) for Lambda servers and clusters vs cloud instances with NVIDIA A100 GPUs. We first calculate the TCO for ...
Published by Chuan Li