Accelerate Your AI Workflow with FP4 Quantization on Lambda
As AI models grow in complexity and size, the demand for efficient computation becomes paramount. FP4 (4-bit Floating Point) precision emerges as a ...
Published by Anket Sah
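To make the idea of 4-bit floating point concrete, here is a minimal sketch of rounding values onto the standard FP4 (E2M1) magnitude grid with a simple per-tensor absmax scale. This is an illustrative toy, not Lambda's or NVIDIA's actual quantization recipe; the function name and scaling scheme are assumptions for demonstration.

```python
# Representable magnitudes of FP4 in E2M1 format (2 exponent bits, 1 mantissa bit).
FP4_E2M1_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]

def quantize_fp4(values):
    """Round each value to the nearest FP4 level after per-tensor absmax scaling."""
    absmax = max(abs(v) for v in values) or 1.0
    scale = absmax / 6.0  # map the largest magnitude onto FP4's max level (6.0)
    out = []
    for v in values:
        mag = abs(v) / scale
        q = min(FP4_E2M1_GRID, key=lambda g: abs(g - mag))  # nearest grid level
        out.append(q * scale * (1.0 if v >= 0 else -1.0))
    return out

print(quantize_fp4([0.9, -2.4, 6.0]))  # → [1.0, -2.0, 6.0]
```

Real FP4 pipelines add per-block scaling factors and calibrated rounding, but the core operation is this snap-to-grid step.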
DeepSeek has just leveled up. The latest release, DeepSeek-R1-0528, is now available on Lambda’s Inference API, delivering a formidable blend of mathematical ...
Published by Thomas Bordes
When it comes to large language model (LLM) inference, cost and performance go hand-in-hand. Single GPU instances are practical and economical; however, models ...
Published by Chuan Li
This blog explores the synergy of DeepSpeed’s ZeRO-Inference, a technology designed to make large AI model inference more accessible and cost-effective, with ...
Published by Chuan Li
In this blog, Lambda showcases the capabilities of NVIDIA’s Transformer Engine, a cutting-edge library that accelerates the performance of transformer models ...
Published by Chuan Li
GPU benchmarks on Lambda’s offering of the NVIDIA H100 SXM5 vs the NVIDIA A100 SXM4 using DeepChat’s 3-step training example.
Published by Chuan Li
This blog post walks you through how to use FlashAttention-2 on Lambda Cloud and outlines NVIDIA H100 vs NVIDIA A100 benchmark results for training GPT-3-style ...
Published by Chuan Li
Available October 2022, the NVIDIA® GeForce RTX 4090 is the newest GPU for gamers, creators, students, and researchers. In this post, we benchmark RTX 4090 to ...
Published by Eole Cervenka
UPDATE 2022-Oct-13: turning off autocast for FP16 speeds inference up by 25%. What do I need for running the state-of-the-art text-to-image model? Can a ...
Published by Chuan Li
We have seen groundbreaking progress in machine learning over the last couple of years. At the same time, massive usage of GPU infrastructure has become key to ...
Published by Chuan Li
NVIDIA® A40 GPUs are now available on Lambda Scalar servers. In this post, we benchmark the A40 with 48 GB of GDDR6 VRAM to assess its training performance ...
Published by Chuan Li
This post compares the Total Cost of Ownership (TCO) for Lambda servers and clusters vs cloud instances with NVIDIA A100 GPUs. We first calculate the TCO for ...
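A TCO comparison of this kind typically amortizes the hardware purchase plus power over a multi-year period and sets it against pay-per-hour cloud spend. The sketch below illustrates the arithmetic only; every figure (hardware price, power draw, electricity rate, cloud hourly rate) is a hypothetical placeholder, not a number from Lambda's analysis.

```python
def on_prem_tco(hw_cost, watts, kwh_rate, years, util=1.0):
    """Hardware purchase price plus electricity over the amortization period."""
    hours = years * 365 * 24 * util
    return hw_cost + (watts / 1000.0) * hours * kwh_rate

def cloud_tco(hourly_rate, years, util=1.0):
    """Pay-per-hour cloud spend over the same period."""
    return hourly_rate * years * 365 * 24 * util

# Hypothetical example: a $120k 8-GPU server drawing 4.5 kW at $0.12/kWh,
# versus a $12/hr cloud instance, both run continuously for 3 years.
server = on_prem_tco(hw_cost=120_000, watts=4500, kwh_rate=0.12, years=3)
cloud = cloud_tco(hourly_rate=12.0, years=3)
print(round(server), round(cloud))  # server cost vs cloud cost over 3 years
```

Real TCO models also fold in colocation, networking, staffing, and utilization assumptions, which is what the full post walks through.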
Published by Chuan Li
Check out the discussion on Reddit (160 upvotes, 41 comments).
Published by Michael Balaban
Check out the discussion on Reddit (195 upvotes, 23 comments).
Published by Michael Balaban
Lambda is now shipping RTX A6000 workstations & servers. In this post, we benchmark the RTX A6000's PyTorch and TensorFlow training performance. We compare ...