Shaping the future of AI development

Compute, community, and cutting-edge research for AI developers defining what's next

Pushing the frontier of AI/ML research


Neural Software: from vision to reality

Join Lambda's co-founder and CEO Stephen Balaban as he unveils Neural Software: an entirely new way of thinking about software that can collaborate with humans, evolve over time, and adapt like never before.

Our work at NeurIPS 2025

01
Scaling laws for diffusion models
Diffusion models beat autoregressive models in data-constrained settings.
Read more
02
SpatialReasoner
Builds an explicit 3D scene and reasons over it step by step, boosting accuracy and generalization on 3D spatial QA benchmarks.
Read more
03
Tensor decomposition for force-field prediction
Replaces heavy tensor operations in molecular force-field models with a low-rank approximation, cutting compute while preserving accuracy (see the sketch after this list).
Read more
04
Bifrost-1
Aligns VLMs with diffusion models through shared CLIP patch embeddings, enabling controllable high-quality generation while preserving reasoning.
Read more
05
BLEUBERI
Shows that simple BLEU scores, used as feedback on hard instructions, can train instruction-following models that rival those tuned with expensive learned reward models (see the reward sketch after this list).
Read more
06
OverLayBench
A dataset that stress-tests layout-to-image models on heavily overlapping scenes, exposing current failures and offering an improved baseline.
Read more
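
As a rough illustration of the low-rank idea behind item 03, and not the paper's exact decomposition: a dense linear map can be replaced by two skinny rank-r factors obtained from a truncated SVD, cutting both parameters and per-example FLOPs when r is small. The sizes and rank below are made up for the example.

```python
import torch

# Illustrative only: generic low-rank factorization of a dense layer.
d_in, d_out, r = 1024, 1024, 32
dense = torch.nn.Linear(d_in, d_out, bias=False)

# Best rank-r approximation of the weight via truncated SVD
# (Eckart-Young): W ~= (U_r * S_r) @ V_r^T.
U, S, Vh = torch.linalg.svd(dense.weight.detach(), full_matrices=False)
A = U[:, :r] * S[:r]   # (d_out, r)
B = Vh[:r, :]          # (r, d_in)

x = torch.randn(8, d_in)
y_dense = dense(x)
y_lowrank = x @ B.T @ A.T  # two skinny matmuls instead of one dense one

# Approximation error of the rank-r surrogate
print(torch.nn.functional.mse_loss(y_lowrank, y_dense))
```

The compute saving is the usual one: the dense map costs O(d_in * d_out) per row, while the factored form costs O(r * (d_in + d_out)).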
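And a toy sketch of item 05's premise, assuming the sacrebleu package is available: a plain BLEU score against reference completions can stand in for a learned reward model during RL-style fine-tuning. The prompt and references here are hypothetical.

```python
import sacrebleu

def bleu_reward(candidate: str, references: list[str]) -> float:
    """Scalar reward in [0, 1]: sentence-level BLEU of the model's
    output against one or more reference completions."""
    # sacrebleu reports BLEU on a 0-100 scale; rescale to [0, 1].
    return sacrebleu.sentence_bleu(candidate, references).score / 100.0

# Toy usage: this scalar would be fed to an RL-style trainer
# (e.g., GRPO/PPO) in place of a learned reward model's score.
refs = ["List three prime numbers: 2, 3, and 5."]
print(bleu_reward("Three primes are 2, 3, and 5.", refs))  # high reward
print(bleu_reward("Bananas are yellow.", refs))            # low reward
```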

LLM performance benchmarks leaderboard

A clear, data-driven comparison of today's leading large language models. Standardized benchmark results cover top contenders like Meta's Llama 4 series, Alibaba's Qwen3, and the latest from DeepSeek, with critical performance metrics measuring everything from coding ability to general knowledge.


ML Times

Your go-to source for the latest in the field, curated by AI. Sift through the excess. Make every word count.

Best practices and system insights

01
Diffusion from scratch
A guide to implementing diffusion models in a single PyTorch script (see the sketch after this list).
Learn more
02
Text2Video pretraining
Lessons learned from training a text-to-video model with hundreds of GPUs.
Learn more
03
GPU benchmarks
Throughput GPU benchmarks for training deep neural networks.
Learn more
04
MLCommons benchmark
Time-to-solution benchmark for training foundation models on clusters.
Learn more
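
For a taste of what the diffusion guide covers, here is a minimal sketch of the core DDPM training step, not the guide's actual script; the model(x_t, t) signature and the toy denoiser are assumptions for the example.

```python
import torch

# Standard DDPM noise schedule and closed-form forward process.
T = 1000
betas = torch.linspace(1e-4, 0.02, T)
alphas_bar = torch.cumprod(1.0 - betas, dim=0)  # cumulative alpha-bar_t

def ddpm_loss(model, x0):
    # Sample a random timestep per example and corrupt x0 with noise:
    # x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps
    t = torch.randint(0, T, (x0.shape[0],))
    eps = torch.randn_like(x0)
    a_bar = alphas_bar[t].view(-1, *([1] * (x0.dim() - 1)))
    x_t = a_bar.sqrt() * x0 + (1.0 - a_bar).sqrt() * eps
    # Train the network to predict the injected noise.
    return torch.nn.functional.mse_loss(model(x_t, t), eps)

# Toy usage with a denoiser that ignores the timestep.
net = torch.nn.Linear(16, 16)
loss = ddpm_loss(lambda x, t: net(x), torch.randn(4, 16))
print(loss)
```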

Recognized by scholars and industry peers

Latent thought models

Teaches LLMs to “think” before they speak, improving parameter and memory efficiency.

Product of experts with LLMs

Boosting performance on the ARC-AGI challenge is a matter of perspective.


* Winner, ARC-AGI 2024

DepR

Uses depth cues and diffusion models to turn a single image into a clean, instance-level 3D scene.

Video MMLU

A benchmark that tests whether models can truly follow and understand long, multi-subject lecture videos, revealing current VLMs' limitations.

Latent adaptive planner

Lets robots learn a compact “plan space” from human demos and continually update the plan as the scene changes, enabling faster and more reliable manipulation.

AimBot

Draws visual cues onto robot camera images so policies can directly see where the gripper is in 3D, boosting manipulation success.

Word salad chopper

Detects when a reasoning model has drifted into meaningless repetition and cleanly cuts away those extra tokens, saving a lot of output cost with almost no quality loss.
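
A toy illustration of the chopping idea, not the paper's detector: flag the point where a decoded token stream starts looping on an exact n-gram and cut the tail. The n-gram size and repeat threshold below are arbitrary.

```python
def chop_word_salad(tokens: list[str], n: int = 8, max_repeats: int = 3) -> list[str]:
    """Truncate a token stream once the same n-gram recurs too often."""
    seen: dict[tuple, int] = {}
    for i in range(len(tokens) - n + 1):
        gram = tuple(tokens[i : i + n])
        seen[gram] = seen.get(gram, 0) + 1
        if seen[gram] >= max_repeats:  # degenerate loop detected
            return tokens[:i]          # cut the repeated tail
    return tokens

# Toy usage: a stream that loops on the same phrase gets trimmed.
text = ("the answer is 42 . " * 6).split()
print(" ".join(chop_word_salad(text, n=5, max_repeats=2)))
```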

DEL-ToM for theory-of-mind reasoning

Breaks social reasoning into “who-knows-what” steps and uses a checker to pick the most consistent answer, improving small models' theory-of-mind performance.

VeriFastScore

Trains a single model to extract and verify all claims in a long answer at once using web evidence, speeding up long-form factuality evaluation.

Error typing for smarter rewards

Labels each reasoning step's mistake type and converts those labels into scores, giving richer feedback and improving math problem solving with less data.

Breakthroughs backed by Lambda

Bold ideas, funded and refined through the Lambda Research Grant. These are the projects shaping how AI learns, reasons, and scales — built by the researchers defining what’s next.
01
SAEBench

A comprehensive benchmark for sparse autoencoders in language model interpretability


Adam Karvonen, Can Rager, Johnny Lin, Curt Tigges, Joseph Bloom, David Chanin, Yeu-Tong Lau, Eoin Farrell, Callum McDougall, Kola Ayonrinde, Demian Till, Matthew Wearden, Arthur Conmy, Samuel Marks, and Neel Nanda — ICML 2025

Read more
02
VideoHallu

Evaluating and mitigating multi-modal hallucinations on synthetic video understanding


Zongxia Li, Xiyang Wu, Guangyao Shi, Yubin Qin, Hongyang Du, Tianyi Zhou, Dinesh Manocha, and Jordan Lee Boyd-Graber — NeurIPS 2025

Read more
03
VLM2Vec-V2

Advancing multimodal embedding for videos, images, and visual documents


Rui Meng, Ziyan Jiang, Ye Liu, Mingyi Su, Xinyi Yang, Yuepeng Fu, Can Qin, Zeyuan Chen, Ran Xu, Caiming Xiong, and others — arXiv preprint 2025

Read more
04
Think, prune, train, improve

Scaling reasoning without scaling models


Caia Costello, Simon Guo, Anna Goldie, and Azalia Mirhoseini — ICLR 2025 workshop

Read more
05
NeoBERT

A next-generation BERT


Lola Le Breton, Quentin Fournier, Mariam El Mezouar, and Sarath Chandar — TMLR 2025

Read more

Join us in shaping the future of AI

We're committed to supporting groundbreaking research by offering qualifying researchers up to $5,000 in cloud credits to develop and showcase their work using Lambda's Instances. Select research will be featured on our website.