Shaping the future of AI development
Compute, community, and cutting-edge research for AI developers defining what's next
Building the future of AI through research and mentorship

Neural Software: from vision to reality
Join Lambda's co-founder and CEO Stephen Balaban as he unveils Neural Software: an entirely new way of thinking about software that can collaborate with humans, evolve over time, and adapt like never before.
Publications in 2026
PixARMesh
Mesh-native autoregression for whole scenes from a single view, token-by-token.
Fractured Object Recovery
Reassembles what's left. Generates what isn't.
AgentFlow
Train the agent inside its own loop. 7B beats GPT-4o on search, math, and science.
Latent Particle World Models
Discovers objects and masks from raw video, predicts what happens next, no supervision.
Multi-Agent Social Interactions
LLMs fold under peer pressure. Our benchmark and RL recipe help smaller models hold their ground.
EdiVal-Agent
Object-level grading of multi-turn image editing, exposing where today's best editors silently break.
OffTopicEval
Give an LLM a job and clear boundaries. It still answers off-topic questions, almost every time.
Zeroth-Order Federated LLM Fine-Tuning
Sparse updates make syncing cheap enough to go frequent, neutralizing non-IID drift.
LLM Unlearning Reframed as Retrieval
Smarter data selection pushes the forget-vs-retain frontier past oracle sampling.
Principled RL for Diffusion LLMs
Token-level RL doesn't fit diffusion LLMs. Treat the whole sequence as one action for 20–40 point gains.
Align Your Structures for Molecular Dynamics
Pretrains on static molecular structures, stitches them into dynamics trajectories, bypassing simulation data scarcity.
TangoFlux
515M params, 30s of studio audio in under 4 seconds. Aligned by ranking its own outputs.
Video Native Sparse Attention
Learnable sparse attention for video. 3.6% attention budget at 128K tokens, accuracy still improves.
Exponent-Concentrated FP8
Model weight exponents cluster into 2–3 bits of entropy. Lossless FP8 compression, up to 177% faster inference.

The ARChitects secure runner-up in ARC Prize 2025
Last year, “the ARChitects,” a Lambda-sponsored team (including Lambda researcher David Hartmann), won the ARC Prize 2024. This year, they finished second out of 1,400+ teams with a final leaderboard score of 16.53%.

LLM performance benchmarks leaderboard
A clear, data-driven comparison of today's leading large language models. Standardized benchmark results cover top contenders like Meta's Llama 4 series, Alibaba's Qwen3, and the latest from DeepSeek, with critical performance metrics measuring everything from coding ability to general knowledge.

ML Times
Your go-to source for the latest in the field, curated by AI. Sift through the excess. Make every word count.
Best practices and system insights
Diffusion from scratch
Text2Video pretraining
GPU benchmarks
GPU throughput benchmarks for training deep neural networks.
MLCommons benchmark
Recognized by scholars and industry peers
Scaling laws for diffusion models
Diffusion beats autoregressive in data-constrained settings.
SpatialReasoner
Builds an explicit 3D scene and reasons over it step by step, boosting accuracy and generalization on 3D spatial QA benchmarks.
Tensor decomposition for force-field prediction
Replaces heavy tensor operations in molecular force-field models with low-rank approximation, reducing compute while keeping accuracy.
Bifrost-1
Aligns VLMs with diffusion models through shared CLIP patch embeddings, enabling controllable high-quality generation while preserving reasoning.
BLEUBERI
Using simple BLEU scores as feedback on hard instructions can train instruction-following models that rival those tuned with expensive learned rewards.
OverLayBench
A dataset that stress-tests layout-to-image models on heavily overlapping scenes, exposing current failures and offering an improved baseline.
Breakthroughs backed by Lambda
SAEBench
A comprehensive benchmark for sparse autoencoders in language model interpretability
Adam Karvonen, Can Rager, Johnny Lin, Curt Tigges, Joseph Bloom, David Chanin, Yeu-Tong Lau, Eoin Farrell, Callum McDougall, Kola Ayonrinde, Demian Till, Matthew Wearden, Arthur Conmy, Samuel Marks, and Neel Nanda — ICML 2025
VideoHallu
Evaluating and mitigating multi-modal hallucinations on synthetic video understanding
Zongxia Li, Xiyang Wu, Guangyao Shi, Yubin Qin, Hongyang Du, Tianyi Zhou, Dinesh Manocha, and Jordan Lee Boyd-Graber — NeurIPS 2025
VLM2Vec-V2
Advancing multimodal embedding for videos, images, and visual documents
Rui Meng, Ziyan Jiang, Ye Liu, Mingyi Su, Xinyi Yang, Yuepeng Fu, Can Qin, Zeyuan Chen, Ran Xu, Caiming Xiong, and others — arXiv preprint 2025
Think, prune, train, improve
Scaling reasoning without scaling models
Caia Costello, Simon Guo, Anna Goldie, and Azalia Mirhoseini — ICLR 2025 workshop
NeoBERT
A next-generation BERT
Lola Le Breton, Quentin Fournier, Mariam El Mezouar, and Sarath Chandar — TMLR 2025