Lambda Managed Slurm: AI Cluster Management, Your Way

Introducing Managed Slurm (Early Preview) on Lambda: Your AI Cluster’s New Best Friend

Think of Slurm as the air‑traffic controller for your GPU fleet: it schedules jobs, juggles resources, and keeps everything running smoothly so you can focus on what really matters, model development. Managed Slurm on Lambda (free for a limited time) is our fully supported Slurm offering, purpose-built for fast, seamless deployment on One-Click Clusters. If DIY is more your style, Unmanaged Slurm has your back. Whether you love full control or prefer to hand over the reins, we’ve got you covered with both Managed and Unmanaged flavors.

What This Offering Does

  • Optimizes cluster utilization for AI/ML workloads, squeezing out every drop of compute power.
  • Pre‑validated on Lambda One‑Click Clusters for seamless “click‑and‑go” deployments, no lengthy setup docs or midnight configuration sessions.
  • Available exclusively on Lambda One‑Click Clusters: fully integrated and ready to launch in minutes.

Our Availability & Feature List

Core Slurm Capabilities (Both Editions)

  • Latest Lambda‑tuned Slurm config for AI workloads
  • LDAP‑backed user/group management
  • cgroups‑based resource policies
  • Container support (Pyxis, Enroot, Podman, Apptainer)
  • Slurm roles: User, Operator, Admin
  • High Availability (HA) for master daemons
  • Pre‑installed ML software modules: Open MPI, CUDA, PyTorch, UCX/HWLOC, PMIx, and more
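To make the container support above concrete, here is a minimal sketch of generating a batch script that launches a containerized job through Pyxis. The `JobSpec` class, the `render_sbatch` helper, the image name, and the training command are all hypothetical and chosen for illustration; the `#SBATCH` options are standard Slurm, and `--container-image` is the flag Pyxis adds to `srun`. Partition names, accounts, and other site-specific settings on a given cluster are not covered here.

```python
from dataclasses import dataclass


@dataclass
class JobSpec:
    """Hypothetical description of a multi-node training job."""
    name: str
    nodes: int
    gpus_per_node: int
    container_image: str  # image reference handed to Pyxis/Enroot
    command: str          # command to run inside the container
    time_limit: str = "04:00:00"


def render_sbatch(spec: JobSpec) -> str:
    """Render a batch script that starts one task per GPU inside a container."""
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={spec.name}",
        f"#SBATCH --nodes={spec.nodes}",
        f"#SBATCH --gpus-per-node={spec.gpus_per_node}",
        f"#SBATCH --ntasks-per-node={spec.gpus_per_node}",
        f"#SBATCH --time={spec.time_limit}",
        "",
        # --container-image comes from the Pyxis plugin, not core Slurm.
        f"srun --container-image={spec.container_image} {spec.command}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Example values only; swap in your own image and command.
    demo = JobSpec(
        name="demo",
        nodes=2,
        gpus_per_node=8,
        container_image="ubuntu:22.04",
        command="python train.py",
    )
    print(render_sbatch(demo), end="")
```

Saving the printed script to a file and submitting it with `sbatch` would be the usual next step on a cluster where Pyxis is enabled.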

Managed‑Only Extras

Lambda takes on Slurm administration so you don’t have to:

  • Automated Slurm patches & security updates
  • Job history tracking & best‑effort preservation
  • SchedMD partnership for escalated issue resolution
  • Proactive health monitoring of slurmctld, slurmdbd & nodes
  • Node‑failure detection & hardware replacement
  • Alerting & root‑cause analysis for Slurm services
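For a sense of what node health monitoring involves at its simplest, the sketch below parses `sinfo` output and flags nodes whose state looks unhealthy. It is an illustration only, not a description of Lambda's monitoring: the function names, the choice of which state prefixes count as unhealthy, and the sample node names are all assumptions.

```python
import subprocess

# State prefixes treated as unhealthy in this sketch (an assumption; adjust as needed).
UNHEALTHY_PREFIXES = ("down", "drain", "fail")


def unhealthy_nodes(sinfo_text: str) -> dict[str, str]:
    """Map node name -> state for nodes whose state looks unhealthy.

    Expects lines of "<node> <state>", as printed by `sinfo -h -N -o "%N %T"`.
    """
    flagged = {}
    for line in sinfo_text.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or unexpected lines
        node, state = parts
        if state.lower().startswith(UNHEALTHY_PREFIXES):
            flagged[node] = state
    return flagged


def query_sinfo() -> str:
    """Run sinfo and return its raw output; requires the Slurm client tools."""
    result = subprocess.run(
        ["sinfo", "-h", "-N", "-o", "%N %T"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


if __name__ == "__main__":
    # Hypothetical sample output, so the sketch runs without a cluster.
    sample = "node001 idle\nnode002 down*\nnode003 mixed\nnode004 drained\n"
    for node, state in unhealthy_nodes(sample).items():
        print(f"{node}: {state}")
```

On a login node, `unhealthy_nodes(query_sinfo())` would apply the same check to live data. A production setup would also watch the `slurmctld` and `slurmdbd` daemons themselves, which this sketch does not attempt.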

Unmanaged vs. Managed: Which Flavor Fits You?

| Feature | Managed Slurm | Unmanaged Slurm |
| --- | --- | --- |
| Admin Responsibility | Lambda is your Slurm administrator | You wear the Slurm admin hat |
| Support Level | Full HPC Support SLAs + SchedMD backup | General infra support only |
| SLA Response | SEV 1 within 2 hours; SEV 2 within 1 business day; SEV 3 within 3 business days | Best‑effort (no guarantees) |
| Job History | Preserved on best‑effort reinstall | Not preserved on reinstall |
| Security & Patches | Lambda‑managed | Customer‑managed |
| Reinstall Turnaround | Target: 1 business day | Best effort |
| Custom Software Installs | Lambda handles installs & updates | You install/manage extra packages |

For a deeper dive into features and customization options, check out our full documentation.

Bottom line: Go Unmanaged if you’re a Slurm power user who loves full control (and doesn’t mind the admin hat). Choose Managed if you’d rather laser‑focus on training and research, and let our HPC team handle the scheduling, patching, and heroic rescues when things go sideways.

Did We Mention It's Free?

During our preview period, both Managed and Unmanaged Lambda Slurm are available at no additional cost for a limited time. Yes, $0/GPU‑hr for the scheduler itself, but that won’t last forever. 
Whether you're exploring cluster management for the first time or ready to test drive a hands-free HPC setup, there's never been a better time to launch.

Ready to Launch?

Getting started with Slurm on Lambda is simple: launch a One‑Click Cluster from your dashboard, then reach out to us. Our team will help set up the right flavor, Managed or Unmanaged, based on your needs.

No secret handshakes, no hidden fees. Just powerful GPU‑harnessing Slurm capabilities at the click of a button.