Managed and Unmanaged Slurm
Slurm job management optimized for AI workloads is available on Lambda's 1-Click Clusters.
Slurm Job Management for AI Clusters
Our Slurm workload scheduler offering is available as an early preview, free for a limited time. It includes both unmanaged and managed options for H100 clusters (B200 coming soon). Choose Unmanaged for full control, or Managed to let Lambda handle the administration.
Managed Slurm: Hands-Off Efficiency
Let us handle the complexities of Slurm administration. Managed Slurm includes every feature of Unmanaged Slurm, plus comprehensive support and management by Lambda:
- Slurm patches
- Job history tracking
- Technical support: Lambda partners with SchedMD for backend support
- Node failure detection and replacement
- Cluster and Slurm daemon health monitoring, including slurmctld, slurmdbd, and slurmd on each node
Unmanaged Slurm: Complete Control
Take the reins with Unmanaged Slurm. You get Lambda's optimized Slurm configuration with built-in features for advanced cluster management, including:
- Built-in LDAP authentication for user and group management
- cgroup-based resource policies
- Container support (Pyxis, Enroot)
- Slurm user, operator, and administrator access
- High availability (HA)
Deploy seamlessly on Lambda's 1-Click Clusters
Both Unmanaged and Managed Slurm run on Lambda's 1-Click Clusters with NVIDIA H100 and NVIDIA HGX B200 GPUs, providing scalable GPU resources for your AI workloads. 1-Click Clusters are logically partitioned, InfiniBand-connected GPU clusters, contracted for 1 to 52 weeks and located in data centers with 8x5 continuous presence and 24x7 on-call availability.
Ready to build?
Contact us to learn more about our Slurm job management solution and how it can help you accelerate your AI initiatives.