How to deploy ML jobs on Lambda Cloud with SkyPilot
TL;DR: SkyPilot is an open-source orchestration tool that automates ML job deployment on Lambda Cloud. This tutorial covers installation, configuration, and running an example job that evaluates DeepSeek-R1-Distill-Qwen-7B on multiplication tasks with automatic instance termination when the job completes.
Without proper orchestration, deploying ML jobs often forces ML engineers to spend valuable time on system administration tasks, such as installing and upgrading software. Poorly managed cloud resources can also lead to unnecessary and costly charges when left running idle.
SkyPilot, an open-source orchestration tool designed to simplify ML job deployment and management on cloud infrastructure (including Lambda Cloud), provides a solution to these common problems.
In this post, you’ll learn how to install SkyPilot, configure it for Lambda, then use it to automatically launch an instance, submit an ML job, and safely terminate the instance after the job’s completion. Our example evaluates the DeepSeek-R1-Distill-Qwen-7B LLM’s ability to solve multiplication problems.
Prerequisites
To try SkyPilot on Lambda Cloud, you'll need:
- A Lambda Cloud account
- A Lambda Cloud API key, which you can generate from the Lambda Cloud console
- uv installed on your computer
Installing and configuring SkyPilot
Use uv to install SkyPilot on your computer. For other ways to install SkyPilot, see SkyPilot’s installation documentation.
First, create a dedicated directory and virtual environment for SkyPilot:
mkdir -p skypilot && cd skypilot
uv venv --python 3.12
source .venv/bin/activate
uv pip install "skypilot[lambda]"
Once SkyPilot is installed, configure it for Lambda Cloud:
mkdir -p ~/.lambda_cloud
chmod 700 ~/.lambda_cloud
echo "api_key = <LAMBDA-API-KEY>" > ~/.lambda_cloud/lambda_keys
chmod 600 ~/.lambda_cloud/lambda_keys
Replace <LAMBDA-API-KEY> with your actual Cloud API key.
Submitting a job
Define your ML job by creating a YAML file named eval_multiplication.yaml:
resources:
  accelerators: {40GB+}
  autostop:
    idle_minutes: 10
    down: true

envs:
  SCRIPT_URL: "https://docs.lambda.ai/assets/code/eval_multiplication.py"
  MODEL_ID: "deepseek-ai/DeepSeek-R1-Distill-Qwen-7B"

setup: |
  echo "Downloading evaluation script..."
  curl -L "$SCRIPT_URL" -o eval_multiplication.py

run: |
  echo "Running evaluation with uv..."
  uv run --with vllm --with huggingface-hub python eval_multiplication.py "$MODEL_ID" --stdout
This configuration defines a job that:
- Launches an on-demand instance with 40 GB or more of VRAM
- Automatically terminates the instance if it remains idle for more than 10 minutes
- Downloads and runs a Python script to perform the LLM evaluation
To learn more about defining jobs, see the SkyPilot YAML documentation.
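The actual evaluation logic lives in the eval_multiplication.py script that the setup step downloads. To make the job concrete, here's a hedged sketch of what such an evaluation loop might look like: generate random multiplication problems, prompt a model, take the last integer in each reply as the answer, and compute accuracy. All function names are hypothetical, and the stand-in `perfect_model` replaces the real vLLM-backed inference:

```python
import random
import re

def make_problems(n: int, seed: int = 0) -> list[tuple[int, int]]:
    """Generate n random two-digit multiplication problems."""
    rng = random.Random(seed)
    return [(rng.randint(10, 99), rng.randint(10, 99)) for _ in range(n)]

def extract_answer(reply: str):
    """Take the last integer in a model reply as its final answer."""
    digits = re.findall(r"-?\d+", reply.replace(",", ""))
    return int(digits[-1]) if digits else None

def evaluate(model, problems) -> float:
    """Score a model callable (prompt -> reply string) on multiplication."""
    correct = sum(
        extract_answer(model(f"What is {a} * {b}?")) == a * b
        for a, b in problems
    )
    return correct / len(problems)

def perfect_model(prompt: str) -> str:
    """Stand-in 'model' that always multiplies correctly."""
    a, b = map(int, re.findall(r"\d+", prompt)[:2])
    return f"The answer is {a * b}."
```

With the stand-in model, `evaluate(perfect_model, make_problems(1000))` returns 1.0; a real LLM, like the DeepSeek model evaluated here, will score lower.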
Use the sky launch command to run the job:
sky launch eval_multiplication.yaml
You’ll see a summary of the available resources, similar to the following:
Considered resources (1 node):
-----------------------------------------------------------------------------------------
INFRA INSTANCE vCPUs Mem(GB) GPUS COST ($) CHOSEN
-----------------------------------------------------------------------------------------
Lambda (us-east-1) gpu_1x_a6000 14 100 A6000:1 0.80 ✔
Lambda (us-east-1) gpu_1x_a100_sxm4 30 200 A100:1 1.29
Lambda (us-east-1) gpu_1x_gh200 64 432 GH200:1 1.49
Lambda (us-east-1) gpu_1x_h100_pcie 26 200 H100:1 2.49
Lambda (us-east-1) gpu_1x_b200_sxm6 26 360 B200:1 5.29
-----------------------------------------------------------------------------------------
Launching a new cluster 'sky-ea49-lambda'. Proceed? [Y/n]:
Press Enter (or type Y) at the prompt to launch a new cluster (instance) and begin the job. Once the job completes, you'll see the evaluation results:
Processed prompts: 100%|██████████| 1000/1000 [00:43<00:00, 23.01it/s, est. speed input: 388.32 toks/s, output: 6669.35 toks/s]
(task, pid=3997) Model deepseek-ai/DeepSeek-R1-Distill-Qwen-7B accuracy: 0.9020
✓ Job finished (status: SUCCEEDED).
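If you want to capture the result programmatically, you can fetch the job's output with SkyPilot's `sky logs <cluster>` command and pull the accuracy out with a small regex. A stdlib sketch, assuming the log line format shown above (the `parse_accuracy` helper is illustrative):

```python
import re

def parse_accuracy(logs: str):
    """Extract the reported accuracy from the job's log output."""
    m = re.search(r"accuracy:\s*([0-9.]+)", logs)
    return float(m.group(1)) if m else None

sample = "(task, pid=3997) Model deepseek-ai/DeepSeek-R1-Distill-Qwen-7B accuracy: 0.9020"
print(parse_accuracy(sample))  # 0.902
```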
After 10 minutes, the instance will automatically terminate. You can use the Lambda Cloud console to confirm that the instance has terminated.
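To put the autostop setting in perspective, a quick back-of-the-envelope calculation using the A6000 on-demand price from the table above (prices may change) shows what idle time costs with and without it:

```python
A6000_HOURLY = 0.80  # on-demand price from the table above, $/hr

def idle_cost(idle_minutes: float, hourly_rate: float = A6000_HOURLY) -> float:
    """Dollar cost of leaving an instance idle for a given time."""
    return idle_minutes / 60 * hourly_rate

print(round(idle_cost(10), 2))       # 0.13 -> at most ~13 cents of idle time with autostop
print(round(idle_cost(24 * 60), 2))  # 19.2 -> an instance forgotten for a day without it
```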
Next steps
SkyPilot simplifies deploying and managing ML workloads on Lambda Cloud, so you can spend less time on infrastructure and more time on model development and evaluation.
Not sure which orchestration solution is best for your organization?