
Partner Spotlight: Orchestrating large-scale agent training on Lambda with dstack and RAGEN

Lambda + dstack: Empowering your ML team with rock-solid infrastructure for distributed reasoning agent training

Reasoning agents are among the most sought-after LLM use cases, automating complex tasks across domains. The most powerful models remain proprietary, limiting access and research. Training these agents requires serious resources. With Lambda’s 1-Click Clusters and dstack’s orchestration, your team spends less time on setup and more on building. In this blog, we walk you through scaling multi-turn reinforcement learning training with RAGEN on Lambda’s 1CC using dstack, making open-source reasoning agents accessible at scale.

While vanilla GRPO/PPO excels at single-turn tasks, it has been observed to struggle with multi-turn agent fine-tuning. Even when full dialogues are concatenated into longer inputs, training yields wildly oscillating metrics (reward, reward_std, loss, KL) and fails to converge on multi-step interactions.

Reinforcement Learning with RAGEN

RAGEN is a reinforcement learning framework for training large language models as reasoning agents in complex, multi-turn environments. Designed to improve agent reliability and performance, RAGEN tackles challenges like overfitting to local rewards and misaligned reasoning. It introduces fine-grained reward signals that help guide agents toward more consistent, goal-driven behavior.

RAGEN natively treats each state as the full tokenized history and optimizes cumulative rewards over K-turn trajectories. Its StarPO-S variant further stabilizes learning through trajectory-level filtering, KL removal, and asymmetric clipping.
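Conceptually, the StarPO objective described in the RAGEN paper is defined at the trajectory level rather than per response; as a rough sketch of the idea,

J_\text{StarPO}(\theta) = \mathbb{E}_{\tau \sim \pi_\theta}\big[\,R(\tau)\,\big], \qquad R(\tau) = \sum_{k=1}^{K} r_k,

where r_k is the reward collected at turn k, so the policy is credited for the outcome of the whole K-turn interaction rather than for each turn in isolation.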

RAGEN’s variance-based prompt filtering keeps training focused on informative, uncertain cases so the agent continues to explore rather than collapse into rote behavior.

Under the hood, RAGEN uses verl, a flexible, efficient, and production-ready RL training library, and Ray, a robust framework for distributed computing. Compared to SFT, RL requires significantly more resources, which is why being able to run training across a cluster is crucial.

While Ray is a powerful framework for distributed processing, it’s rather low-level. In this blog, we explore how Ray can be used via dstack.

dstack and Lambda 1CC

dstack is an open-source container orchestrator built specifically for ML teams. It simplifies GPU provisioning and workload orchestration, providing a streamlined alternative to Kubernetes and Slurm.

RAGEN includes built-in simulators like Sokoban, WebShop, and FrozenLake for on-policy rollouts and multi-turn evaluation, along with a modular API for custom text environments. In our example, we train a Qwen/Qwen2.5-7B-Instruct agent in the SimpleSokoban environment.

For training our agent, we use Lambda’s 1CC with 16 NVIDIA H100 GPUs (two HGX H100 nodes with 8 GPUs each).

While dstack supports automatic GPU provisioning in Lambda’s Public Cloud, using Lambda’s 1CC or Private Cloud requires accessing these clusters through dstack’s SSH fleets.

Now, we’ll walk you through the steps of using RAGEN to fine-tune a reasoning agent on Lambda’s 1CC with dstack.

dstack SSH fleets

Creating an SSH fleet is straightforward. Inside your repo, define the following lambda-h100-fleet.dstack.yml file:

type: fleet
name: lambda-h100-fleet

ssh_config:
  user: ubuntu
  identity_file: ~/.ssh/id_rsa
  hosts:
    - lambda-cluster-node-001
    - lambda-cluster-node-002
  proxy_jump:
    hostname: 192.222.48.90
    user: ubuntu
    identity_file: ~/.ssh/id_rsa

placement: cluster

Under hosts, we list the hostnames of the GPU nodes. Under proxy_jump, we specify the hostname of the head node along with the private SSH key used to reach it. Setting placement to cluster tells dstack that these hosts form one interconnected cluster, so multi-node runs can be scheduled across them.
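Before applying the configuration, it can be useful to verify that the proxy jump works the way dstack will use it, i.e. reaching a GPU node through the head node. A quick manual check with the hostnames and key from the config above:

$ ssh -i ~/.ssh/id_rsa -J ubuntu@192.222.48.90 ubuntu@lambda-cluster-node-001 nvidia-smi

If this prints the GPU list, dstack will be able to reach the nodes over the same path.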

Now, go ahead and apply the fleet configuration:

$ dstack apply -f lambda-h100-fleet.dstack.yml

After you run the dstack apply command, dstack creates the corresponding fleet, which can then be used for running dev environments, tasks, and services.
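You can also list fleets and their instances from the CLI to confirm everything is up:

$ dstack fleet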

NCCL tests

In order to test the cluster and its interconnect speed, you can run nccl-tests via a dstack task:

$ dstack apply -f nccl-tests.dstack.yml

See the NCCL tests example for the source code of nccl-tests.dstack.yml.

Ray with dstack

dstack tasks allow you to run any distributed workload directly using torchrun, accelerate, or other distributed frameworks. However, to use Ray with dstack, you need to launch a Ray cluster as a task before submitting Ray jobs:

type: task
name: ray-ragen-cluster

nodes: 2

env:
  - WANDB_API_KEY
image: whatcanyousee/verl:ngc-cu124-vllm0.8.5-sglang0.4.6-mcore0.12.0-te2.2
commands:
  # Set up Miniconda and the RAGEN environment
  - wget -O miniconda.sh https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
  - bash miniconda.sh -b -p /workflow/miniconda
  - eval "$(/workflow/miniconda/bin/conda shell.bash hook)"
  - git clone https://github.com/RAGEN-AI/RAGEN.git
  - cd RAGEN
  - bash scripts/setup_ragen.sh
  - conda activate ragen
  - cd verl
  - pip install --no-deps -e .
  - pip install hf_transfer hf_xet
  - pip uninstall -y ray
  - pip install -U "ray[default]"
  # Start the Ray head on node 0; the other nodes join it as workers
  - |
    if [ $DSTACK_NODE_RANK = 0 ]; then
        ray start --head --port=6379;
    else
        ray start --address=$DSTACK_MASTER_NODE_IP:6379
    fi
  # Keep the containers running so the Ray cluster stays up for job submission
  - sleep infinity

# Expose Ray dashboard port
ports:
  - 8265

resources:
  gpu: 80GB:8
  shm_size: 128GB

# Save checkpoints on the instance
volumes:
  - /checkpoints:/checkpoints

This dstack task launches a Ray cluster with the RAGEN environment. Run this task using the same dstack apply command:

$ dstack apply -f ray-ragen-cluster.dstack.yml

When the task exposes ports, the dstack apply command automatically forwards these ports to the current machine. In our case, this makes the Ray dashboard available locally at localhost:8265.
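If you detach from the run or close the terminal, you can re-establish the log streaming and port forwarding by attaching to the run again:

$ dstack attach ray-ragen-cluster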

Training run

This means we can submit Ray jobs from our local machine:

$ pip install ray
$ export RAY_ADDRESS=http://localhost:8265
$ ray job submit \
  -- bash -c "\
    export PYTHONPATH=/workflow/RAGEN; \
    cd /workflow/RAGEN; \
    /workflow/miniconda/envs/ragen/bin/python train.py \
      --config-name base \
      system.CUDA_VISIBLE_DEVICES=[0,1,2,3,4,5,6,7] \
      model_path=Qwen/Qwen2.5-7B-Instruct \
      trainer.experiment_name=agent-fine-tuning-Qwen2.5-7B \
      trainer.n_gpus_per_node=8 \
      trainer.nnodes=2 \
      micro_batch_size_per_gpu=2 \
      trainer.default_local_dir=/checkpoints \
      trainer.save_freq=50 \
      actor_rollout_ref.rollout.tp_size_check=False \
      actor_rollout_ref.rollout.tensor_model_parallel_size=4"
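The ray job submit command prints a submission ID, which you can use from the same machine (with RAY_ADDRESS still set) to check on the run:

$ ray job status <submission-id>
$ ray job logs <submission-id> --follow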

GPU metrics

While training is running, you can observe hardware metrics for the run via dstack’s control plane.
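Recent dstack versions also expose run metrics through the CLI; assuming your version ships the metrics command, you can query them by run name:

$ dstack metrics ray-ragen-cluster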

Once the Ray cluster is no longer needed, you can stop it via dstack stop:

$ dstack stop ray-ragen-cluster
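Stopping the run leaves the SSH fleet in place for future runs. If the fleet is no longer needed either, you can delete it using the same configuration file:

$ dstack delete -f lambda-h100-fleet.dstack.yml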

Training recovery

Because the Ray cluster task mounts the /checkpoints folder from the cluster instances into the containers as an instance volume, checkpoints persist on the instances, and training can be recovered in case of a failure or cluster restart.
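As a quick sanity check after a restart, you can confirm the checkpoints survived by listing /checkpoints on the cluster nodes, reusing the same proxy-jump pattern as in the fleet setup:

$ ssh -i ~/.ssh/id_rsa -J ubuntu@192.222.48.90 ubuntu@lambda-cluster-node-001 ls /checkpoints

Then bring the Ray cluster task back up with dstack apply and re-submit the training job; since trainer.default_local_dir still points at /checkpoints, the verl-based trainer can resume from the saved state (check your RAGEN/verl version for the exact resume options).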

We’ve now walked through how to launch a dstack SSH fleet on Lambda’s 1CC, provision a multi-node Ray cluster, and run distributed training with RAGEN; you can extend the same workflow to other frameworks. If your framework doesn’t rely on Ray, simply skip the Ray cluster step and run distributed training directly.

To dive deeper, check out the Clusters guide. dstack provides a streamlined alternative to Kubernetes and Slurm for teams running on Lambda, with native support for training workloads, development environments, and persistent services.

What’s next?

With Lambda and dstack handling infrastructure complexity, your team can push the boundaries of reasoning agents without setup headaches. Scale your experiments confidently on Lambda’s powerful clusters with dstack.

  • Sign up on Lambda to access GPU clusters tailored to your needs.

  • Explore the RAGEN and verl framework repos.

  • Check out dstack's repository.