
Add LoRA RL warm-start from init_adapter_path #2165

Open
philippnormann wants to merge 8 commits into PrimeIntellect-ai:main from philippnormann:rl-lora-init

Conversation

@philippnormann
Contributor

@philippnormann philippnormann commented Apr 1, 2026

This PR adds a new RL warm-start capability for LoRA training.

With this change, RL can start from an existing saved LoRA adapter instead of always starting from a freshly initialized adapter. This makes it possible to:

  • train an SFT LoRA adapter
  • start RL from that adapter
  • continue training through RL
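A hypothetical config fragment illustrating the workflow above (only the `init_adapter_path` name comes from this PR; the section name and the other keys are placeholders, not the project's actual schema):

```toml
# Hypothetical trainer config fragment.
# Only `init_adapter_path` is introduced by this PR; everything else
# here is a placeholder for illustration.
[lora]
rank = 16
init_adapter_path = "/data/outputs/sft_run/weight"  # saved SFT LoRA adapter
```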

What this adds

  • a trainer-side init_adapter_path for LoRA warm-start
  • a single prepared init-adapter path (PreparedInitAdapter) handling parsing, validation, normalization, and cached CPU tensors
  • correct loading of the init adapter into slot 0 on fresh setup
  • creation-hook based warm-start for future run slots
  • DTensor-aware application after model materialization
  • end-to-end integration coverage for:
    • SFT LoRA -> RL init from saved adapter -> checkpoint save -> resume
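Both the slot-0 load and the creation-hook warm-start hinge on rekeying the saved adapter's tensors to the target run slot. A minimal sketch of that remapping (function and key names are hypothetical; the PR's actual helpers are not shown here), including the multi-digit index case the unit tests call out:

```python
import re

# Matches LoRA parameter keys like "layers.0.lora_A.adapter_12.weight".
_ADAPTER_IDX = re.compile(r"(adapter_)\d+")

def remap_adapter_slot(state_dict: dict, target_slot: int) -> dict:
    """Rewrite every adapter index in the state dict to `target_slot`.

    The \\d+ pattern consumes the whole index, so multi-digit slots
    remap cleanly (adapter_12 -> adapter_0, not adapter_02).
    """
    return {
        _ADAPTER_IDX.sub(rf"\g<1>{target_slot}", key): value
        for key, value in state_dict.items()
    }
```

Usage: `remap_adapter_slot({"lora_A.adapter_12.weight": w}, 0)` yields a dict keyed by `"lora_A.adapter_0.weight"`.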

Validation

Added:

  • focused unit tests for:
    • adapter-slot remapping
    • multi-digit adapter index handling
    • DTensor-aware application
    • MoE-style export/load handling
    • creation-hook behavior without mutating the current slot
    • internal-vs-export adapter key boundaries
  • one end-to-end integration test using the reverse-text example, covering:
    • SFT LoRA -> RL init from saved adapter -> checkpoint save -> resume
    • resume from checkpoint step 20
    • resumed reward remaining > 0.65

Integration test convergence plots

[Figure: loss and reward curves from the integration test]

@philippnormann philippnormann force-pushed the rl-lora-init branch 2 times, most recently from 19e58ff to 3ca377c on April 1, 2026 at 15:55
@cursor cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 2 potential issues.

Member

@Jackmin801 Jackmin801 left a comment

Thanks for the contribution! I don't think we want to support it this way, though. A checkpoint converter should be all you need, and it decouples this from the trainer's logic.

So if you have a script that converts an adapter folder (containing adapter_config.json and adapter_model.safetensors) into a checkpoint that the trainer can resume, the multi_run manager should be able to pick it up and resume from it:

/data/outputs/run_xxx/checkpoints/step_0
├── STABLE
├── trainer
│   ├── rank_0.pt
│   ├── rank_1.pt
│   ├── rank_2.pt
│   ├── rank_3.pt
│   ├── rank_4.pt
│   ├── rank_5.pt
│   ├── rank_6.pt
│   └── rank_7.pt
└── weight
    ├── STABLE
    ├── adapter_config.json
    └── adapter_model.safetensors 
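A minimal sketch of such a converter, assuming the step_0 layout shown above (the function name is hypothetical, and the trainer/ rank files are deliberately omitted here since their contents would have to come from the trainer's own serialization):

```python
import shutil
from pathlib import Path

def adapter_to_checkpoint(adapter_dir: str, run_dir: str) -> Path:
    """Lay out a saved LoRA adapter as a step_0 checkpoint directory.

    Sketch only: copies the adapter files into the weight/ subfolder and
    drops STABLE marker files; the trainer/ rank_*.pt files (optimizer
    state etc.) are not produced here.
    """
    adapter = Path(adapter_dir)
    step0 = Path(run_dir) / "checkpoints" / "step_0"
    weight = step0 / "weight"
    weight.mkdir(parents=True, exist_ok=True)
    for name in ("adapter_config.json", "adapter_model.safetensors"):
        shutil.copy2(adapter / name, weight / name)
    # Empty STABLE markers signal that the directory is fully written.
    (weight / "STABLE").touch()
    (step0 / "STABLE").touch()
    return step0
```

With a checkpoint laid out like this, the multi_run manager would treat it exactly like a checkpoint produced by a previous training run.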

The reason we want it this way is that runs appear and disappear dynamically in our hosted training deployment. The design is therefore that the trainer is never the trigger for starting or stopping runs; it just observes the outputs folder for outputs/run_* directories. When one appears, it starts the run based on outputs/run_x/checkpoints and outputs/run_x/control/orch.toml, and then the trainer and orchestrator run their queue with each other: the trainer reads rollouts and produces broadcasts, while the orchestrator does the inverse.

So in the SFT case, you would use the script to create the checkpoint that the trainer will read from.

In the RL case, orchestrator is in charge of creating the checkpoint and trainer is basically agnostic to where the checkpoint came from, whether it was a checkpoint created by training or a fresh checkpoint created by the conversion.
