Add LoRA RL warm-start from init_adapter_path #2165
philippnormann wants to merge 8 commits into PrimeIntellect-ai:main from init_adapter_path
Conversation
Jackmin801
left a comment
Thanks for the contribution! I don't think we want to support doing it this way, though. I would think a checkpoint converter is all you need, and this decouples it from the trainer's logic.
So if you have a script that converts an adapter folder containing adapter_config.json and adapter_model.safetensors into a checkpoint that the trainer can resume from, the multi_run manager should be able to pick it up and resume from it.
/data/outputs/run_xxx/checkpoints/step_0
├── STABLE
├── trainer
│   ├── rank_0.pt
│   ├── rank_1.pt
│   ├── rank_2.pt
│   ├── rank_3.pt
│   ├── rank_4.pt
│   ├── rank_5.pt
│   ├── rank_6.pt
│   └── rank_7.pt
└── weight
    ├── STABLE
    ├── adapter_config.json
    └── adapter_model.safetensors
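A converter producing the layout above could be sketched roughly as follows. This is a hypothetical minimal sketch, not the actual script: function and argument names are made up, and a real converter would also need to emit the per-rank trainer/ state in whatever format the trainer expects.

```python
import shutil
from pathlib import Path


def convert_adapter_to_checkpoint(adapter_dir: str, run_dir: str) -> Path:
    """Copy a saved LoRA adapter into a step_0 checkpoint layout
    (hypothetical sketch; the real converter would also need to write
    the trainer/rank_*.pt optimizer state shown in the tree above)."""
    adapter = Path(adapter_dir)
    config = adapter / "adapter_config.json"
    weights = adapter / "adapter_model.safetensors"
    if not config.is_file() or not weights.is_file():
        raise FileNotFoundError(
            "adapter_dir must contain adapter_config.json and adapter_model.safetensors"
        )

    step_dir = Path(run_dir) / "checkpoints" / "step_0"
    weight_dir = step_dir / "weight"
    weight_dir.mkdir(parents=True, exist_ok=True)

    shutil.copy2(config, weight_dir / "adapter_config.json")
    shutil.copy2(weights, weight_dir / "adapter_model.safetensors")

    # Empty STABLE marker files signal that a directory is fully written.
    (weight_dir / "STABLE").touch()
    (step_dir / "STABLE").touch()
    return step_dir
```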
The reason we want it this way is that runs appear and disappear dynamically in our hosted training deployment. The design is such that the trainer is never the trigger for starting and stopping runs; instead it just observes the outputs folder for any outputs/run_* folders. If one exists, it starts the run based on outputs/run_x/checkpoints and outputs/run_x/control/orch.toml, and then the trainer and orchestrator do this queue thing with each other, where the trainer reads rollouts and produces broadcasts while the orchestrator does the inverse.
So in the SFT case, you would use the script to create the checkpoint that the trainer will read from.
In the RL case, the orchestrator is in charge of creating the checkpoint, and the trainer is basically agnostic to where the checkpoint came from, whether it was a checkpoint created by training or a fresh checkpoint created by the conversion.
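The observer design described above can be sketched as a polling loop over the outputs folder. All names here are hypothetical; the actual multi_run manager also handles run teardown, checkpoint selection, and reading control/orch.toml.

```python
import time
from pathlib import Path
from typing import Iterator, Optional


def watch_for_runs(
    outputs_dir: str,
    poll_interval: float = 5.0,
    max_polls: Optional[int] = None,
) -> Iterator[Path]:
    """Poll outputs/ for run_* folders and yield each newly discovered
    run directory that has a checkpoints/ folder (hypothetical sketch
    of the observer design; the trainer never triggers runs itself)."""
    seen: set[str] = set()
    polls = 0
    while max_polls is None or polls < max_polls:
        for run_dir in sorted(Path(outputs_dir).glob("run_*")):
            if run_dir.name not in seen and (run_dir / "checkpoints").is_dir():
                seen.add(run_dir.name)
                yield run_dir  # caller would start a trainer for this run
        polls += 1
        if max_polls is None or polls < max_polls:
            time.sleep(poll_interval)
```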

This PR adds a new RL warm-start capability for LoRA training.
With this change, RL can start from an existing saved LoRA adapter instead of always starting from a freshly initialized adapter.
What this adds
- init_adapter_path for LoRA warm-start
- PreparedInitAdapter for parsing, validation, normalization, and cached CPU tensors

Validation
Added:
- Integration test convergence plots
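To make the shape of the change concrete, a class like PreparedInitAdapter might look roughly like this. This is a hypothetical sketch based only on the description above; the real class in the PR additionally normalizes weight key names and caches the adapter tensors on CPU, which is omitted here.

```python
import json
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class PreparedInitAdapter:
    """Parsed and validated LoRA init adapter (hypothetical sketch).

    Carries the parsed adapter_config.json plus the path to the
    weights file; tensor loading/caching is omitted in this sketch.
    """

    config: dict
    weights_path: Path

    @classmethod
    def from_path(cls, init_adapter_path: str) -> "PreparedInitAdapter":
        root = Path(init_adapter_path)
        config_file = root / "adapter_config.json"
        weights_file = root / "adapter_model.safetensors"
        if not config_file.is_file():
            raise FileNotFoundError(f"missing {config_file}")
        if not weights_file.is_file():
            raise FileNotFoundError(f"missing {weights_file}")
        config = json.loads(config_file.read_text())
        # Basic validation: a LoRA adapter config should declare a rank.
        if "r" not in config:
            raise ValueError("adapter_config.json does not declare a LoRA rank 'r'")
        return cls(config=config, weights_path=weights_file)
```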