Add LoRA RL warm-start from init_adapter_path #2165
philippnormann wants to merge 8 commits into PrimeIntellect-ai:main from init_adapter_path
Conversation
Jackmin801
left a comment
Thanks for the contribution! I don't think we want to support doing it this way, though. I would think a checkpoint converter is all you need, and this decouples it from the trainer's logic.
So if you have a script that converts an adapter folder containing adapter_config.json and adapter_model.safetensors into a checkpoint that the trainer can resume from, the multi_run manager should be able to pick it up and resume from it.
/data/outputs/run_xxx/checkpoints/step_0
├── STABLE
├── trainer
│   ├── rank_0.pt
│   ├── rank_1.pt
│   ├── rank_2.pt
│   ├── rank_3.pt
│   ├── rank_4.pt
│   ├── rank_5.pt
│   ├── rank_6.pt
│   └── rank_7.pt
└── weight
    ├── STABLE
    ├── adapter_config.json
    └── adapter_model.safetensors
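A converter producing the layout above could be sketched roughly as follows. This is a hypothetical minimal sketch, not the actual script: function and argument names are made up, and a real converter would also need to emit the per-rank trainer/ state in whatever format the trainer expects.

```python
import shutil
from pathlib import Path


def convert_adapter_to_checkpoint(adapter_dir: str, run_dir: str) -> Path:
    """Copy a saved LoRA adapter into a step_0 checkpoint layout
    (hypothetical sketch; the real converter would also need to write
    the trainer/rank_*.pt optimizer state shown in the tree above)."""
    adapter = Path(adapter_dir)
    config = adapter / "adapter_config.json"
    weights = adapter / "adapter_model.safetensors"
    if not config.is_file() or not weights.is_file():
        raise FileNotFoundError(
            "adapter_dir must contain adapter_config.json and adapter_model.safetensors"
        )

    step_dir = Path(run_dir) / "checkpoints" / "step_0"
    weight_dir = step_dir / "weight"
    weight_dir.mkdir(parents=True, exist_ok=True)

    shutil.copy2(config, weight_dir / "adapter_config.json")
    shutil.copy2(weights, weight_dir / "adapter_model.safetensors")

    # Empty STABLE marker files signal that a directory is fully written.
    (weight_dir / "STABLE").touch()
    (step_dir / "STABLE").touch()
    return step_dir
```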
The reason we want it this way is that runs appear and disappear dynamically in our hosted training deployment. The design is such that the trainer is never the trigger for starting and stopping runs; instead it just observes the outputs folder for any outputs/run_* folders. If one exists, it starts the run based on outputs/run_x/checkpoints and outputs/run_x/control/orch.toml, and then the trainer and orchestrator do this queue thing with each other, where the trainer reads rollouts and produces broadcasts while the orchestrator does the inverse.
So in the SFT case, you would use the script to create the checkpoint that the trainer will read from.
In the RL case, the orchestrator is in charge of creating the checkpoint, and the trainer is basically agnostic to where the checkpoint came from, whether it was a checkpoint created by training or a fresh checkpoint created by the conversion.
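The observer design described above can be sketched as a polling loop over the outputs folder. All names here are hypothetical; the actual multi_run manager also handles run teardown, checkpoint selection, and reading control/orch.toml.

```python
import time
from pathlib import Path
from typing import Iterator, Optional


def watch_for_runs(
    outputs_dir: str,
    poll_interval: float = 5.0,
    max_polls: Optional[int] = None,
) -> Iterator[Path]:
    """Poll outputs/ for run_* folders and yield each newly discovered
    run directory that has a checkpoints/ folder (hypothetical sketch
    of the observer design; the trainer never triggers runs itself)."""
    seen: set[str] = set()
    polls = 0
    while max_polls is None or polls < max_polls:
        for run_dir in sorted(Path(outputs_dir).glob("run_*")):
            if run_dir.name not in seen and (run_dir / "checkpoints").is_dir():
                seen.add(run_dir.name)
                yield run_dir  # caller would start a trainer for this run
        polls += 1
        if max_polls is None or polls < max_polls:
            time.sleep(poll_interval)
```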

This PR adds a new RL warm-start capability for LoRA training.
With this change, RL can start from an existing saved LoRA adapter instead of always starting from a freshly initialized adapter.
What this adds
- init_adapter_path for LoRA warm-start
- PreparedInitAdapter for parsing, validation, normalization, and cached CPU tensors

Validation
Added:
- Integration test convergence plots
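To make the shape of the change concrete, a class like PreparedInitAdapter might look roughly like this. This is a hypothetical sketch based only on the description above; the real class in the PR additionally normalizes weight key names and caches the adapter tensors on CPU, which is omitted here.

```python
import json
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class PreparedInitAdapter:
    """Parsed and validated LoRA init adapter (hypothetical sketch).

    Carries the parsed adapter_config.json plus the path to the
    weights file; tensor loading/caching is omitted in this sketch.
    """

    config: dict
    weights_path: Path

    @classmethod
    def from_path(cls, init_adapter_path: str) -> "PreparedInitAdapter":
        root = Path(init_adapter_path)
        config_file = root / "adapter_config.json"
        weights_file = root / "adapter_model.safetensors"
        if not config_file.is_file():
            raise FileNotFoundError(f"missing {config_file}")
        if not weights_file.is_file():
            raise FileNotFoundError(f"missing {weights_file}")
        config = json.loads(config_file.read_text())
        # Basic validation: a LoRA adapter config should declare a rank.
        if "r" not in config:
            raise ValueError("adapter_config.json does not declare a LoRA rank 'r'")
        return cls(config=config, weights_path=weights_file)
```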