[trainer] feat: support BAGEL non-lora training with pickscore reward #218
zhtmike wants to merge 5 commits into
Code Review
This pull request introduces support for non-LoRA full-weight training for the BAGEL model, enabling selective parameter freezing via a new configure_trainable_params hook and integrating FSDP2 to handle mixed parameter training. It also adds a new run script, updates documentation, and refactors weight loading and rotary embedding casting. The review feedback highlights a potential runtime TypeError and generator exhaustion issue in the weight-loading adapter, a precision degradation concern in the rotary embedding computation, and a suggestion to make the model path in the run script configurable via environment variables.
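The rotary-embedding precision concern can be illustrated with a small, stdlib-only sketch. It assumes the standard RoPE angles `position * base**(-2i/dim)` and emulates a low-precision dtype with the IEEE 754 half-precision `'e'` format of Python's `struct` module, since the standard library has no bfloat16 (which keeps even fewer mantissa bits). The helpers `to_fp16`, `rope_angles`, `cos_cast_late`, `cos_cast_early`, and `max_abs_error` are hypothetical and are not code from this PR. The idea they show: computing the angles and `cos` in high precision and casting only the result bounds the error by the output's rounding, whereas casting the angles first lets large positions lose a sizeable fraction of a radian.

```python
import math
import struct


def to_fp16(x: float) -> float:
    """Round a float to IEEE 754 half precision and back (stand-in for a low-precision dtype)."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def rope_angles(position: int, dim: int, base: float = 10000.0) -> list[float]:
    """RoPE angles position * base**(-2i/dim) for i in [0, dim/2), computed in float64."""
    return [position * base ** (-2 * i / dim) for i in range(dim // 2)]


def cos_cast_late(position: int, dim: int) -> list[float]:
    # High-precision angles and cos; only the final values are rounded.
    return [to_fp16(math.cos(a)) for a in rope_angles(position, dim)]


def cos_cast_early(position: int, dim: int) -> list[float]:
    # Angles are rounded to half precision before cos is taken.
    return [to_fp16(math.cos(to_fp16(a))) for a in rope_angles(position, dim)]


def max_abs_error(approx: list[float], position: int, dim: int) -> float:
    """Largest deviation from the float64 reference cos values."""
    exact = [math.cos(a) for a in rope_angles(position, dim)]
    return max(abs(x - y) for x, y in zip(approx, exact))


if __name__ == "__main__":
    pos, dim = 4000, 128
    print("late cast :", max_abs_error(cos_cast_late(pos, dim), pos, dim))
    print("early cast:", max_abs_error(cos_cast_early(pos, dim), pos, dim))
```

With `position=4000` and `dim=128`, the late-cast error stays at the half-precision rounding level of values in [-1, 1], while the early-cast error is orders of magnitude larger because angles near 3000 are only representable in steps of 2.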
Code is ready for review; the wandb results are at https://wandb.ai/mikecheung/flow_grpo/runs/ybv9jf3y. I will update the PR description once training is nearly finished.
There is a slight model degradation compared with LoRA; let me check first.
What does this PR do?
Support BAGEL non-LoRA training.
- `run_bagel_pickscore.sh` for BAGEL non-LoRA FlowGRPO training
- `DiffusionModelBase.configure_trainable_params` hook for non-LoRA `requires_grad` control
- `moe_gen` only, cast to fp32
- `configure_trainable_params` in engine when `lora_rank=0`, before FSDP wrapping
- `transformer.*` weights through `language_model.load_weights` for qkv fusion
- `cos`/`sin` dtype mismatch
- `NonDiffusersModelBase` metaclass to explicit `ABCMeta`

Checklist Before Starting
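- PR title format: `[{modules}] {type}: {description}` (This will be checked by the CI)
  - `{modules}` include `fsdp`, `vllm_omni`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data`, `cfg`, `reward`, `diffusion`, `omni`, `tests`, `docker`
  - Separate multiple modules with `,`, like `[diffusion, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
  - For breaking changes, add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][diffusion, fsdp] feat: new rollout scheduler`

Test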
API and Usage Example
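A hypothetical sketch of how a model could implement a `configure_trainable_params` hook like the one this PR adds, written against stdlib stand-ins (`Param`, `ToyModel`) instead of `torch.nn.Module` so it runs anywhere. The `"moe_gen"` name filter, the string `dtype` field used to mark the fp32 upcast, and the `setup_model` helper are illustrative assumptions, not the PR's actual signatures. It follows the order described in the change list: when `lora_rank == 0`, call the hook before any FSDP wrapping.

```python
from dataclasses import dataclass, field


@dataclass
class Param:
    """Hypothetical stand-in for a parameter tensor."""
    dtype: str = "bfloat16"
    requires_grad: bool = True


@dataclass
class ToyModel:
    """Hypothetical model; named_parameters() mimics torch.nn.Module's iterator."""
    params: dict[str, Param] = field(default_factory=dict)

    def named_parameters(self):
        return iter(self.params.items())

    def configure_trainable_params(self, trainable_key: str = "moe_gen") -> list[str]:
        """Freeze all params, then unfreeze and upcast those whose name contains trainable_key."""
        trainable = []
        for name, p in self.named_parameters():
            p.requires_grad = trainable_key in name
            if p.requires_grad:
                p.dtype = "float32"  # trainable weights kept in fp32
                trainable.append(name)
        return trainable


def setup_model(model: ToyModel, lora_rank: int) -> ToyModel:
    """Call the hook only for full-weight (non-LoRA) training, before any wrapping."""
    if lora_rank == 0:
        model.configure_trainable_params()
    # FSDP2 wrapping would happen here, after requires_grad is final (omitted).
    return model
```

Running the hook before wrapping means the wrapper already sees the final `requires_grad` flags. FSDP2 shards each parameter individually, which is what lets frozen and trainable parameters share one wrapped module.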
Design & Code Changes
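The change list routes `transformer.*` weights through `language_model.load_weights` for qkv fusion, and the review flags generator exhaustion in the weight-loading adapter. A minimal sketch of that pitfall and a single-pass fix, assuming the checkpoint arrives as a one-shot iterator of `(name, tensor)` pairs. `naive_route`, `route_weights`, and stripping the `transformer.` prefix before the names reach the language-model loader are hypothetical choices for illustration, not the PR's code.

```python
from collections.abc import Iterable

Pair = tuple[str, object]


def naive_route(weights: Iterable[Pair], prefix: str = "transformer."):
    """Buggy on purpose: filters the same iterable twice."""
    routed = [(n, t) for n, t in weights if n.startswith(prefix)]
    # Empty if `weights` was a one-shot iterator such as a generator.
    rest = [(n, t) for n, t in weights if not n.startswith(prefix)]
    return routed, rest


def route_weights(
    weights: Iterable[Pair], prefix: str = "transformer."
) -> tuple[list[Pair], list[Pair]]:
    """Split (name, tensor) pairs in a single pass.

    Names under `prefix` have it stripped (an assumption about what the
    language-model loader expects); everything else passes through unchanged.
    """
    routed: list[Pair] = []
    rest: list[Pair] = []
    for name, tensor in weights:
        if name.startswith(prefix):
            routed.append((name[len(prefix):], tensor))
        else:
            rest.append((name, tensor))
    return routed, rest
```

The routed pairs would then go to the language-model loader and the rest to the remaining modules. If holding every pair in memory is a concern, the same loop can dispatch each pair as it is read instead of collecting lists.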
Checklist Before Submitting
Important
Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review.
pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always