[megatron] fix: always patch actor postprocess on unfused path for MTP models#5845
[megatron] fix: always patch actor postprocess on unfused path for MTP models#5845AkiRusProd wants to merge 2 commits intoverl-project:mainfrom
Conversation
There was a problem hiding this comment.
Code Review
This pull request modifies the initialization of the Megatron actor by moving the patch_postprocess call outside of the mtp_config conditional block, applying it to all models in the actor module. A review comment identifies a potential risk where non-MTP models might encounter an AttributeError because the patch assumes the presence of specific configuration attributes; it suggests wrapping the patch call in a conditional check for mtp_num_layers.
| from verl.models.mcore.mtp_patch import patch_postprocess | ||
|
|
||
| for model in self.actor_module: | ||
| patch_postprocess(model) |
There was a problem hiding this comment.
Unconditionally patching _postprocess for all models in the unfused path introduces a maintenance risk and potential runtime errors. The patch replaces a native Megatron method with a hardcoded version that assumes the existence of mtp_num_layers in the model config. If a standard (non-MTP) model is used, especially with an older Megatron version where this attribute is missing, it will cause an AttributeError during the forward pass. It is safer to only apply the patch if the model is actually MTP-capable (i.e., mtp_num_layers > 0).
| patch_postprocess(model) | |
| if getattr(get_model_config(model), 'mtp_num_layers', 0) > 0: | |
| patch_postprocess(model) |
There was a problem hiding this comment.
In verl/models/mcore/config_converter.py
transformer_config.mtp_num_layers = hf_config.num_nextn_predict_layers
That is, mtp_num_layers = 0 => gemini's proposal is meaningless, the patch will not apply.
What does this PR do?
This fixes a crash on the unfused Megatron actor path by moving
patch_postprocess(model)out ofif self.mtp_configand applying it unconditionally.Why is this needed?
With the previous logic,
patch_postprocess(model)was skipped wheneverself.mtp_configwas not set for the current actor path.In practice, this could still lead to the model reaching Megatron's default
_postprocessin forward-only/log-prob passes and crashing whenlabelsisNone, for example:Root cause
The previous condition was too narrow: it tied
patch_postprocess(model)to actor-side runtimeself.mtp_config, even though the patch is still needed on the unfused path where the unpatched Megatron_postprocesscan be reached.Notes
This PR is intentionally minimal and only moves
patch_postprocess(model)out of theif self.mtp_configblock.