
[megatron] fix: always patch actor postprocess on unfused path for MTP models #5845

Open
AkiRusProd wants to merge 2 commits into verl-project:main from AkiRusProd:fix/megatron-actor-mtp-postprocess

AkiRusProd commented Apr 1, 2026

What does this PR do?

This fixes a crash on the unfused Megatron actor path by moving patch_postprocess(model) out of the if self.mtp_config: block so that it is applied unconditionally.

Why is this needed?

With the previous logic, patch_postprocess(model) was skipped whenever self.mtp_config was not set for the current actor path.

In practice, this meant the model could reach Megatron's default (unpatched) _postprocess during forward-only/log-prob passes and crash when labels is None, for example:

File "/usr/local/lib/python3.10/dist-packages/megatron/core/models/gpt/gpt_model.py", line 612, in _postprocess
    mtp_labels = labels.clone()
AttributeError: 'NoneType' object has no attribute 'clone'

Root cause

The previous condition was too narrow: it tied patch_postprocess(model) to the actor-side runtime setting self.mtp_config, even though the patch is still needed on the unfused path, where the unpatched Megatron _postprocess can be reached.
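A minimal, self-contained sketch of the failure mode and the fix follows. It does not import Megatron or verl: FakeGPTModel, FakeLabels, the stand-in patch_postprocess, and the init_actor_old / init_actor_new helpers are hypothetical names that only mimic the behaviour described above (the default _postprocess calls labels.clone() and fails when labels is None). The real verl.models.mcore.mtp_patch.patch_postprocess may work differently.

```python
import types


class FakeLabels:
    """Hypothetical stand-in for a labels tensor that supports .clone()."""

    def clone(self):
        return FakeLabels()


class FakeGPTModel:
    """Hypothetical model whose default _postprocess mimics the crashing line."""

    def _postprocess(self, hidden_states, labels=None):
        # Mimics `mtp_labels = labels.clone()` from the traceback: raises
        # AttributeError when labels is None (forward-only / log-prob pass).
        mtp_labels = labels.clone()
        return hidden_states, mtp_labels


def patch_postprocess(model):
    """Hypothetical patch: bind a None-safe _postprocess to this instance."""

    def _patched(self, hidden_states, labels=None):
        # Only build MTP labels when labels are actually provided.
        mtp_labels = labels.clone() if labels is not None else None
        return hidden_states, mtp_labels

    model._postprocess = types.MethodType(_patched, model)


def init_actor_old(actor_module, mtp_config):
    # Previous logic: patching was tied to the actor-side mtp_config.
    if mtp_config:
        for model in actor_module:
            patch_postprocess(model)


def init_actor_new(actor_module, mtp_config):
    # Logic in this PR: patch every model on the unfused path.
    # (mtp_config is kept only so both signatures match.)
    for model in actor_module:
        patch_postprocess(model)


def log_prob_pass(model):
    # Forward-only pass: no labels are available.
    return model._postprocess("hidden", labels=None)


if __name__ == "__main__":
    old_actor = [FakeGPTModel()]
    init_actor_old(old_actor, mtp_config=None)
    try:
        log_prob_pass(old_actor[0])
    except AttributeError as err:
        print("old logic:", err)

    new_actor = [FakeGPTModel()]
    init_actor_new(new_actor, mtp_config=None)
    print("new logic:", log_prob_pass(new_actor[0]))
```

Binding with types.MethodType patches only the given instance and leaves the class untouched; whether the real patch works per instance is not shown in this PR.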

Notes

This PR is intentionally minimal and only moves patch_postprocess(model) out of the if self.mtp_config block.

gemini-code-assist bot (Contributor) left a comment

Code Review

This pull request modifies the initialization of the Megatron actor by moving the patch_postprocess call outside of the mtp_config conditional block, applying it to all models in the actor module. A review comment identifies a potential risk where non-MTP models might encounter an AttributeError because the patch assumes the presence of specific configuration attributes; it suggests wrapping the patch call in a conditional check for mtp_num_layers.

    from verl.models.mcore.mtp_patch import patch_postprocess

    for model in self.actor_module:
        patch_postprocess(model)

gemini-code-assist (Contributor), severity: high

Unconditionally patching _postprocess for all models in the unfused path introduces a maintenance risk and potential runtime errors. The patch replaces a native Megatron method with a hardcoded version that assumes the existence of mtp_num_layers in the model config. If a standard (non-MTP) model is used, especially with an older Megatron version where this attribute is missing, it will cause an AttributeError during the forward pass. It is safer to only apply the patch if the model is actually MTP-capable (i.e., mtp_num_layers > 0).

Suggested change
-        patch_postprocess(model)
+        if getattr(get_model_config(model), 'mtp_num_layers', 0) > 0:
+            patch_postprocess(model)
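The suggested guard, getattr(..., 'mtp_num_layers', 0) > 0, covers a config that lacks the attribute entirely. As an assumption worth checking against the installed Megatron-Core version, a config may instead define mtp_num_layers but leave it as None when MTP is off, and None > 0 raises TypeError in Python 3. A minimal sketch of a check that treats a missing attribute, None, and 0 alike; has_mtp_layers is a hypothetical helper, and SimpleNamespace objects stand in for real model configs:

```python
from types import SimpleNamespace


def has_mtp_layers(config) -> bool:
    """Hypothetical guard: True only if config.mtp_num_layers is a positive int.

    A missing attribute and an explicit None are both treated as 0, so the
    comparison never evaluates `None > 0` (a TypeError in Python 3).
    """
    num_layers = getattr(config, "mtp_num_layers", None)
    return (num_layers or 0) > 0


# Stand-ins for model configs; these are not Megatron TransformerConfig objects.
configs = {
    "attribute missing": SimpleNamespace(),
    "explicit None": SimpleNamespace(mtp_num_layers=None),
    "zero MTP layers": SimpleNamespace(mtp_num_layers=0),
    "one MTP layer": SimpleNamespace(mtp_num_layers=1),
}

if __name__ == "__main__":
    for name, cfg in configs.items():
        print(f"{name}: patch={has_mtp_layers(cfg)}")
```

With such a helper, the guarded loop body would read: if has_mtp_layers(get_model_config(model)): patch_postprocess(model).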

AkiRusProd (Author)

I will check it.

AkiRusProd (Author)

In verl/models/mcore/config_converter.py, mtp_num_layers is taken directly from the HF config:
    transformer_config.mtp_num_layers = hf_config.num_nextn_predict_layers
That is, when mtp_num_layers = 0, Gemini's proposed guard is meaningless: the patch would simply not be applied.
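The author's argument can be traced in a few lines. The sketch below reproduces only the quoted assignment from config_converter.py and the condition from the suggested change; convert_mtp_layers, suggested_guard, and patch_decision are hypothetical names, and SimpleNamespace objects stand in for the HF and Megatron configs:

```python
from types import SimpleNamespace


def convert_mtp_layers(hf_config, transformer_config):
    # The single assignment quoted from verl/models/mcore/config_converter.py.
    transformer_config.mtp_num_layers = hf_config.num_nextn_predict_layers
    return transformer_config


def suggested_guard(model_config) -> bool:
    # The condition from the suggested change, applied to a config object.
    return getattr(model_config, "mtp_num_layers", 0) > 0


def patch_decision(num_nextn_predict_layers: int) -> bool:
    """Would the suggested guard apply the patch for this HF setting?"""
    hf_config = SimpleNamespace(num_nextn_predict_layers=num_nextn_predict_layers)
    transformer_config = convert_mtp_layers(hf_config, SimpleNamespace())
    return suggested_guard(transformer_config)


if __name__ == "__main__":
    for n in (0, 1):
        print(f"num_nextn_predict_layers={n}: patch applied={patch_decision(n)}")
```

When num_nextn_predict_layers is 0, the guard evaluates to False and patch_postprocess is never called.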
