[megatron] fix: always patch actor postprocess on unfused path for MTP models by AkiRusProd · Pull Request #5845 · verl-project/verl

AkiRusProd · 2026-04-01T09:51:16Z

What does this PR do?

This fixes a crash on the unfused Megatron actor path by moving patch_postprocess(model) out of the if self.mtp_config block and applying it unconditionally.

Why is this needed?

With the previous logic, patch_postprocess(model) was skipped whenever self.mtp_config was not set for the current actor path.
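A minimal sketch of the control-flow change, under simplified assumptions: FakeGPTModel, the body of patch_postprocess, and the init_* helpers below are hypothetical stand-ins that only mimic the shape of the diff (an import plus a loop over self.actor_module). It shows that under the previous gating, an unset mtp_config left every model on the default _postprocess.

```python
# Hypothetical stand-ins for illustration only; this is not verl or Megatron code.
class FakeGPTModel:
    def _postprocess(self, hidden_states, labels=None):
        return "megatron-default"


def patch_postprocess(model):
    # Hypothetical patch: replace the method on this instance only.
    model._postprocess = lambda hidden_states, labels=None: "patched"


def init_previous(actor_module, mtp_config):
    # Previous logic: patching was gated on the actor-side runtime mtp_config.
    if mtp_config:
        for model in actor_module:
            patch_postprocess(model)


def init_this_pr(actor_module, mtp_config):
    # This PR: on the unfused path every model is patched, whatever mtp_config is.
    for model in actor_module:
        patch_postprocess(model)


before = [FakeGPTModel(), FakeGPTModel()]
init_previous(before, mtp_config=None)

after = [FakeGPTModel(), FakeGPTModel()]
init_this_pr(after, mtp_config=None)

print([m._postprocess(None) for m in before])  # ['megatron-default', 'megatron-default']
print([m._postprocess(None) for m in after])   # ['patched', 'patched']
```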

In practice, the model could then reach Megatron's default _postprocess during forward-only / log-prob passes and crash when labels is None, for example:

File "/usr/local/lib/python3.10/dist-packages/megatron/core/models/gpt/gpt_model.py", line 612, in _postprocess
    mtp_labels = labels.clone()
AttributeError: 'NoneType' object has no attribute 'clone'
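The failure mode can be reproduced without Megatron. The sketch below is a simplified stand-in, not the real GPTModel._postprocess: default_postprocess only mirrors the `mtp_labels = labels.clone()` line from the traceback, and guarded_postprocess is a hypothetical illustration of why a replacement _postprocess has to tolerate labels=None. The PR does not show the body of verl's patch_postprocess.

```python
# Simplified stand-ins; neither function is Megatron's or verl's real code.

def default_postprocess(hidden_states, labels):
    # Mirrors the failing line in the traceback: labels is cloned with no None check.
    mtp_labels = labels.clone()
    return hidden_states, mtp_labels


def guarded_postprocess(hidden_states, labels):
    # Hypothetical guard: forward-only / log-prob passes supply labels=None,
    # so skip the label-dependent branch instead of cloning.
    if labels is None:
        return hidden_states, None
    return hidden_states, labels.clone()


try:
    default_postprocess("hidden", None)
except AttributeError as err:
    print(err)  # 'NoneType' object has no attribute 'clone'

print(guarded_postprocess("hidden", None))  # ('hidden', None)
```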

Root cause

The previous condition was too narrow: it tied patch_postprocess(model) to the actor-side runtime setting self.mtp_config, even though the patch is still needed on the unfused path, where the unpatched Megatron _postprocess can be reached.

Notes

This PR is intentionally minimal and only moves patch_postprocess(model) out of the if self.mtp_config block.

gemini-code-assist

Code Review

This pull request modifies the initialization of the Megatron actor by moving the patch_postprocess call outside of the mtp_config conditional block, applying it to all models in the actor module. A review comment identifies a potential risk where non-MTP models might encounter an AttributeError because the patch assumes the presence of specific configuration attributes; it suggests wrapping the patch call in a conditional check for mtp_num_layers.

gemini-code-assist · 2026-04-01T09:57:21Z

verl/workers/actor/megatron_actor.py

            from verl.models.mcore.mtp_patch import patch_postprocess

            for model in self.actor_module:
+                patch_postprocess(model)


Unconditionally patching _postprocess for all models in the unfused path introduces a maintenance risk and potential runtime errors. The patch replaces a native Megatron method with a hardcoded version that assumes mtp_num_layers exists in the model config. If a standard (non-MTP) model is used, especially with an older Megatron version where this attribute is missing, it will cause an AttributeError during the forward pass. It is safer to apply the patch only if the model is actually MTP-capable (i.e., mtp_num_layers > 0).
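The risk described above can be sketched in plain Python. The SimpleNamespace configs are hypothetical stand-ins, not Megatron's TransformerConfig. Direct attribute access fails on a config that lacks mtp_num_layers, while getattr with a default does not. One extra caveat, offered as an assumption about config defaults: if the field can be present but set to None, then `getattr(config, "mtp_num_layers", 0) > 0` raises TypeError in Python 3, which is why the lenient helper below also applies `or 0`.

```python
from types import SimpleNamespace


def mtp_layers_strict(config):
    # Direct access: raises AttributeError on configs that predate mtp_num_layers.
    return config.mtp_num_layers


def mtp_layers_lenient(config):
    # getattr with a default avoids AttributeError; `or 0` maps an explicit None
    # to 0 so the result can safely be compared with `> 0`.
    return getattr(config, "mtp_num_layers", 0) or 0


old_config = SimpleNamespace(num_layers=24)                       # field missing
unset_config = SimpleNamespace(num_layers=24, mtp_num_layers=None)
mtp_config = SimpleNamespace(num_layers=24, mtp_num_layers=1)

for cfg in (old_config, unset_config, mtp_config):
    print(mtp_layers_lenient(cfg) > 0)  # False, False, True
```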

Suggested change

-                patch_postprocess(model)
+                if getattr(get_model_config(model), 'mtp_num_layers', 0) > 0:
+                    patch_postprocess(model)

I will check it.

In verl/models/mcore/config_converter.py:

transformer_config.mtp_num_layers = hf_config.num_nextn_predict_layers

That is, when mtp_num_layers = 0, Gemini's proposal is meaningless: the patch will not be applied.
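A minimal sketch of this point, using SimpleNamespace objects as hypothetical stand-ins for the Hugging Face and Megatron config objects. Because mtp_num_layers is copied straight from num_nextn_predict_layers, a model converted with num_nextn_predict_layers = 0 makes the suggested guard evaluate to False, so patch_postprocess is skipped. The helper names are illustrative only, and the sketch does not model when verl sets num_nextn_predict_layers.

```python
from types import SimpleNamespace


def convert_mtp_layers(hf_config, transformer_config):
    # Mirrors the single line quoted from config_converter.py.
    transformer_config.mtp_num_layers = hf_config.num_nextn_predict_layers
    return transformer_config


def guard_from_review(config):
    # The condition proposed in the suggested change.
    return getattr(config, "mtp_num_layers", 0) > 0


no_mtp = convert_mtp_layers(SimpleNamespace(num_nextn_predict_layers=0), SimpleNamespace())
with_mtp = convert_mtp_layers(SimpleNamespace(num_nextn_predict_layers=1), SimpleNamespace())

print(guard_from_review(no_mtp))    # False: patch_postprocess would be skipped
print(guard_from_review(with_mtp))  # True: patch_postprocess would run
```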

Fix Megatron actor MTP postprocess patching

e78a586

AkiRusProd requested review from ISEEKYAN and vermouth1992 as code owners April 1, 2026 09:51

gemini-code-assist bot reviewed Apr 1, 2026


Patch Megatron engine postprocess on unfused path

2e42612

AkiRusProd requested a review from eric-haibin-lin as a code owner April 1, 2026 11:31


AkiRusProd wants to merge 2 commits into verl-project:main from AkiRusProd:fix/megatron-actor-mtp-postprocess
