3 changes: 1 addition & 2 deletions verl/workers/actor/megatron_actor.py
@@ -160,11 +160,10 @@ def __init__(
from verl.models.mcore.mtp_patch import patch_postprocess

for model in self.actor_module:
    patch_postprocess(model)
Contributor (severity: high)
Unconditionally patching _postprocess for all models in the unfused path introduces a maintenance risk and potential runtime errors. The patch replaces a native Megatron method with a hardcoded version that assumes the existence of mtp_num_layers in the model config. If a standard (non-MTP) model is used, especially with an older Megatron version where this attribute is missing, it will cause an AttributeError during the forward pass. It is safer to only apply the patch if the model is actually MTP-capable (i.e., mtp_num_layers > 0).

Suggested change
-    patch_postprocess(model)
+    if getattr(get_model_config(model), 'mtp_num_layers', 0) > 0:
+        patch_postprocess(model)

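The guard proposed above relies on `getattr` with a default of `0`, so it tolerates both configs that lack the attribute entirely and configs where it is `0`. A minimal, self-contained sketch of that pattern (the classes below are hypothetical stand-ins, not verl's or Megatron's actual API):

```python
class LegacyConfig:
    # older-style config: no mtp_num_layers attribute at all
    pass

class MtpCapableConfig:
    # MTP-capable config with a positive layer count
    mtp_num_layers = 2

class Model:
    def __init__(self, config):
        self.config = config
        self.patched = False

def patch_postprocess(model):
    # stand-in for the real patch, which replaces the model's _postprocess
    model.patched = True

results = {}
for cfg in (LegacyConfig(), MtpCapableConfig()):
    model = Model(cfg)
    # getattr's default of 0 means a missing attribute behaves like "not MTP"
    if getattr(model.config, "mtp_num_layers", 0) > 0:
        patch_postprocess(model)
    results[type(cfg).__name__] = model.patched

print(results)
```

The non-MTP model is left untouched, which is exactly the safety property the comment asks for.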
Author
I will check it

Author
In verl/models/mcore/config_converter.py:
    transformer_config.mtp_num_layers = hf_config.num_nextn_predict_layers
That is, when mtp_num_layers = 0, gemini's proposal is moot: the patch will not apply anyway.

if self.mtp_config:
    from verl.models.mcore.mtp_patch import patch_mtp_layer_get_embeddings

    patch_postprocess(model)

    if self.mtp_config.detach_encoder:
        patch_mtp_layer_get_embeddings(model)
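For readers unfamiliar with how a patch like `patch_postprocess` works mechanically, a generic sketch of per-instance method monkey-patching follows. The replacement body here is hypothetical; the real one lives in verl/models/mcore/mtp_patch.py:

```python
import types

class Model:
    def _postprocess(self, x):
        return x

def patched_postprocess(self, x):
    # hypothetical replacement body; the real patch runs MTP post-processing
    return x * 2

def patch_postprocess(model):
    # bind the replacement as an instance method so only this model
    # instance is affected, not the class or other instances
    model._postprocess = types.MethodType(patched_postprocess, model)

patched = Model()
patch_postprocess(patched)
untouched = Model()
```

Binding at the instance level (rather than assigning to `Model._postprocess`) keeps unpatched models on the native Megatron code path.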
