[cfg] fix: sync strategy from ActorConfig/CriticConfig to EngineConfig by yifannnwu · Pull Request #5885 · verl-project/verl

yifannnwu · 2026-04-06T20:11:20Z

Summary

FSDPActorConfig.__post_init__ and FSDPCriticConfig.__post_init__ set self.engine = self.fsdp_config but never sync self.strategy to self.engine.strategy. Since EngineConfig.strategy defaults to None, engine_workers.py:162 always passes None as the backend to EngineRegistry.new(), which falls back to FSDP1 regardless of the user's actor.strategy setting.

This causes crashes for models that require FSDP2, such as Qwen3.5 and other models with multi-dimensional RoPE position_ids, where FSDP1's parameter wrapping breaks apply_rotary_pos_emb with shape mismatches.

Repro

1. Set actor.strategy=fsdp2 with use_legacy_worker_impl=disable (the new engine_workers.py path).
2. Train any model.
3. engine_workers.py reads engine_config.strategy → gets None → defaults to FSDP1.
4. For models with multi-dimensional position_ids (Qwen3.5, Qwen3-VL), FSDP1 wrapping breaks apply_rotary_pos_emb.
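
The failure mode in the steps above can be illustrated with a minimal sketch. It does not use verl's real classes: ToyEngineConfig, ToyActorConfig, and pick_backend are hypothetical stand-ins, and the fallback inside pick_backend is only an assumption modeled on the description that a None backend ends up as FSDP1.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ToyEngineConfig:
    # Mirrors the described default: strategy stays None unless something sets it.
    strategy: Optional[str] = None


@dataclass
class ToyActorConfig:
    strategy: str = "fsdp"
    fsdp_config: ToyEngineConfig = field(default_factory=ToyEngineConfig)

    def __post_init__(self):
        # Pre-fix behaviour: the engine config is aliased, but strategy is not copied.
        self.engine = self.fsdp_config


def pick_backend(backend: Optional[str]) -> str:
    # Hypothetical stand-in for the registry lookup: None falls back to FSDP1.
    return backend if backend is not None else "fsdp"


actor = ToyActorConfig(strategy="fsdp2")
chosen = pick_backend(actor.engine.strategy)
print(actor.strategy, chosen)  # prints "fsdp2 fsdp"
```

The user-facing config says fsdp2, but the value that reaches the backend lookup is still None, so the toy lookup returns fsdp.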

Fix

Sync strategy from the actor/critic config to the engine config in __post_init__:

object.__setattr__(self.engine, "strategy", self.strategy)

Uses object.__setattr__ because BaseConfig has frozen field logic that prevents normal attribute assignment.
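
As a rough sketch of why object.__setattr__ is needed, the example below uses a hypothetical ToyBaseConfig whose __setattr__ rejects reassignment of any field that is already set. This guard is an assumption about how the frozen field logic might look, not a copy of verl's BaseConfig; the Toy* class names are illustrative only.

```python
from dataclasses import dataclass, field
from typing import Optional


class ToyBaseConfig:
    # Hypothetical guard: a field is read-only once set, unless listed as mutable.
    _mutable_fields = set()

    def __setattr__(self, name, value):
        if name in self.__dict__ and name not in self._mutable_fields:
            raise AttributeError(f"field '{name}' is frozen")
        super().__setattr__(name, value)


@dataclass
class ToyEngineConfig(ToyBaseConfig):
    strategy: Optional[str] = None


@dataclass
class ToyActorConfig(ToyBaseConfig):
    strategy: str = "fsdp"
    fsdp_config: ToyEngineConfig = field(default_factory=ToyEngineConfig)

    def __post_init__(self):
        # First assignment of `engine` passes the guard because it is not set yet.
        self.engine = self.fsdp_config
        # `strategy` is already set on the engine config (to None), so a normal
        # assignment would raise; object.__setattr__ skips the custom guard.
        object.__setattr__(self.engine, "strategy", self.strategy)


actor = ToyActorConfig(strategy="fsdp2")

try:
    actor.engine.strategy = "fsdp"  # normal assignment hits the guard
    blocked = False
except AttributeError:
    blocked = True

print(actor.engine.strategy, blocked)  # prints "fsdp2 True"
```

Calling object.__setattr__ directly invokes the base implementation and never enters the overridden __setattr__, which is why the sync succeeds while ordinary assignment stays blocked.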

Affected configs

FSDPActorConfig (verl/workers/config/actor.py)
FSDPCriticConfig (verl/workers/config/critic.py)

Note: McoreActorConfig, VeOmniActorConfig, TorchTitanActorConfig are not affected because their engine configs have matching hardcoded strategy defaults.

Impact

Affects all FSDP2 training using the new engine_workers.py path (use_legacy_worker_impl=disable). The legacy worker path is unaffected because it doesn't read engine_config.strategy.

Test plan

Verified engine_config.strategy == "fsdp2" when actor.strategy = "fsdp2" with use_legacy_worker_impl=disable
Trained Qwen3.5-0.8B with strategy=fsdp2 + use_legacy_worker_impl=disable — FSDP2 correctly applied, 3 training steps completed
Confirmed strategy=fsdp (default) is unchanged

FSDPActorConfig and FSDPCriticConfig set self.engine = self.fsdp_config but never sync self.strategy to self.engine.strategy. Since EngineConfig.strategy defaults to None, engine_workers.py (the new worker path used with use_legacy_worker_impl=disable) always falls back to FSDP1 regardless of the user's actor.strategy setting. This causes crashes for models that require FSDP2, such as Qwen3.5 and other models with multi-dimensional RoPE position_ids, where FSDP1's parameter wrapping breaks apply_rotary_pos_emb with shape mismatches. Fix: sync strategy in __post_init__ using object.__setattr__ (needed because BaseConfig has frozen field logic).

gemini-code-assist

Code Review

This pull request ensures that the FSDP strategy is correctly propagated to the engine configuration in both actor and critic workers, preventing an unintended fallback to FSDP1. A review comment suggests also syncing the ulysses_sequence_parallel_size in the critic configuration to maintain consistency and ensure sequence parallelism settings are properly applied.

gemini-code-assist · 2026-04-06T20:13:22Z

verl/workers/config/critic.py

+        # Sync strategy to engine config so engine_workers can pick the right FSDP version.
+        # EngineConfig.strategy defaults to None, so without this, engine_workers.py always
+        # falls back to FSDP1 even when critic.strategy="fsdp2".
+        object.__setattr__(self.engine, "strategy", self.strategy)


In addition to syncing the strategy, ulysses_sequence_parallel_size should also be synced to the engine configuration in FSDPCriticConfig for consistency and backward compatibility, similar to the implementation in FSDPActorConfig. Without this, sequence parallelism settings defined at the top level of the critic configuration will not propagate to the underlying FSDP engine.

Note that ulysses_sequence_parallel_size is already defined as a mutable field in FSDPEngineConfig, so direct assignment is permitted.

        object.__setattr__(self.engine, "strategy", self.strategy)
        # backward compatibility
        if self.ulysses_sequence_parallel_size > 1:
            self.fsdp.ulysses_sequence_parallel_size = self.ulysses_sequence_parallel_size

gemini-code-assist bot reviewed Apr 6, 2026

wuxibin89 approved these changes Apr 7, 2026

wuxibin89 changed the title ~~fix: sync strategy from ActorConfig/CriticConfig to EngineConfig~~ [cfg] fix: sync strategy from ActorConfig/CriticConfig to EngineConfig Apr 7, 2026

wuxibin89 merged commit 74dc16c into verl-project:main Apr 7, 2026
59 of 69 checks passed

wuxibin89 merged 1 commit into verl-project:main from yifannnwu:fix/sync-strategy-to-engine-config
