
Add FT test-action hooks to the train group #1448

Open
fzyzcjy wants to merge 1 commit into tom/pr_chain/trainer_ft/dev_revert_reversed/log-train-group-step-end-and-analysis-events from tom/pr_chain/trainer_ft/dev_revert_reversed/add-ft-test-action-hooks-to-the-train-group

Conversation


@fzyzcjy fzyzcjy commented Jun 22, 2026

Collaborator

Wires FTTestActionGroupExecutor into RayTrainGroup: the executor is built from args in __init__ and invoked via run_after_step(rollout_id) at the end of each train() step, so CI scenarios can drive group-level fault actions (e.g. stop/start cells) at deterministic rollouts. The actor-side executor and the shared ft_test_actions module are handled separately.
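For readers without the diff at hand, the wiring described above can be sketched roughly as follows. This is a toy illustration under stated assumptions, not the code in this PR: `ToyGroupExecutor` and `ToyTrainGroup` are hypothetical stand-ins for FTTestActionGroupExecutor and RayTrainGroup, and the `ft_test_actions` argument, its `"rollout:action"` string format, the `from_args` constructor, and the handler names are all invented for the example. Only the shape (build the executor in `__init__`, call `run_after_step(rollout_id)` at the end of `train()`) comes from the description.

```python
from types import SimpleNamespace


class ToyGroupExecutor:
    """Hypothetical stand-in for the group-level test-action executor."""

    def __init__(self, schedule, handlers):
        self._schedule = schedule  # rollout_id -> list of action names
        self._handlers = handlers  # action name -> zero-argument callable

    @classmethod
    def from_args(cls, args, handlers):
        # Assumed format (not from the PR): "1:stop_cell,2:start_cell".
        spec = getattr(args, "ft_test_actions", None) or ""
        schedule = {}
        for item in filter(None, spec.split(",")):
            rollout, name = item.split(":", 1)
            schedule.setdefault(int(rollout), []).append(name.strip())
        return cls(schedule, handlers)

    def run_after_step(self, rollout_id):
        # Fire only the actions scheduled for this exact rollout.
        for name in self._schedule.get(rollout_id, []):
            self._handlers[name]()


class ToyTrainGroup:
    """Hypothetical stand-in for the train group."""

    def __init__(self, args):
        self.events = []
        handlers = {
            "stop_cell": lambda: self.events.append("stop_cell"),
            "start_cell": lambda: self.events.append("start_cell"),
        }
        # Executor is built from args once, at construction time.
        self._ft_executor = ToyGroupExecutor.from_args(args, handlers)

    def train(self, rollout_id):
        self.events.append(f"train:{rollout_id}")  # placeholder for the real step
        # Hook runs at the end of every step.
        self._ft_executor.run_after_step(rollout_id)


def run_demo():
    args = SimpleNamespace(ft_test_actions="1:stop_cell,2:start_cell")
    group = ToyTrainGroup(args)
    for rollout_id in range(3):
        group.train(rollout_id)
    return group.events
```

Because the schedule is keyed on the rollout id rather than on wall-clock time, the same actions fire at the same points on every run, which is what makes the fault scenario deterministic. With no `ft_test_actions` set, the hook in this sketch is a no-op.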

@gemini-code-assist

Contributor

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

@fzyzcjy fzyzcjy force-pushed the tom/pr_chain/trainer_ft/dev_revert_reversed/log-train-group-step-end-and-analysis-events branch from 3ea44a1 to 5e07ca5 Compare June 23, 2026 07:51
@fzyzcjy fzyzcjy requested a review from yushengsu-thu as a code owner June 23, 2026 07:51
@fzyzcjy fzyzcjy force-pushed the tom/pr_chain/trainer_ft/dev_revert_reversed/add-ft-test-action-hooks-to-the-train-group branch from ed04895 to 20e80c1 Compare June 23, 2026 07:51
@fzyzcjy fzyzcjy force-pushed the tom/pr_chain/trainer_ft/dev_revert_reversed/log-train-group-step-end-and-analysis-events branch from 5e07ca5 to 2afae2a Compare June 23, 2026 09:30
@fzyzcjy fzyzcjy force-pushed the tom/pr_chain/trainer_ft/dev_revert_reversed/add-ft-test-action-hooks-to-the-train-group branch from 20e80c1 to fcef2b4 Compare June 23, 2026 09:30
@fzyzcjy fzyzcjy force-pushed the tom/pr_chain/trainer_ft/dev_revert_reversed/log-train-group-step-end-and-analysis-events branch from 2afae2a to f8190b4 Compare June 23, 2026 13:34
@fzyzcjy fzyzcjy force-pushed the tom/pr_chain/trainer_ft/dev_revert_reversed/add-ft-test-action-hooks-to-the-train-group branch from fcef2b4 to 319df37 Compare June 23, 2026 13:34