Inject witness ids into the Megatron forward and train step #1446

Open
fzyzcjy wants to merge 1 commit into tom/pr_chain/trainer_ft/dev_revert_reversed/bracket-megatron-actor-methods-with-the-with-logs-decorator from tom/pr_chain/trainer_ft/dev_revert_reversed/inject-witness-ids-into-the-megatron-forward-and-train-step

@fzyzcjy fzyzcjy commented Jun 22, 2026

Collaborator

Feeds witness ids through the Megatron forward path so that per-sample witness tracing can run. This PR adds witness_ids to the get_batch key lists, splats them into the model forward call (in both forward_only and train_one_step), guards combined-1f1b against witness, and dumps and clears stale witness state after a NORMAL train step via witness_dump_and_clear_stale. The witness_info parameter threading and the witness allocator/module are not part of this PR; they are added with their own features.
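For readers without the diff at hand, the flow described above can be sketched roughly as follows. This is an illustrative sketch only, not the code in this PR: apart from `witness_ids`, `get_batch`, `train_one_step`, and `witness_dump_and_clear_stale`, every name here (`BASE_KEYS`, `StepKind`, `batch_keys`, `check_schedule`, `forward`, `toy_model`) is hypothetical, the signatures are assumed, and the guard is assumed to be a simple "refuse to combine" check.

```python
from enum import Enum
from typing import Any, Callable, Dict, List

# Hypothetical baseline key list; the real get_batch key lists are not shown here.
BASE_KEYS: List[str] = ["tokens", "loss_masks"]


class StepKind(Enum):
    """Hypothetical step kinds; only NORMAL triggers the witness dump."""
    NORMAL = "normal"
    OTHER = "other"


def batch_keys(witness_enabled: bool) -> List[str]:
    """Append witness_ids to the key list only when witness tracing is on."""
    return BASE_KEYS + (["witness_ids"] if witness_enabled else [])


def get_batch(sample: Dict[str, Any], keys: List[str]) -> Dict[str, Any]:
    """Select the requested keys from one already-loaded sample dict."""
    return {key: sample[key] for key in keys}


def check_schedule(combined_1f1b: bool, witness_enabled: bool) -> None:
    """Assumed form of the guard: refuse to combine the two features."""
    if combined_1f1b and witness_enabled:
        raise ValueError("combined-1f1b cannot be used together with witness")


def forward(model: Callable[..., Any], batch: Dict[str, Any]) -> Any:
    """Splat witness_ids into the model call when the batch carries them."""
    witness_kwargs = {k: batch[k] for k in ("witness_ids",) if k in batch}
    return model(batch["tokens"], **witness_kwargs)


def train_one_step(
    model: Callable[..., Any],
    batch: Dict[str, Any],
    step_kind: StepKind,
    witness_dump_and_clear_stale: Callable[[], None],
) -> Any:
    """Run the forward, then dump/clear stale witness state after a NORMAL step."""
    output = forward(model, batch)
    if step_kind is StepKind.NORMAL:
        witness_dump_and_clear_stale()
    return output


def toy_model(tokens: List[int], witness_ids: Any = None) -> Dict[str, Any]:
    """Stand-in for the Megatron model: echoes what it received."""
    return {"n_tokens": len(tokens), "witness_ids": witness_ids}


# Usage: witness enabled, NORMAL step, so the dump callback fires once.
sample = {"tokens": [1, 2, 3], "loss_masks": [1, 1, 1], "witness_ids": [7, 7, 8]}
check_schedule(combined_1f1b=False, witness_enabled=True)
batch = get_batch(sample, batch_keys(witness_enabled=True))
dumps: List[str] = []
out = train_one_step(toy_model, batch, StepKind.NORMAL, lambda: dumps.append("dumped"))
```

With witness tracing disabled, `batch_keys(False)` omits `witness_ids`, so `forward` passes no extra keyword and the model sees its default. Keeping the key optional in this way is one possible design; the PR may handle the disabled case differently.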

@gemini-code-assist

Contributor

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

@fzyzcjy fzyzcjy force-pushed the tom/pr_chain/trainer_ft/dev_revert_reversed/bracket-megatron-actor-methods-with-the-with-logs-decorator branch from b1f9980 to 01273a5 Compare June 23, 2026 07:51
@fzyzcjy fzyzcjy force-pushed the tom/pr_chain/trainer_ft/dev_revert_reversed/inject-witness-ids-into-the-megatron-forward-and-train-step branch from 5f53d76 to 3452825 Compare June 23, 2026 07:51
@fzyzcjy fzyzcjy force-pushed the tom/pr_chain/trainer_ft/dev_revert_reversed/bracket-megatron-actor-methods-with-the-with-logs-decorator branch from 01273a5 to 58e1819 Compare June 23, 2026 09:29
@fzyzcjy fzyzcjy force-pushed the tom/pr_chain/trainer_ft/dev_revert_reversed/inject-witness-ids-into-the-megatron-forward-and-train-step branch from 3452825 to fde57ba Compare June 23, 2026 09:29
@fzyzcjy fzyzcjy force-pushed the tom/pr_chain/trainer_ft/dev_revert_reversed/bracket-megatron-actor-methods-with-the-with-logs-decorator branch from 58e1819 to 46bf61a Compare June 23, 2026 13:34
@fzyzcjy fzyzcjy force-pushed the tom/pr_chain/trainer_ft/dev_revert_reversed/inject-witness-ids-into-the-megatron-forward-and-train-step branch from fde57ba to a19935b Compare June 23, 2026 13:34