Startup: add argument-consistency checks & summary table (Fixes #124) #409
Summary
Adds a lightweight validation layer and a configuration summary printed at startup, inspired by GPT-NeoX, resolving Issue #124.
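The PR text does not show how the summary table is laid out. Here is a minimal sketch of one way to render it and print it only once in a multi-GPU job. `format_args_summary` and `print_summary_on_rank0` are hypothetical names, not the PR's code. The rank is passed in explicitly; in a real distributed run it would typically come from `torch.distributed.get_rank()` after the process group is initialized.

```python
from argparse import Namespace


def format_args_summary(args: Namespace, title: str = "Configuration summary") -> str:
    """Render every argument as an aligned two-column table (name | value)."""
    items = sorted(vars(args).items())
    # Width of the longest argument name, used to align the separator column.
    width = max((len(name) for name, _ in items), default=0)
    value_width = max((len(str(value)) for _, value in items), default=0)
    rule = "-" * (width + 3 + value_width)
    lines = [title, rule]
    for name, value in items:
        lines.append(f"{name.ljust(width)} | {value}")
    lines.append(rule)
    return "\n".join(lines)


def print_summary_on_rank0(args: Namespace, rank: int) -> bool:
    """Print the table only on global rank 0 so it appears once per job."""
    if rank != 0:
        return False
    print(format_args_summary(args), flush=True)
    return True


# Usage: every rank calls this, but only rank 0 prints.
print_summary_on_rank0(Namespace(hidden_size=512, num_attention_heads=8), rank=0)
```

Sorting by name keeps the table stable across runs, which makes logs from different jobs easy to diff.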
Key features
- `megatron/arguments.py`: new `_validate_and_summarize_args(args)`, which runs these sanity checks:
  - `hidden_size % num_attention_heads == 0`
  - `global_batch_size % data_parallel_size == 0`
  - `pad_vocab_size_to` (if set) is divisible by the tensor-parallel (TP) size
- Raises `ValueError` if any rule fails, aborting early before costly initialization.

Why it matters
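Misconfigurations such as a hidden size not divisible by the number of heads, or a global batch size not divisible by the data-parallel size, now fail at startup instead of partway through initialization or training, saving debugging effort and wasted GPU time.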
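To make the three checks listed under Key features concrete, here is a minimal fail-fast sketch. It is not the PR's `_validate_and_summarize_args`. The function name `validate_args` is hypothetical, and `tensor_model_parallel_size` is an assumed attribute name for the TP size. One design choice differs from a first-failure abort: this version collects every failed rule and raises a single `ValueError`, so a user can fix all problems in one pass.

```python
from argparse import Namespace


def validate_args(args: Namespace) -> None:
    """Check argument consistency and raise one ValueError listing every failure.

    Hypothetical sketch; assumes all sizes are positive integers.
    """
    errors = []

    # Each attention head must get an equal slice of the hidden dimension.
    if args.hidden_size % args.num_attention_heads != 0:
        errors.append(
            f"hidden_size ({args.hidden_size}) must be divisible by "
            f"num_attention_heads ({args.num_attention_heads})"
        )

    # The global batch must split evenly across data-parallel replicas.
    if args.global_batch_size % args.data_parallel_size != 0:
        errors.append(
            f"global_batch_size ({args.global_batch_size}) must be divisible by "
            f"data_parallel_size ({args.data_parallel_size})"
        )

    # Optional rule: a padded vocabulary must split evenly across TP ranks.
    # `tensor_model_parallel_size` is an assumed name for the TP size.
    pad_to = getattr(args, "pad_vocab_size_to", None)
    if pad_to is not None and pad_to % args.tensor_model_parallel_size != 0:
        errors.append(
            f"pad_vocab_size_to ({pad_to}) must be divisible by "
            f"tensor_model_parallel_size ({args.tensor_model_parallel_size})"
        )

    # Fail fast, before any expensive distributed or model initialization.
    if errors:
        raise ValueError("Invalid arguments:\n  " + "\n  ".join(errors))


# Usage: a deliberately inconsistent config aborts immediately.
bad = Namespace(hidden_size=500, num_attention_heads=8, global_batch_size=32,
                data_parallel_size=4, pad_vocab_size_to=None,
                tensor_model_parallel_size=1)
try:
    validate_args(bad)
except ValueError as err:
    print(err)
```

In a training script, a check like this would run once, right after argument parsing and before process-group or model setup. That ordering is what lets a bad config abort before any GPU time is spent.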
Testing
- `pytest -q tests`: all existing tests pass.
- `pretrain_gpt_tiny.sh` on 1-GPU and 4-GPU runs: the summary appears once, on rank 0.
- With a `hidden_size` not divisible by the number of heads, the run aborts immediately with a clear error.

Backward compatibility
Purely additive logging/validation. No impact on training logic or performance.
Fixes #124