Commit fdf9fb1

Authored by zianglih, pre-commit-ci[bot], ksivaman, and ptrendx
Add NVTE_BACKWARD_OVERRIDE=high_precision|dequantized (#2644)
* Add NVTE_KEEP_BACKWARD_UNQUANTIZED
* [pre-commit.ci] auto fixes from pre-commit.com hooks; for more information, see https://pre-commit.ci
* Disable ub and clean up
* Drop fuser changes
* Replace use_quantized_bwd with use_fp8_bwd
* [pre-commit.ci] auto fixes from pre-commit.com hooks; for more information, see https://pre-commit.ci
* Ignore keep_backward_unquantized if delayed scaling
* Refactor ignoring NVTE_KEEP_BACKWARD_UNQUANTIZED when delayed scaling is used
* Add back missing ctx.debug
* Refactor changes under fused
* Clean up
* Refactor high-precision overwrite if keep_backward_unquantized
* Clean up
* Drop redundant fp8_recipe_bwd
* [pre-commit.ci] auto fixes from pre-commit.com hooks; for more information, see https://pre-commit.ci
* Drop redundant ub changes
* Drop more redundant ub changes
* Drop redundant delayed scaling changes
* [pre-commit.ci] auto fixes from pre-commit.com hooks; for more information, see https://pre-commit.ci
* Drop unneeded backwards_needs_fc1_input
* Drop and disallow LayerNormMLP implementation
* [pre-commit.ci] auto fixes from pre-commit.com hooks; for more information, see https://pre-commit.ci
* Move interface changes to recipe
* [pre-commit.ci] auto fixes from pre-commit.com hooks; for more information, see https://pre-commit.ci
* Move ub overrides to fwd
* Remove duplication
* Simplify use_fp8_bwd logic in bwd
* [pre-commit.ci] auto fixes from pre-commit.com hooks; for more information, see https://pre-commit.ci
* Set grad quantizers to none if keep bwd unquantized
* [pre-commit.ci] auto fixes from pre-commit.com hooks; for more information, see https://pre-commit.ci
* Drop delayed scaling change
* Simplify env var logic
* Move validation check to recipe
* Simplify effective_enabled
* Fix inverted assertion logic
* Simplify changes under ops
* [pre-commit.ci] auto fixes from pre-commit.com hooks; for more information, see https://pre-commit.ci
* Simplify ctx.keep_backward_unquantized
* Fix missing attribute
* Add unit tests
* Fix bias errors in unit test
* [pre-commit.ci] auto fixes from pre-commit.com hooks; for more information, see https://pre-commit.ci
* Add more shapes to unit test
* Refactor interface to `NVTE_BACKWARD_MODE=default|unquant|dequant`
* Fix override and clean up
* Clean up unit test
* Clean up unit test
* Override `ctx.reduce_and_update_bwd_fp8_tensors = False`
* Expand unit test
* Add `test_backward_mode_memory_peak_report`
* Expand test coverage and fix
* Use `numel()`
* Refactor unit test
* Fix grouped linear to override `*_quantizers` instead of `*_quantizer`
* Only save input/weight when `*_requires_grad` on unquant mode
* Fix Blackwell debug ci
* Fix sm89 and sm90 tests
* Fix unquant mode memory saving
* Refactor interface to `NVTE_BACKWARD_OVERRIDE=high_precision|dequantized`
* Rename unit test
* Simplify env var parsing

---------

Signed-off-by: Ziang Li <ziangli@umich.edu>
Signed-off-by: Przemek Tredak <ptredak@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
Co-authored-by: Przemek Tredak <ptredak@nvidia.com>
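The final interface named in the commit title accepts two values, `high_precision` and `dequantized`. As a rough illustration only, a launch script could guard against typos before starting a run. The helper `validate_backward_override` below is hypothetical and not part of Transformer Engine, and treating an unset or empty variable as "no override" is an assumption; the library's own validation is not shown in this view.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of Transformer Engine.
# Accepts the two modes named in the commit title, plus the empty string,
# which this sketch ASSUMES means "no override".
validate_backward_override() {
    case "$1" in
        ""|high_precision|dequantized)
            return 0
            ;;
        *)
            echo "unsupported NVTE_BACKWARD_OVERRIDE value: '$1'" >&2
            return 1
            ;;
    esac
}

# Check whatever the caller's environment currently holds.
if validate_backward_override "${NVTE_BACKWARD_OVERRIDE:-}"; then
    echo "NVTE_BACKWARD_OVERRIDE='${NVTE_BACKWARD_OVERRIDE:-}' accepted"
fi
```

Note that the earlier spellings from the commit history (`unquant`, `dequant`) would be rejected by this check, since the last refactor renamed them.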
1 parent 5f9550f commit fdf9fb1

24 files changed

Lines changed: 2415 additions & 61 deletions

qa/L0_pytorch_unittest/test.sh

Lines changed: 1 addition & 0 deletions
@@ -42,6 +42,7 @@ python3 -m pytest --tb=auto --junitxml=$XML_LOG_DIR/pytest_test_gqa.xml $TE_PATH
 python3 -m pytest --tb=auto --junitxml=$XML_LOG_DIR/pytest_test_fused_optimizer.xml $TE_PATH/tests/pytorch/test_fused_optimizer.py || test_fail "test_fused_optimizer.py"
 python3 -m pytest --tb=auto --junitxml=$XML_LOG_DIR/pytest_test_multi_tensor.xml $TE_PATH/tests/pytorch/test_multi_tensor.py || test_fail "test_multi_tensor.py"
 NVTE_CUTEDSL_FUSED_GROUPED_MLP=1 python3 -m pytest --tb=auto --junitxml=$XML_LOG_DIR/pytest_test_fusible_ops.xml $TE_PATH/tests/pytorch/test_fusible_ops.py || test_fail "test_fusible_ops.py"
+python3 -m pytest --tb=auto --junitxml=$XML_LOG_DIR/pytest_test_backward_override.xml $TE_PATH/tests/pytorch/test_backward_override.py || test_fail "test_backward_override.py"
 python3 -m pytest --tb=auto --junitxml=$XML_LOG_DIR/pytest_test_permutation.xml $TE_PATH/tests/pytorch/test_permutation.py || test_fail "test_permutation.py"
 python3 -m pytest --tb=auto --junitxml=$XML_LOG_DIR/pytest_test_parallel_cross_entropy.xml $TE_PATH/tests/pytorch/test_parallel_cross_entropy.py || test_fail "test_parallel_cross_entropy.py"
 python3 -m pytest --tb=auto --junitxml=$XML_LOG_DIR/pytest_test_cpu_offloading.xml $TE_PATH/tests/pytorch/test_cpu_offloading.py || test_fail "test_cpu_offloading.py"
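Each line in the hunk follows the same `command || test_fail "label"` pattern, so a failing test file is recorded and the script moves on to the next one. The sketch below illustrates that pattern in isolation. The definition of `test_fail`, the `RET` and `FAILED_CASES` variables, and the `run_case` wrapper are assumptions made for the example, since the real definitions in `test.sh` are outside this hunk; `true` and `false` stand in for the pytest invocations.

```shell
#!/usr/bin/env bash
# Assumed bookkeeping; the real test.sh defines its own equivalents.
RET=0
FAILED_CASES=""

# Assumed shape of test_fail: remember the label and mark the run as failed,
# but do not abort, so the remaining test files still execute.
test_fail() {
    RET=1
    FAILED_CASES="$FAILED_CASES $1"
    echo "Error: sub-test failed: $1"
}

# Hypothetical wrapper for the demo: $1 is the label, the rest is the command.
run_case() {
    local label="$1"
    shift
    "$@" || test_fail "$label"
}

run_case "test_fusible_ops.py" true          # stand-in for a passing pytest run
run_case "test_backward_override.py" false   # stand-in for a failing pytest run
run_case "test_permutation.py" true          # still runs after the failure

echo "RET=$RET FAILED_CASES=$FAILED_CASES"
```

A script built this way would typically finish with a check on `RET` and exit non-zero when any label was recorded, which keeps one broken file from hiding the results of the others.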
