Commit fdf9fb1
Add NVTE_BACKWARD_OVERRIDE=high_precision|dequantized (#2644)
* Add NVTE_KEEP_BACKWARD_UNQUANTIZED
Signed-off-by: Ziang Li <ziangli@umich.edu>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Disable ub and clean up
Signed-off-by: Ziang Li <ziangli@umich.edu>
* Drop fuser changes
Signed-off-by: Ziang Li <ziangli@umich.edu>
* Replace `use_quantized_bwd` with `use_fp8_bwd`
Signed-off-by: Ziang Li <ziangli@umich.edu>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Ignore keep_backward_unquantized if delayed scaling
Signed-off-by: Ziang Li <ziangli@umich.edu>
* Refactor ignoring NVTE_KEEP_BACKWARD_UNQUANTIZED when delayed scaling is used
Signed-off-by: Ziang Li <ziangli@umich.edu>
* Add back missing ctx.debug
Signed-off-by: Ziang Li <ziangli@umich.edu>
* Refactor changes under fused
Signed-off-by: Ziang Li <ziangli@umich.edu>
* Clean up
Signed-off-by: Ziang Li <ziangli@umich.edu>
* Refactor high-precision overwrite if keep_backward_unquantized
Signed-off-by: Ziang Li <ziangli@umich.edu>
* Clean up
Signed-off-by: Ziang Li <ziangli@umich.edu>
* Drop redundant fp8_recipe_bwd
Signed-off-by: Ziang Li <ziangli@umich.edu>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Drop redundant ub changes
Signed-off-by: Ziang Li <ziangli@umich.edu>
* Drop more redundant ub changes
Signed-off-by: Ziang Li <ziangli@umich.edu>
* Drop redundant delayed scaling changes
Signed-off-by: Ziang Li <ziangli@umich.edu>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Drop unneeded backwards_needs_fc1_input
Signed-off-by: Ziang Li <ziangli@umich.edu>
* Drop and disallow LayerNormMLP implementation
Signed-off-by: Ziang Li <ziangli@umich.edu>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Move interface changes to recipe
Signed-off-by: Ziang Li <ziangli@umich.edu>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Move ub overrides to fwd
Signed-off-by: Ziang Li <ziangli@umich.edu>
* Remove duplication
Signed-off-by: Ziang Li <ziangli@umich.edu>
* Simplify use_fp8_bwd logic in bwd
Signed-off-by: Ziang Li <ziangli@umich.edu>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Set grad quantizers to `None` if keeping backward unquantized
Signed-off-by: Ziang Li <ziangli@umich.edu>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Drop delayed scaling change
Signed-off-by: Ziang Li <ziangli@umich.edu>
* Simplify env var logic
Signed-off-by: Ziang Li <ziangli@umich.edu>
* Move validation check to recipe
Signed-off-by: Ziang Li <ziangli@umich.edu>
* Simplify effective_enabled
Signed-off-by: Ziang Li <ziangli@umich.edu>
* Fix inverted assertion logic
Signed-off-by: Ziang Li <ziangli@umich.edu>
* Simplify changes under ops
Signed-off-by: Ziang Li <ziangli@umich.edu>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Simplify ctx.keep_backward_unquantized
Signed-off-by: Ziang Li <ziangli@umich.edu>
* Fix missing attribute
Signed-off-by: Ziang Li <ziangli@umich.edu>
* Add unit tests
Signed-off-by: Ziang Li <ziangli@umich.edu>
* Fix bias errors in unit test
Signed-off-by: Ziang Li <ziangli@umich.edu>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Add more shapes to unit test
Signed-off-by: Ziang Li <ziangli@umich.edu>
* Refactor interface to `NVTE_BACKWARD_MODE=default|unquant|dequant`
Signed-off-by: Ziang Li <ziangli@umich.edu>
* Fix override and clean up
Signed-off-by: Ziang Li <ziangli@umich.edu>
* Clean up unit test
Signed-off-by: Ziang Li <ziangli@umich.edu>
* Clean up unit test
Signed-off-by: Ziang Li <ziangli@umich.edu>
* Override `ctx.reduce_and_update_bwd_fp8_tensors = False`
Signed-off-by: Ziang Li <ziangli@umich.edu>
* Expand unit test
Signed-off-by: Ziang Li <ziangli@umich.edu>
* Add `test_backward_mode_memory_peak_report`
Signed-off-by: Ziang Li <ziangli@umich.edu>
* Expand test coverage and fix
Signed-off-by: Ziang Li <ziangli@umich.edu>
* Use `numel()`
Signed-off-by: Ziang Li <ziangli@umich.edu>
* Refactor unit test
Signed-off-by: Ziang Li <ziangli@umich.edu>
* Fix grouped linear to override `*_quantizers` instead of `*_quantizer`
Signed-off-by: Ziang Li <ziangli@umich.edu>
* Only save input/weight when `*_requires_grad` in unquant mode
Signed-off-by: Ziang Li <ziangli@umich.edu>
* Fix Blackwell debug CI
Signed-off-by: Ziang Li <ziangli@umich.edu>
* Fix sm89 and sm90 tests
Signed-off-by: Ziang Li <ziangli@umich.edu>
* Fix unquant mode memory saving
Signed-off-by: Ziang Li <ziangli@umich.edu>
* Refactor interface to `NVTE_BACKWARD_OVERRIDE=high_precision|dequantized`
Signed-off-by: Ziang Li <ziangli@umich.edu>
* Rename unit test
Signed-off-by: Ziang Li <ziangli@umich.edu>
* Simplify env var parsing
Signed-off-by: Ziang Li <ziangli@umich.edu>
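The final interface is a single environment variable, `NVTE_BACKWARD_OVERRIDE`, with two accepted values. The following is a minimal sketch of how such a variable could be parsed and validated. It assumes that an unset or empty value means "no override" and that any other value is rejected with an error. `parse_backward_override` is a hypothetical name used only for illustration; it is not the helper this commit adds to Transformer Engine.

```python
import os
from typing import Mapping, Optional

# Accepted values, taken from the commit title. Treating an unset or empty
# variable as "no override" is an assumption of this sketch.
_ALLOWED = ("high_precision", "dequantized")


def parse_backward_override(env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Hypothetical helper: read NVTE_BACKWARD_OVERRIDE and validate it.

    Returns None when the variable is unset or empty (assumed meaning:
    keep the default backward behavior), otherwise one of _ALLOWED.
    Raises ValueError for any other value.
    """
    if env is None:
        env = os.environ
    # Surrounding whitespace is stripped; the value itself is case-sensitive.
    raw = env.get("NVTE_BACKWARD_OVERRIDE", "").strip()
    if not raw:
        return None
    if raw not in _ALLOWED:
        raise ValueError(
            f"NVTE_BACKWARD_OVERRIDE={raw!r} is not one of: {', '.join(_ALLOWED)}"
        )
    return raw


if __name__ == "__main__":
    # Passing an explicit mapping keeps the demo independent of the real environment.
    print(parse_backward_override({"NVTE_BACKWARD_OVERRIDE": "dequantized"}))
    print(parse_backward_override({}))
```

Under this sketch, values from the intermediate `NVTE_BACKWARD_MODE=default|unquant|dequant` interface, such as `unquant`, are rejected rather than silently mapped to the new names.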
---------
Signed-off-by: Ziang Li <ziangli@umich.edu>
Signed-off-by: Przemek Tredak <ptredak@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
Co-authored-by: Przemek Tredak <ptredak@nvidia.com>

1 parent 5f9550f, commit fdf9fb1 (#2644)
24 files changed
Lines changed: 2415 additions & 61 deletions
File tree
- qa/L0_pytorch_unittest
- tests/pytorch
- transformer_engine
  - common/recipe
  - pytorch
    - module
    - ops
      - basic
      - fused
    - tensor/storage