CHANGELOG.rst (5 additions, 1 deletion)

@@ -9,6 +9,10 @@ Model Optimizer Changelog (Linux)
 
 - Add support for PyTorch Geometric quantization.
 
+**Documentation**
+
+- Deprecate ``examples/megatron-lm`` in favor of more detailed documentation in `Megatron-LM/examples/post_training/modelopt <https://github.com/NVIDIA/Megatron-LM/tree/main/examples/post_training/modelopt>`_.
+
 **Misc**
 
 - Bump minimum recommended transformers version to 4.53.
@@ -75,7 +79,7 @@ Model Optimizer Changelog (Linux)
 - Upgrade TensorRT-LLM dependency to 1.1.0rc2.
 - Support Phi-4-multimodal and Qwen2.5-VL quantized HF checkpoint export in ``examples/vlm_ptq``.
 - Support storing and restoring Minitron pruning activations and scores for re-pruning without running the forward loop again.
-- Add Minitron pruning example for Megatron-LM framework. See ``examples/megatron-lm`` for more details.
+- Add Minitron pruning example for Megatron-LM framework. See `Megatron-LM/examples/post_training/modelopt <https://github.com/NVIDIA/Megatron-LM/tree/main/examples/post_training/modelopt>`_ for more details.

examples/llm_distill/README.md (9 additions, 4 deletions)

@@ -13,7 +13,8 @@ This section focuses on demonstrating how to apply Model Optimizer to perform kn
 | Pre-Requisites | Required & optional packages to use this technique |\[[Link](#pre-requisites)\]||
 | Getting Started | Learn how to optimize your models using distillation to produce more intelligent smaller models |\[[Link](#getting-started)\]|\[[docs](https://nvidia.github.io/TensorRT-Model-Optimizer/guides/4_distillation.html)\]|
 | Support Matrix | View the support matrix to see compatibility and feature availability across different models |\[[Link](#support-matrix)\]||
-| Distillation with NeMo | Learn how to distill your models with NeMo Framework |\[[Link](#knowledge-distillation-kd-for-nvidia-nemo-models)\]|\[[docs](https://nvidia.github.io/TensorRT-Model-Optimizer/guides/4_distillation.html)\]|
+| Distillation with Megatron-LM | Learn how to distill your models with Megatron-LM Framework |\[[Link](#knowledge-distillation-kd-in-nvidia-megatron-lm-framework)\]||
+| Distillation with NeMo | Learn how to distill your models with NeMo Framework |\[[Link](#knowledge-distillation-kd-in-nvidia-nemo-framework)\]|\[[docs](https://nvidia.github.io/TensorRT-Model-Optimizer/guides/4_distillation.html)\]|
 | Distillation with Huggingface | Learn how to distill your models with Hugging Face |\[[Link](#knowledge-distillation-kd-for-huggingface-models)\]|\[[docs](https://nvidia.github.io/TensorRT-Model-Optimizer/guides/4_distillation.html)\]|
 | Resources | Extra links to relevant resources |\[[Link](#resources)\]||
 | NeMo Prune + Distill Simplified Flow | Example script demonstrating end-to-end pruning plus distillation in NeMo |\[[Link](../nemo_run/prune_distill/README.md)\]||
@@ -25,7 +26,7 @@ This section focuses on demonstrating how to apply Model Optimizer to perform kn
 ### Docker
 
 For Hugging Face models, please use the PyTorch docker image (e.g., `nvcr.io/nvidia/pytorch:25.06-py3`).
-For NeMo models, use the NeMo container (e.g., `nvcr.io/nvidia/nemo:25.07`) which has all the dependencies installed.
+For NeMo models, use the NeMo container (e.g., `nvcr.io/nvidia/nemo:25.09`) which has all the dependencies installed.
 
 Visit our [installation docs](https://nvidia.github.io/TensorRT-Model-Optimizer/getting_started/2_installation.html) for more information.
 
 Also follow the installation steps below to upgrade to the latest version of Model Optimizer and install example-specific dependencies.
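For context, launching either container is a single `docker run`; the following is a minimal sketch only, assuming a local NVIDIA GPU with the NVIDIA Container Toolkit installed and using the image tags named in the hunk above:

```shell
# Sketch, not part of the diff: interactive shells in the recommended containers.
# Hugging Face models -- PyTorch container:
docker run --gpus all -it --rm nvcr.io/nvidia/pytorch:25.06-py3

# NeMo models -- NeMo container (tag bumped to 25.09 by this change):
docker run --gpus all -it --rm nvcr.io/nvidia/nemo:25.09
```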
@@ -141,9 +142,13 @@ Loss balancers:
 | Qwen 3 | qwen3 | ✅ |
 | Mamba | mamba | ✅ |
 
-## Knowledge Distillation (KD) for NVIDIA NeMo Models
+## Knowledge Distillation (KD) in NVIDIA Megatron-LM Framework
 
-Check out the stand-alone distillation script in the [NVIDIA NeMo repository](https://docs.nvidia.com/nemo-framework/user-guide/latest/model-optimization/distillation/distillation.html).
+Check out the Knowledge Distillation example in the [Megatron-LM repository](https://github.com/NVIDIA/Megatron-LM/tree/main/examples/post_training/modelopt).
+
+## Knowledge Distillation (KD) in NVIDIA NeMo Framework
+
+Check out the stand-alone distillation script in the [NeMo documentation](https://docs.nvidia.com/nemo-framework/user-guide/latest/model-optimization/distillation/distillation.html).
 
 You can also look at the NeMo tutorial notebooks [here](https://github.com/NVIDIA-NeMo/NeMo/tree/main/tutorials/llm/qwen/pruning-distillation) which showcase the usage of Minitron pruning followed by distillation for Qwen 3 8B step-by-step in NeMo framework. Hugging Face models can also be converted to NeMo format and used subsequently as shown in the tutorial.
 The exported checkpoint can be deployed using TensorRT-LLM / vLLM / SGLang. For more details refer to the [deployment section](#deployment) of this document.