CHANGELOG.rst (5 additions, 1 deletion)

@@ -9,6 +9,10 @@ Model Optimizer Changelog (Linux)
 
 - Add support for PyTorch Geometric quantization.
 
+**Documentation**
+
+- Deprecate ``examples/megatron-lm`` in favor of more detailed documentation in `Megatron-LM/examples/post_training/modelopt <https://github.com/NVIDIA/Megatron-LM/tree/main/examples/post_training/modelopt>`_.
+
 **Misc**
 
 - Bump minimum recommended transformers version to 4.53.
@@ -75,7 +79,7 @@ Model Optimizer Changelog (Linux)
 - Upgrade TensorRT-LLM dependency to 1.1.0rc2.
 - Support Phi-4-multimodal and Qwen2.5-VL quantized HF checkpoint export in ``examples/vlm_ptq``.
 - Support storing and restoring Minitron pruning activations and scores for re-pruning without running the forward loop again.
-- Add Minitron pruning example for Megatron-LM framework. See ``examples/megatron-lm`` for more details.
+- Add Minitron pruning example for Megatron-LM framework. See `Megatron-LM/examples/post_training/modelopt <https://github.com/NVIDIA/Megatron-LM/tree/main/examples/post_training/modelopt>`_ for more details.

examples/llm_distill/README.md (9 additions, 4 deletions)

@@ -13,7 +13,8 @@ This section focuses on demonstrating how to apply Model Optimizer to perform kn
 | Pre-Requisites | Required & optional packages to use this technique |\[[Link](#pre-requisites)\]||
 | Getting Started | Learn how to optimize your models using distillation to produce more intelligent smaller models |\[[Link](#getting-started)\]|\[[docs](https://nvidia.github.io/TensorRT-Model-Optimizer/guides/4_distillation.html)\]|
 | Support Matrix | View the support matrix to see compatibility and feature availability across different models |\[[Link](#support-matrix)\]||
-| Distillation with NeMo | Learn how to distill your models with NeMo Framework |\[[Link](#knowledge-distillation-kd-for-nvidia-nemo-models)\]|\[[docs](https://nvidia.github.io/TensorRT-Model-Optimizer/guides/4_distillation.html)\]|
+| Distillation with Megatron-LM | Learn how to distill your models with Megatron-LM Framework |\[[Link](#knowledge-distillation-kd-in-nvidia-megatron-lm-framework)\]||
+| Distillation with NeMo | Learn how to distill your models with NeMo Framework |\[[Link](#knowledge-distillation-kd-in-nvidia-nemo-framework)\]|\[[docs](https://nvidia.github.io/TensorRT-Model-Optimizer/guides/4_distillation.html)\]|
 | Distillation with Huggingface | Learn how to distill your models with Hugging Face |\[[Link](#knowledge-distillation-kd-for-huggingface-models)\]|\[[docs](https://nvidia.github.io/TensorRT-Model-Optimizer/guides/4_distillation.html)\]|
 | Resources | Extra links to relevant resources |\[[Link](#resources)\]||
 | NeMo Prune + Distill Simplified Flow | Example script demonstrating end-to-end pruning plus distillation in NeMo |\[[Link](../nemo_run/prune_distill/README.md)\]||
@@ -25,7 +26,7 @@ This section focuses on demonstrating how to apply Model Optimizer to perform kn
 ### Docker
 
 For Hugging Face models, please use the PyTorch docker image (e.g., `nvcr.io/nvidia/pytorch:25.06-py3`).
-For NeMo models, use the NeMo container (e.g., `nvcr.io/nvidia/nemo:25.07`) which has all the dependencies installed.
+For NeMo models, use the NeMo container (e.g., `nvcr.io/nvidia/nemo:25.09`) which has all the dependencies installed.
 
 Visit our [installation docs](https://nvidia.github.io/TensorRT-Model-Optimizer/getting_started/2_installation.html) for more information.
 
 Also follow the installation steps below to upgrade to the latest version of Model Optimizer and install example-specific dependencies.
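For context, launching either container is a single `docker run`; the following is a minimal sketch only, assuming a local NVIDIA GPU with the NVIDIA Container Toolkit installed and using the image tags named in the hunk above:

```shell
# Sketch, not part of the diff: interactive shells in the recommended containers.
# Hugging Face models -- PyTorch container:
docker run --gpus all -it --rm nvcr.io/nvidia/pytorch:25.06-py3

# NeMo models -- NeMo container (tag bumped to 25.09 by this change):
docker run --gpus all -it --rm nvcr.io/nvidia/nemo:25.09
```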
@@ -141,9 +142,13 @@ Loss balancers:
 | Qwen 3 | qwen3 | ✅ |
 | Mamba | mamba | ✅ |
 
-## Knowledge Distillation (KD) for NVIDIA NeMo Models
+## Knowledge Distillation (KD) in NVIDIA Megatron-LM Framework
 
-Check out the stand-alone distillation script in the [NVIDIA NeMo repository](https://docs.nvidia.com/nemo-framework/user-guide/latest/model-optimization/distillation/distillation.html).
+Check out the Knowledge Distillation example in the [Megatron-LM repository](https://github.com/NVIDIA/Megatron-LM/tree/main/examples/post_training/modelopt).
+
+## Knowledge Distillation (KD) in NVIDIA NeMo Framework
+
+Check out the stand-alone distillation script in the [NeMo documentation](https://docs.nvidia.com/nemo-framework/user-guide/latest/model-optimization/distillation/distillation.html).
 
 You can also look at the NeMo tutorial notebooks [here](https://github.com/NVIDIA-NeMo/NeMo/tree/main/tutorials/llm/qwen/pruning-distillation) which showcase the usage of Minitron pruning followed by distillation for Qwen 3 8B step-by-step in NeMo framework. Hugging Face models can also be converted to NeMo format and used subsequently as shown in the tutorial.
 The exported checkpoint can be deployed using TensorRT-LLM / vLLM / SGLang. For more details refer to the [deployment section](#deployment) of this document.