PyTorch 2.9, CUDA 13.0, TensorRT 10.13, Python 3.13
Torch-TensorRT 2.9.0 on x86-64 Linux and Windows targets PyTorch 2.9, TensorRT 10.13, CUDA 13.0/12.8/12.6, and Python 3.10–3.13
Python
x86-64 Linux and Windows
- CUDA 13.0 + Python 3.10–3.13 is available via PyPI: https://pypi.org/project/torch-tensorrt/
- CUDA 12.6/12.8/13.0 + Python 3.10–3.13 is also available via the PyTorch index: https://download.pytorch.org/whl/torch-tensorrt
aarch64 SBSA Linux and Jetson Thor
- CUDA 13.0 + Python 3.10–3.13 + Torch 2.9 + TensorRT 10.13 (Python 3.12 is the only version verified for Thor)
- Available via PyPI: https://pypi.org/project/torch-tensorrt/
- Available via PyTorch index: https://download.pytorch.org/whl/torch-tensorrt
NOTE: On aarch64 platforms you must explicitly install TensorRT or use the system-installed TensorRT wheels, e.g.:
uv pip install torch torch-tensorrt tensorrt
aarch64 Jetson Orin
- There is no torch_tensorrt 2.9 release for Jetson Orin; please continue using the torch_tensorrt 2.8 release
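After installing from either index, a quick smoke test is to import the package and check the reported versions. This is a minimal sketch; the exact version strings will vary by platform and CUDA build:

```python
import torch
import torch_tensorrt

# Confirm the installed wheels line up with this release
print("torch:", torch.__version__)                     # expect 2.9.x
print("torch_tensorrt:", torch_tensorrt.__version__)   # expect 2.9.0
print("CUDA available:", torch.cuda.is_available())
```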
C++
x86-64 Linux and Windows
- CUDA 13.0 Tarball / Zip
Deprecations
FX Frontend
The FX frontend was the precursor to the Dynamo frontend, and a number of Dynamo components were shared between the two. Now that the Dynamo frontend is stable and all shared components have been decoupled, we will no longer ship the FX frontend in binary releases starting in H1 2026. The FX frontend will remain in the source tree for the foreseeable future, so source builds can reinstall the frontend if necessary.
New Features
LLM and VLM improvements
In this release, we’ve introduced several key enhancements:
- Sliding Window Attention in the SDPA Converter: Added support for sliding window attention, enabling successful compilation of the Gemma3 model (Gemma3-1B).
- Dynamic Custom Lowering Passes: Refactored the lowering framework to allow users to dynamically register custom passes based on the configuration of Hugging Face models.
- Vision-Language Model (VLM) Support
  - Added support for Eagle2 and Qwen2.5-VL models via the new run_vlm.py utility.
  - run_vlm.py enables compilation of both the vision and language components of a VLM, and also supports KV caching for efficient VLM generation.
See the documentation for detailed instructions on running these models.
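As a rough illustration of the basic path these utilities build on, the sketch below compiles a small Hugging Face causal LM with the Dynamo frontend. The model ID is an arbitrary stand-in, and the sketch omits the KV-cache plumbing and the custom lowering passes that run_vlm.py layers on top of this flow:

```python
import torch
import torch_tensorrt
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative model choice (assumption): any small HF causal LM.
# Release-tested models include Gemma3-1B and the Qwen2.5 family.
model_id = "Qwen/Qwen2.5-0.5B-Instruct"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = (
    AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16)
    .eval()
    .cuda()
)

prompt = tokenizer("The capital of France is", return_tensors="pt").to("cuda")
input_ids = prompt["input_ids"]

# Compile the language model with the Dynamo frontend.
trt_model = torch_tensorrt.compile(
    model,
    ir="dynamo",
    inputs=[input_ids],
    enabled_precisions={torch.float16},
)

with torch.no_grad():
    output = trt_model(input_ids)
```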
TensorRT-RTX
TensorRT-RTX is a JIT-first version of TensorRT. Whereas TensorRT performs tactic selection and fusions during a build phase, TensorRT-RTX lets you produce builds before specializing for specific hardware, so a single GPU-agnostic package can be distributed to all users of your builds. On first use, TensorRT-RTX then tunes for the specific hardware your users are running. Torch-TensorRT-RTX is a build of Torch-TensorRT that uses the TensorRT-RTX compiler stack in place of standard TensorRT. All APIs are identical to Torch-TensorRT; however, some features such as weak typing and compile-time post-training quantization are not supported.
- Added experimental support for Torch-TensorRT-RTX
- You can check out the details on how to build and run here: https://docs.pytorch.org/TensorRT/getting_started/tensorrt_rtx.html
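Because the Python API is unchanged, a compile call written for standard Torch-TensorRT is expected to work as-is under an RTX build; the minimal sketch below assumes an RTX-enabled wheel is installed (ResNet-18 is just an example model, and hardware-specific tuning happens on the end user's machine at first use):

```python
import torch
import torch_tensorrt
import torchvision.models as models

# The same user code works against either the standard Torch-TensorRT wheel
# or the experimental Torch-TensorRT-RTX build; only the installed package
# differs.
model = models.resnet18(weights=None).eval().cuda()
example_input = torch.randn(1, 3, 224, 224, device="cuda")

trt_model = torch_tensorrt.compile(model, ir="dynamo", inputs=[example_input])

with torch.no_grad():
    out = trt_model(example_input)

# Note: with the RTX stack, features that depend on weak typing or
# compile-time post-training quantization are not available.
```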
Improvements
- Closed a number of performance gaps between graphs constructed by Torch-TensorRT and those constructed via the ONNX-TensorRT path
What's Changed
- fix the broken CC0 image link by @lanluo-nvidia in #3635
- upgrade torch_tensorrt version from 2.8.0.dev to 2.9.0.dev by @lanluo-nvidia in #3639
- Temporary fix to workaround the mutable decomposition error. by @lanluo-nvidia in #3636
- Fix dynamo core test failure on Windows by @HolyWu in #3642
- Closed the perf gap of resnet and enabled refit by @cehongwang in #3629
- feat: Refactor LLM model zoo and add KV cache support by @peri044 in #3527
- adding rotary embedding example, with graph rewrite for complex subgraph by @apbose in #3570
- feat: Add bf16 support to cast converter by @peri044 in #3643
- fix: replace add_identity by add_cast for type cast by @junstar92 in #3563
- Refit debug patch by @cehongwang in #3620
- fix compiler cl not found error in windows by @lanluo-nvidia in #3660
- slice scatter support for dynamic cases by @apbose in #3513
- fix the int8 quantization failure error by @lanluo-nvidia in #3663
- chore(deps): bump transformers from 4.48.0 to 4.52.1 in /tests/modules by @dependabot[bot] in #3670
- chore(deps): bump transformers from 4.50.0 to 4.51.0 in /examples/dynamo by @dependabot[bot] in #3669
- chore(deps): bump transformers from 4.49.0 to 4.51.0 in /tests/py by @dependabot[bot] in #3668
- remove tensorrt as build dependency by @lanluo-nvidia in #3681
- disable jetpack build for now by @lanluo-nvidia in #3685
- Fixed the CI problem by @cehongwang in #3680
- fix windows build failure: add /utf-8 by @lanluo-nvidia in #3684
- upgrade tensorrt from 10.11 to 10.12 by @lanluo-nvidia in #3686
- Add Flux fp4 support by @lanluo-nvidia in #3689
- feat: revert linear converter by @zewenli98 in #3703
- Fixed python only runtime bug by @cehongwang in #3701
- Disabled silu decomposition cast by @cehongwang in #3677
- Jetson distributed fix by @apbose in #3716
- Simplify the Group Norm converter by @zewenli98 in #3719
- fix conv1d/deconv1d bug with stride more than 1 by @lanluo-nvidia in #3737
- add test cases for strong typing by @lanluo-nvidia in #3739
- Upgrade perf_run script to support TRT 10 and fix some issues by @zewenli98 in #3650
- Fixed SDPA slow down and linear slow down by @cehongwang in #3700
- remove breakpoint() by @lanluo-nvidia in #3750
- add nvshmem in aarch64 by @lanluo-nvidia in #3769
- chore(deps): bump transformers from 4.51.3 to 4.53.0 in /tools/perf by @dependabot[bot] in #3754
- Cherry pick jetson enablement from 2.8 release branch to main by @lanluo-nvidia in #3765
- Breaking Change: Remove the deprecated int8 calibrator related by @lanluo-nvidia in #3759
- fix the typo by @lanluo-nvidia in #3773
- Removal of BAZEL build files from python package and changes to make cpp tests work by @apbose in #3641
- fix: atan2 strong type support & bug fix for integer dynamic shape by @chohk88 in #3751
- upgrade torchvision from 0.23.0 to 0.24.0 by @lanluo-nvidia in #3772
- chore: update resources in README.md by @peri044 in #3780
- disable python 3.14 in CI by @lanluo-nvidia in #3787
- fix: set example models to eval mode and follow the convention by @zewenli98 in #3770
- fix: prelu perf gap on Unet by @zewenli98 in #3717
- fix: batch norm issue encountered in RAFT by @zewenli98 in #3758
- feat: Add support for Groot N1.5 model by @peri044 in #3736
- skip flashinfer test due to torch upstream change by @lanluo-nvidia in #3794
- Add support for TensorRT-RTX by @lanluo-nvidia in #3753
- add fx deprecation notice + jetpack doc update by @lanluo-nvidia in #3795
- addressing ngc aarch64 error by @apbose in #3705
- fix pybind issue in windows by @lanluo-nvidia in #3801
- llm: register sdpa variant by @lanluo-nvidia in #3802
- fix bazel build //tests/core/runtime:runtime_tests issue by @lanluo-nvidia in #3804
- Simplify Release workflow and Add windows zip in the release artifacts by @lanluo-nvidia in #3800
- change llm model test from gemma3 to qwen to skip auth by @lanluo-nvidia in #3807
- replace allow_complex_guards_as_runtime_assertswithprefer_deferred_ru… by @lanluo-nvidia in #3809
- cherry pick 25.09 skip test to main by @lanluo-nvidia in #3810
- feat: support dynamics for all inputs for embedding_bag converter by @zewenli98 in #3796
- cherry pick is_thor from ngc/release/25.09 branch to main by @lanluo-nvidia in #3813
- dlfw related changes by @lanluo-nvidia in #3814
- fix guard_fn issue by @lanluo-nvidia in #3815
- Same changes as #3812 chore: Add more models for benchmark and polish codes by @zewenli98 in #3822
- Index converter dynamic cases fix by @apbose in #3694
- add lowering pass to converter test by @lanluo-nvidia in #3820
- Lluo/modelopt import restructure by @lanluo-nvidia in #3825
- integrated vlm code for benchmark for Eagle2 by @chohk88 in #3698
- chore: Upgrade TRT to 10.13.2.6 by @peri044 in #3791
- enable cu130 by @lanluo-nvidia in #3808
- release 2.9 branch cut by @lanluo-nvidia in #3828
- cherry pick 3833 fix ci issue: from main to release/2.9 branch by @lanluo-nvidia in #3834
- cherry pick of bug fix: #3837 by @peri044 in #3838
- debug windows issue in release 2.9 by @lanluo-nvidia in #3836
- fix thor tensorrt dependency issue by @lanluo-nvidia in #3843
- fix test by @lanluo-nvidia in #3852
- Lluo/cherry pick moe by @lanluo-nvidia in #3853
- fix pkg_zip nested zip issue by @lanluo-nvidia in #3861
Full Changelog: v2.8.0...v2.9.0