TensorRT-LLM import fix, and specifying aot_joint_export as an explicit setting in dynamo.compile
TRT-LLM installation utilities and adding test cases
Adding the option in _compiler.py
Changes in the TRT-LLM loading tool: removing install_wget, install_unzip, install_mpi
Further changes in error logging of the TRT-LLM installation tool
Moving load_tensorrt_llm to dynamo/utils.py
Correcting a typo in the TRT-LLM load
Using a Python library for the download to make it platform-agnostic
DLL file path update for Windows
Correcting a non-critical lint error
Including the version in versions.txt
Linting error fixes and rebase fix
Removing the Platform enum from converter_utils.py
Addressing review comments: tmp dir for wheel download and wheel extraction, variable for py_version
Checks for Windows, where the NCCL backend is not supported
Adding checks for Windows and Jetson devices
Keeping the extracted files and deleting the downloaded file; restructuring the test
Modifying the warning for missing libmpi libraries
Removing the redundant initializations
Adding tests in CI
Correcting the skip-test condition
Installing MPI libs for Linux x86
Adding SBSA to the supported platforms for the TRT-LLM libs and installing MPI libs for the distributed tests
Using a Python package for platform detection
Using the Python platform module
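Several of these commits gate behavior on the host platform (no NCCL backend on Windows; special handling for Jetson and SBSA devices). Below is a minimal sketch of such checks using only Python's standard platform module; the helper names and the exact skip policy are illustrative assumptions, not the PR's actual code:

```python
# Illustrative platform checks; the PR's real helpers and policy
# (e.g. SBSA supported, Jetson excluded) live in the repo's utilities.
import platform


def is_windows() -> bool:
    # NCCL has no Windows backend, so the distributed path is skipped here.
    return platform.system() == "Windows"


def is_arm64() -> bool:
    # True for both Jetson and SBSA machines; telling the two apart needs
    # an additional device-specific check (assumption, not shown).
    return platform.machine().lower() in ("aarch64", "arm64")


if __name__ == "__main__":
    print(f"system={platform.system()} machine={platform.machine()}")
    print(f"skip NCCL-based tests: {is_windows()}")
```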
     tiling_optimization_level (str): The optimization level of tiling strategies. A higher level allows TensorRT to spend more time searching for better tiling strategy. We currently support ["none", "fast", "moderate", "full"].
     l2_limit_for_tiling (int): The target L2 cache usage limit (in bytes) for tiling optimization (default is -1 which means no limit).
+    use_distributed_mode_trace (bool): Using aot_autograd to trace the graph. This is enabled when DTensors or distributed tensors are present in distributed model
     **kwargs: Any,

 Returns:
     torch.fx.GraphModule: Compiled FX Module, when run it will execute via TensorRT
"""Compile an ExportedProgram module for NVIDIA GPUs using TensorRT
@@ -515,6 +519,7 @@ def compile(
515
519
tiling_optimization_level (str): The optimization level of tiling strategies. A higher level allows TensorRT to spend more time searching for better tiling strategy. We currently support ["none", "fast", "moderate", "full"].
516
520
l2_limit_for_tiling (int): The target L2 cache usage limit (in bytes) for tiling optimization (default is -1 which means no limit).
517
521
offload_module_to_cpu (bool): Offload the module to CPU. This is useful when we need to minimize GPU memory usage.
522
+
use_distributed_mode_trace (bool): Using aot_autograd to trace the graph. This is enabled when DTensors or distributed tensors are present in distributed model
518
523
**kwargs: Any,
519
524
Returns:
520
525
torch.fx.GraphModule: Compiled FX Module, when run it will execute via TensorRT
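For context, a hedged usage sketch of the flag this diff adds; the toy model, CUDA device, and export flow are placeholder assumptions, not taken from the PR:

```python
import torch
import torch_tensorrt

# Toy stand-in model; any module traceable by torch.export works here.
model = torch.nn.Linear(16, 16).eval().cuda()
inputs = (torch.randn(2, 16).cuda(),)

exported = torch.export.export(model, inputs)
trt_module = torch_tensorrt.dynamo.compile(
    exported,
    inputs=list(inputs),
    # The flag added by this diff: trace through aot_autograd, needed when
    # the module holds DTensors / distributed tensors.
    use_distributed_mode_trace=True,
)
print(trt_module(*inputs).shape)
```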
The same tiling_optimization_level and l2_limit_for_tiling docstring lines recur at a third location in the diff (old line ~1116 / new line ~1123); that hunk is truncated here.
"TensorRT-LLM is not installed. Please install TensorRT-LLM or set TRTLLM_PLUGINS_PATH to the directory containing libnvinfer_plugin_tensorrt_llm.so to use converters for torch.distributed ops",