TensorRT-LLM Release 0.17.0 #2726
zeroepoch announced in Announcements
Hi,
We are very pleased to announce the 0.17.0 version of TensorRT-LLM. This update includes:
Key Features and Enhancements

- … support in the `LLM` API and the `trtllm-bench` command.
- … `tensorrt_llm._torch`. The following is a list of supported infrastructure, models, and features that can be used with the PyTorch workflow.
- … the `LLM` API.
- … `examples/multimodal/README.md`.
- … `userbuffer`-based AllReduce-Norm fusion kernel.
- … the `executor` API.

API Changes

- … `paged_context_fmha` is enabled.
- Added `--concurrency` support for the `throughput` subcommand of `trtllm-bench`.
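As a quick illustration of the new flag, a `trtllm-bench` throughput run capped at a fixed concurrency might look like the sketch below. Only `--concurrency` and the `throughput` subcommand come from the notes above; the model name and dataset path are illustrative placeholders, and the exact set of other flags may differ in your installation.

```shell
# Hypothetical sketch: cap the number of in-flight requests during a
# throughput benchmark. The model and dataset shown are placeholders;
# only --concurrency itself is taken from this release's notes.
trtllm-bench --model meta-llama/Llama-3.1-8B \
    throughput \
    --dataset ./synthetic_dataset.json \
    --concurrency 64
```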
Fixed Issues

- … `cluster_key` for the auto parallelism feature. ([feature request] Can we add H200 in infer_cluster_key() method? #2552)
- … `__post_init__` function of the `LLmArgs` class. Thanks for the contribution from @topenkoff in Fix kwarg name #2691.

Infrastructure Changes
- The base Docker image for TensorRT-LLM is updated to `nvcr.io/nvidia/pytorch:25.01-py3`.
- The base Docker image for TensorRT-LLM Backend is updated to `nvcr.io/nvidia/tritonserver:25.01-py3`.
Known Issues

- Need `--extra-index-url https://pypi.nvidia.com` when running `pip install tensorrt-llm` due to new third-party dependencies.
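Given the known issue above, an install command for this release would pass the NVIDIA index explicitly, for example:

```shell
# Install TensorRT-LLM, pulling the new third-party dependencies from
# NVIDIA's package index (see the known issue above).
pip install tensorrt-llm --extra-index-url https://pypi.nvidia.com
```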