Pull requests: NVIDIA/TensorRT-LLM
#3888 [https://nvbugspro.nvidia.com/bug/5246419][fix] Align default setting & remove unnecessary check for chat and completion (opened Apr 27, 2025 by LinPoly)
#3882 (draft) refactor: (part 1) Add constraints doc for fusedMoe module (opened Apr 26, 2025 by HuiGao-NV)
#3881 feat: Integrate modelopt HF export into our quantization script (opened Apr 26, 2025 by hypdeb)
#3878 fix: Set num_microbatches=pp_size with overlap scheduler (opened Apr 26, 2025 by amukkara)
#3877 [https://nvbugspro.nvidia.com/bug/5242406][fix] Fix fp8 kvcache support (opened Apr 26, 2025 by hlu1)
#3876 feat: Re-enable Llama4 fusion and add AllReduce CUDA graph fix (opened Apr 25, 2025 by zihaok)
#3875 (draft) [TRTLLM-4717][perf] Set CUDA graph max batch size and padding in throughput benchmark (opened Apr 25, 2025 by FrankD412)
#3866 feat: Refactor dataset generation and add tests (label: others; opened Apr 25, 2025 by hypdeb)
#3863 fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (opened Apr 25, 2025 by jinyangyuan-nvidia)
#3862 [https://nvbugspro.nvidia.com/bug/5243482][fix] If FlashMLA is used, the existence of FMHA-based MLA kernels should not be checked (opened Apr 25, 2025 by bobboli)
#3859 (draft) infra: [TRTLLM-4475][TRTLLM-4565] Add pipeline hierarchy and basic info in the Jenkins job page (opened Apr 25, 2025 by ZhanruiSunCh)
#3856 feat: Add health_generate route to OpenAI serving (labels: Community Engagement, Community want to contribute; opened Apr 25, 2025 by dsingal0)