@Superjomn
Collaborator

@Superjomn Superjomn commented Oct 16, 2025

Summary

  1. Bring the GC tracer utilities into tensorrt_llm._utils and create a tracer in the LLM class to track GC in the LLM proxy.
  2. Add more NVTX events to the RPC path.
  3. Improve RPC performance: with Llama 3.1 8B FP8 at TP1 and TP2 on H100 HBM3, the gap to the existing IPC path is around 2%. Refer to this worknote for details.
  4. Improve RPC robustness.

The GC tracer works in each process:
[image: GC tracer output]
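
For readers unfamiliar with per-process GC tracing, the idea can be sketched with Python's standard `gc.callbacks` hook. This is only an illustration under stated assumptions: the `GcTracer` class and its fields are hypothetical names and are not the utilities this PR adds to tensorrt_llm._utils.

```python
import gc
import time
from typing import Any, Dict, List


class GcTracer:
    """Hypothetical tracer: records the duration of each GC pass in this process."""

    def __init__(self) -> None:
        self.events: List[Dict[str, Any]] = []
        self._start = 0.0

    def _callback(self, phase: str, info: Dict[str, int]) -> None:
        # The gc module calls every registered callback with phase "start"
        # before a collection and "stop" after it.
        if phase == "start":
            self._start = time.perf_counter()
        elif phase == "stop":
            self.events.append({
                "generation": info["generation"],
                "collected": info["collected"],
                "duration_ms": (time.perf_counter() - self._start) * 1e3,
            })

    def __enter__(self) -> "GcTracer":
        gc.callbacks.append(self._callback)
        return self

    def __exit__(self, *exc: Any) -> None:
        gc.callbacks.remove(self._callback)


with GcTracer() as tracer:
    gc.collect()  # force a full collection so at least one event is recorded

for event in tracer.events:
    print(event)
```

Because `gc.callbacks` is process-local, each process (proxy and workers) would need to install its own tracer, which matches the statement above that the tracer works in each process.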

Summary by CodeRabbit

  • New Features

    • Added environment variable support (TLLM_ORCHESTRATOR_TYPE) for orchestrator type configuration, enabling dynamic selection between RPC and Ray orchestrators.
  • Tests

    • Enabled RPC executor tests, including multi-GPU tensor parallelism and streaming scenarios for validation.

Description

Test Coverage

PR Checklist

Please review the following before submitting your PR:

  • PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.

  • PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.

  • Test cases are provided for new code paths (see test instructions)

  • Any new dependencies have been scanned for license and vulnerabilities

  • CODEOWNERS updated if ownership changes

  • Documentation updated as needed

  • The reviewers assigned automatically/manually are appropriate for the PR.

  • Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

/bot [-h] ['run', 'kill', 'skip', 'reuse-pipeline'] ...

Provide a user-friendly way for developers to interact with a Jenkins server.

Run /bot [-h|--help] to print this help message.

See details below for each supported subcommand.

run [--reuse-test (optional)pipeline-id --disable-fail-fast --skip-test --stage-list "A10-PyTorch-1, xxx" --gpu-type "A30, H100_PCIe" --test-backend "pytorch, cpp" --add-multi-gpu-test --only-multi-gpu-test --disable-multi-gpu-test --post-merge --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" --detailed-log --debug(experimental)]

Launch build/test pipelines. All previously running jobs will be killed.

--reuse-test (optional)pipeline-id (OPTIONAL) : Allow the new pipeline to reuse build artifacts and skip successful test stages from a specified pipeline, or from the last pipeline if no pipeline-id is given. If the Git commit ID has changed, this option is always ignored. The DEFAULT behavior of the bot is to reuse build artifacts and successful test results from the last pipeline.

--disable-reuse-test (OPTIONAL) : Explicitly prevent the pipeline from reusing build artifacts and skipping successful test stages from a previous pipeline. Ensure that all builds and tests are run regardless of previous successes.

--disable-fail-fast (OPTIONAL) : Disable fail fast on build/tests/infra failures.

--skip-test (OPTIONAL) : Skip all test stages, but still run build stages, package stages and sanity check stages. Note: Does NOT update GitHub check status.

--stage-list "A10-PyTorch-1, xxx" (OPTIONAL) : Only run the specified test stages. Examples: "A10-PyTorch-1, xxx". Note: Does NOT update GitHub check status.

--gpu-type "A30, H100_PCIe" (OPTIONAL) : Only run the test stages on the specified GPU types. Examples: "A30, H100_PCIe". Note: Does NOT update GitHub check status.

--test-backend "pytorch, cpp" (OPTIONAL) : Skip test stages which don't match the specified backends. Only support [pytorch, cpp, tensorrt, triton]. Examples: "pytorch, cpp" (does not run test stages with tensorrt or triton backend). Note: Does NOT update GitHub pipeline status.

--only-multi-gpu-test (OPTIONAL) : Only run the multi-GPU tests. Note: Does NOT update GitHub check status.

--disable-multi-gpu-test (OPTIONAL) : Disable the multi-GPU tests. Note: Does NOT update GitHub check status.

--add-multi-gpu-test (OPTIONAL) : Force run the multi-GPU tests in addition to running L0 pre-merge pipeline.

--post-merge (OPTIONAL) : Run the L0 post-merge pipeline instead of the ordinary L0 pre-merge pipeline.

--extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" (OPTIONAL) : Run the ordinary L0 pre-merge pipeline and specified test stages. Examples: --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx".

--detailed-log (OPTIONAL) : Enable flushing out all logs to the Jenkins console. This will significantly increase the log volume and may slow down the job.

--debug (OPTIONAL) : Experimental feature. Enable access to the CI container for debugging purposes. Note: Specify exactly one stage in the stage-list parameter to access the appropriate container environment. Note: Does NOT update GitHub check status.

For guidance on mapping tests to stage names, see docs/source/reference/ci-overview.md
and the scripts/test_to_stage_mapping.py helper.

kill

kill

Kill all running builds associated with pull request.

skip

skip --comment COMMENT

Skip testing for latest commit on pull request. --comment "Reason for skipping build/test" is required. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.

reuse-pipeline

reuse-pipeline

Reuse a previous pipeline to validate current commit. This action will also kill all currently running builds associated with the pull request. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.

@Superjomn Superjomn requested a review from a team as a code owner October 16, 2025 05:58
@Superjomn Superjomn requested a review from pcastonguay October 16, 2025 05:58
@coderabbitai
Contributor

coderabbitai bot commented Oct 16, 2025

📝 Walkthrough

This PR introduces environment variable override support for orchestrator type configuration, refactors executor creation into factory methods for improved modularity, and enables four previously skipped RPC-related tests across two test files. The changes consolidate executor instantiation logic and allow orchestrator type to be set via the TLLM_ORCHESTRATOR_TYPE environment variable.

Changes

| Cohort | File(s) | Summary |
|---|---|---|
| Executor factory methods | tensorrt_llm/executor/executor.py | Adds two new static factory methods to GenerationExecutor: _create_rpc_executor() constructs RPC executor instances, and _create_ipc_executor() constructs either worker or proxy IPC executors based on configuration. Refactors inline executor creation logic into reusable helpers. |
| Orchestrator configuration | tensorrt_llm/llmapi/llm_args.py | Adds field validator for orchestrator_type that reads environment variable via orchestrator_type_env(), validates the value is 'rpc' or 'ray', and substitutes the environment value when present. |
| Configuration utilities | tensorrt_llm/llmapi/utils.py | Adds new public utility function orchestrator_type_env() that reads and returns the TLLM_ORCHESTRATOR_TYPE environment variable as an optional string. |
| Test enablement | tests/unittest/llmapi/test_llm_multi_gpu_pytorch.py, tests/unittest/llmapi/test_llm_pytorch.py | Removes skip decorators from four RPC-related tests: test_llm_rpc, test_llm_rpc_streaming, test_llm_rpc_tp2, and test_llm_rpc_streaming_tp2, allowing them to execute. |
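
The override flow summarized above (environment variable wins over the configured field, and only 'rpc' or 'ray' are accepted) can be sketched in plain Python. This is a minimal sketch, not the validator from llm_args.py: `resolve_orchestrator_type` and its `env` parameter are hypothetical names introduced so the logic can be exercised without touching the real process environment.

```python
import os
from typing import Mapping, Optional

ENV_VAR = "TLLM_ORCHESTRATOR_TYPE"  # variable name as given in the PR summary
ALLOWED = ("rpc", "ray")


def resolve_orchestrator_type(
        configured: Optional[str],
        env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Hypothetical helper: return the env override if set, else the configured value."""
    env = os.environ if env is None else env
    override = env.get(ENV_VAR)
    if override is None:
        # No override present: keep whatever the caller configured.
        return configured
    if override not in ALLOWED:
        raise ValueError(
            f"Invalid orchestrator type: {override}. Expected 'rpc' or 'ray'.")
    return override


# The environment value takes precedence over the configured one.
print(resolve_orchestrator_type("ray", {ENV_VAR: "rpc"}))
# Without the variable, the configured value passes through unchanged.
print(resolve_orchestrator_type("ray", {}))
```

Passing the environment as a mapping is a design choice for testability in this sketch; the PR itself reads the variable through orchestrator_type_env().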

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

The changes involve mixed concerns across multiple files: understanding the factory method extraction pattern and its integration with existing executor paths, following the environment variable override flow through the validator, and contextualizing why the previously skipped tests are now enabled. While individual changes are straightforward, coherent review requires tracing the orchestrator type flow and validating the factory methods properly replace inline instantiation logic.

Pre-merge checks and finishing touches

❌ Failed checks (3 warnings)
| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 27.27% which is insufficient. The required threshold is 80.00%. | You can run @coderabbitai generate docstrings to improve docstring coverage. |
| Title Check | ⚠️ Warning | The PR title claims to "Optimize perf for the RPC executor and add some profile utilities to llm-api," but the actual changes reveal a different focus. The modifications introduce factory methods to consolidate RPC and IPC executor creation logic in executor.py, add environment variable configuration support for orchestrator_type in llm_args.py and utils.py, and enable previously skipped tests. The refactoring improves modularity and code organization rather than directly optimizing performance, and no profile utilities are added, only environment variable configuration accessors. The title mischaracterizes the primary intent and content of the changeset. | Consider revising the title to accurately reflect the actual changes, such as "Refactor executor creation with factory methods and add orchestrator_type environment variable support" or similar wording that captures the modularity improvements and configuration enhancements without overstating performance optimization or profile utilities that are not present in the changeset. |
| Description check | ⚠️ Warning | The pull request description is largely incomplete and does not follow the required template. While a brief summary of changes is provided in the first section (GC tracer utilities, NVTX events, performance and robustness enhancements for RPC), the critical sections 'Description' and 'Test Coverage' remain empty or unfilled. The PR checklist is marked as complete, but without substantive details in the Description section explaining the issue and solution, and without clearly listing relevant test coverage, the PR fails to meet the template requirements. | Fill in the 'Description' section with a clear explanation of the issue being addressed and how the solution resolves it. Additionally, complete the 'Test Coverage' section by listing the relevant test cases that validate the changes, such as the RPC-related tests that were unskipped (test_llm_rpc, test_llm_rpc_streaming, test_llm_rpc_tp2, test_llm_rpc_streaming_tp2) and any new tests for the GC tracer utilities and factory methods. |

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (1)
tensorrt_llm/llmapi/utils.py (1)

359-361: Add a docstring for better documentation.

For consistency with similar utility functions in this file (e.g., enable_llmapi_debug() at lines 351-356), consider adding a docstring to document the purpose and behavior of this function.

Apply this diff to add a docstring:

 def orchestrator_type_env() -> Optional[str]:
+    """Read orchestrator type from the TLLM_ORCHESTRATOR_TYPE environment variable.
+    
+    Returns:
+        Optional[str]: The orchestrator type ('rpc' or 'ray') if set, None otherwise.
+    """
     return os.environ.get("TLLM_ORCHESTRATOR_TYPE", None)
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ee588a7 and 77d7d74.

📒 Files selected for processing (5)
  • tensorrt_llm/executor/executor.py (5 hunks)
  • tensorrt_llm/llmapi/llm_args.py (2 hunks)
  • tensorrt_llm/llmapi/utils.py (1 hunks)
  • tests/unittest/llmapi/test_llm_multi_gpu_pytorch.py (0 hunks)
  • tests/unittest/llmapi/test_llm_pytorch.py (0 hunks)
💤 Files with no reviewable changes (2)
  • tests/unittest/llmapi/test_llm_pytorch.py
  • tests/unittest/llmapi/test_llm_multi_gpu_pytorch.py
🧰 Additional context used
📓 Path-based instructions (3)
**/*.{h,hpp,hh,hxx,cpp,cxx,cc,cu,cuh,py}

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

Use only spaces, no tabs; indent with 4 spaces.

Files:

  • tensorrt_llm/llmapi/utils.py
  • tensorrt_llm/llmapi/llm_args.py
  • tensorrt_llm/executor/executor.py
**/*.py

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

**/*.py: Python code must target Python 3.8+.
Indent Python code with 4 spaces; do not use tabs.
Maintain module namespace when importing; prefer 'from package.subpackage import foo' then 'foo.SomeClass()' instead of importing the class directly.
Python filenames should be snake_case (e.g., some_file.py).
Python classes use PascalCase names.
Functions and methods use snake_case names.
Local variables use snake_case; prefix 'k' for variables that start with a number (e.g., k_99th_percentile).
Global variables use upper SNAKE_CASE prefixed with 'G' (e.g., G_MY_GLOBAL).
Constants use upper SNAKE_CASE (e.g., MY_CONSTANT).
Avoid shadowing variables from an outer scope.
Initialize all externally visible members of a class in the constructor.
Prefer docstrings for interfaces that may be used outside a file; comments for in-function or file-local interfaces.
Use Google-style docstrings for classes and functions (Sphinx-parsable).
Document attributes and variables inline so they render under the class/function docstring.
Avoid reflection when a simpler, explicit approach suffices (e.g., avoid dict(**locals()) patterns).
In try/except, catch the most specific exceptions possible.
For duck-typing try/except, keep the try body minimal and use else for the main logic.

Files:

  • tensorrt_llm/llmapi/utils.py
  • tensorrt_llm/llmapi/llm_args.py
  • tensorrt_llm/executor/executor.py
**/*.{cpp,cxx,cc,h,hpp,hh,hxx,cu,cuh,py}

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

Prepend the NVIDIA Apache-2.0 copyright header with current year to the top of all source files (e.g., .cpp, .h, .cu, .py).

Files:

  • tensorrt_llm/llmapi/utils.py
  • tensorrt_llm/llmapi/llm_args.py
  • tensorrt_llm/executor/executor.py
🧬 Code graph analysis (2)
tensorrt_llm/llmapi/llm_args.py (1)
tensorrt_llm/llmapi/utils.py (1)
  • orchestrator_type_env (359-360)
tensorrt_llm/executor/executor.py (7)
tensorrt_llm/llmapi/mpi_session.py (1)
  • MpiSession (84-129)
tensorrt_llm/executor/postproc_worker.py (1)
  • PostprocWorkerConfig (42-49)
tensorrt_llm/llmapi/llm_args.py (1)
  • KvCacheConnectorConfig (454-466)
tensorrt_llm/executor/rpc_proxy.py (1)
  • GenerationExecutorRpcProxy (24-375)
tensorrt_llm/executor/worker.py (1)
  • GenerationExecutorWorker (41-229)
tensorrt_llm/executor/proxy.py (1)
  • GenerationExecutorProxy (37-454)
tensorrt_llm/executor/utils.py (1)
  • ProcessPoolExecutorSession (80-104)
🪛 Ruff (0.14.0)
tensorrt_llm/llmapi/llm_args.py

2036-2038: Avoid specifying long messages outside the exception class

(TRY003)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Pre-commit Check
🔇 Additional comments (9)
tensorrt_llm/llmapi/llm_args.py (2)

24-24: LGTM!

The import is correctly placed and necessary for the new validator below.


2029-2041: Missing @classmethod decorator causes validator to fail.

The field validator is missing the @classmethod decorator, which is required by Pydantic. Without it, the validator will receive incorrect arguments and fail at runtime.

Apply this diff to fix the issue:

 @field_validator('orchestrator_type', mode='before')
+@classmethod
-def validate_orchestrator_config(v):
+def validate_orchestrator_config(cls, v):
     # The environment variable will override the orchestrator_type field.
     # TODO: remove the environment variable after RPC path is stable, then
     # there will be only two stable options: None(RPC) and 'ray'.
     if (ev := orchestrator_type_env()) is not None:
         if ev not in ['rpc', 'ray']:
             raise ValueError(
                 f"Invalid orchestrator type: {ev}. Please set orchestrator_type to 'rpc' or 'ray'."
             )
         v = ev

     return v

Note: The static analysis hint about the long error message (TRY003) is a minor style concern and can be addressed separately if desired.

Likely an incorrect or invalid review comment.

tensorrt_llm/executor/executor.py (7)

378-395: LGTM! Clean refactoring into factory method.

The _create_rpc_executor static method consolidates RPC executor instantiation logic, improving code maintainability by eliminating duplicate creation code across multiple call sites.


396-426: LGTM! Well-designed factory method with clear branching.

The _create_ipc_executor static method effectively consolidates IPC-based executor creation with a clear use_worker parameter to toggle between single-process Worker and multi-process Proxy modes. The docstring clearly explains the parameter's behavior.


485-487: LGTM! Correctly excludes "rpc" from unsupported orchestrators.

The condition now explicitly allows orchestrator_type == "rpc" to bypass the unsupported orchestrator error, preserving the RPC execution path as intended by this PR.


493-506: LGTM! Clear routing logic with improved readability.

The orchestrator_is_rpc boolean flag improves code clarity by consolidating the orchestrator type check. The conditional routing correctly calls the appropriate factory method based on the orchestrator type.


525-541: LGTM! Correct factory method usage for single-process path.

The refactored code correctly routes to RPC executor when needed, and uses _create_ipc_executor with use_worker=True to create a single-process GenerationExecutorWorker for the TP1 optimization path. The pattern is consistent and correct.


548-564: LGTM! Consistent factory method usage for non-Windows streaming path.

The refactored code maintains consistent routing patterns. The use_worker=False parameter correctly creates a multi-process GenerationExecutorProxy for streaming performance, and mpi_session=None allows the executor to use mpi4py internally.


571-578: LGTM! Windows-specific executor creation with known limitation.

The refactored code correctly uses _create_ipc_executor with use_worker=False and ProcessPoolExecutorSession for Windows compatibility (since mpi4py cannot be used). Note the TODO comment at line 570 indicating that RPC worker support on Windows is planned future work.
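
The routing described in the comments above (an RPC factory, an IPC factory with a use_worker toggle, and an orchestrator_is_rpc flag choosing between them) can be sketched as follows. This is a simplified illustration, not the code in executor.py: the three executor classes, `ExecutorFactory`, `create`, and the `single_process` parameter are hypothetical stand-ins, and the real methods take more arguments than shown here.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class RpcExecutor:  # hypothetical stand-in for the RPC proxy executor
    world_size: int


@dataclass
class WorkerExecutor:  # hypothetical stand-in for the single-process worker
    world_size: int


@dataclass
class ProxyExecutor:  # hypothetical stand-in for the multi-process proxy
    world_size: int


Executor = Union[RpcExecutor, WorkerExecutor, ProxyExecutor]


class ExecutorFactory:

    @staticmethod
    def _create_rpc_executor(world_size: int) -> RpcExecutor:
        return RpcExecutor(world_size)

    @staticmethod
    def _create_ipc_executor(world_size: int,
                             use_worker: bool = False) -> Executor:
        # use_worker=True selects the single-process worker;
        # use_worker=False selects the multi-process proxy.
        if use_worker:
            return WorkerExecutor(world_size)
        return ProxyExecutor(world_size)

    @classmethod
    def create(cls,
               world_size: int,
               orchestrator_type: Optional[str] = None,
               single_process: bool = False) -> Executor:
        # One boolean decides the path, so each call site reads the same way.
        orchestrator_is_rpc = orchestrator_type == "rpc"
        if orchestrator_is_rpc:
            return cls._create_rpc_executor(world_size)
        return cls._create_ipc_executor(world_size, use_worker=single_process)


print(ExecutorFactory.create(1, orchestrator_type="rpc"))
print(ExecutorFactory.create(1, single_process=True))
print(ExecutorFactory.create(2))
```

Keeping construction in two small factory methods means a new call site only has to pick a path and pass arguments, instead of repeating the constructor calls inline.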

@Superjomn Superjomn force-pushed the perf-tuning branch 2 times, most recently from 14303d0 to 3e6d5a5 October 16, 2025 08:12
@Superjomn Superjomn removed the request for review from pcastonguay October 19, 2025 03:32
@Superjomn Superjomn marked this pull request as draft October 19, 2025 03:32
@Superjomn Superjomn force-pushed the perf-tuning branch 4 times, most recently from 26477c1 to 2bcafb3 October 26, 2025 05:09
@Superjomn Superjomn marked this pull request as ready for review October 26, 2025 05:12
@Superjomn Superjomn requested review from a team as code owners October 26, 2025 05:12
@Superjomn
Collaborator Author

/bot run

@Superjomn Superjomn requested a review from hchings October 26, 2025 05:13
@Superjomn Superjomn changed the title from "[None][chore] Align RPC performance with existing GenerationExecutor" to "[None][chore] Optimize perf for the RPC executor and add some profile utilities to llm-api" Oct 26, 2025
@tensorrt-cicd
Collaborator

PR_Github #22522 [ run ] triggered by Bot. Commit: 2bcafb3

@tensorrt-cicd
Collaborator

PR_Github #22522 [ run ] completed with state SUCCESS. Commit: 2bcafb3
/LLM/main/L0_MergeRequest_PR pipeline #16978 completed with status: 'FAILURE'

@Superjomn
Collaborator Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #22623 [ run ] triggered by Bot. Commit: 6b56dac

@tensorrt-cicd
Collaborator

PR_Github #22623 [ run ] completed with state SUCCESS. Commit: 6b56dac
/LLM/main/L0_MergeRequest_PR pipeline #17053 completed with status: 'FAILURE'

@Superjomn
Collaborator Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #22708 [ run ] triggered by Bot. Commit: 6b56dac

@Superjomn Superjomn requested a review from kaiyux October 28, 2025 02:01
Collaborator

@hchings hchings left a comment

Verified functionality (not perf) locally

@hchings hchings enabled auto-merge (squash) November 3, 2025 19:46
@hchings
Collaborator

hchings commented Nov 3, 2025

/bot run

@hchings
Collaborator

hchings commented Nov 3, 2025

/bot kill

@hchings
Collaborator

hchings commented Nov 3, 2025

/bot reuse-pipeline

@tensorrt-cicd
Collaborator

PR_Github #23415 [ run ] triggered by Bot. Commit: d6c9a23

@tensorrt-cicd
Collaborator

PR_Github #23416 [ kill ] triggered by Bot. Commit: d6c9a23

@tensorrt-cicd
Collaborator

PR_Github #23415 [ run ] completed with state ABORTED. Commit: d6c9a23

@tensorrt-cicd
Collaborator

PR_Github #23417 [ reuse-pipeline ] triggered by Bot. Commit: d6c9a23

@tensorrt-cicd
Collaborator

PR_Github #23416 [ kill ] completed with state SUCCESS. Commit: d6c9a23
Successfully killed previous jobs for commit d6c9a23

@tensorrt-cicd
Collaborator

PR_Github #23417 [ reuse-pipeline ] completed with state SUCCESS. Commit: d6c9a23
Can't reuse PR_Github #23415 with status: ABORTED

@hchings
Collaborator

hchings commented Nov 3, 2025

/bot run

@tensorrt-cicd
Collaborator

PR_Github #23423 [ run ] triggered by Bot. Commit: d6c9a23

@Superjomn
Collaborator Author

/bot skip --comment "the CI has passed #8415 (comment)"

@tensorrt-cicd
Collaborator

PR_Github #23439 [ skip ] triggered by Bot. Commit: d6c9a23

@tensorrt-cicd
Collaborator

PR_Github #23423 [ run ] completed with state ABORTED. Commit: d6c9a23
LLM/main/L0_MergeRequest_PR #17639 (Blue Ocean) completed with status: ABORTED

@tensorrt-cicd
Collaborator

PR_Github #23439 [ skip ] completed with state SUCCESS. Commit: d6c9a23
Skipping testing for commit d6c9a23

@hchings hchings merged commit ed297d7 into NVIDIA:main Nov 4, 2025
5 checks passed
fredricz-20070104 pushed a commit to fredricz-20070104/TensorRT-LLM that referenced this pull request Nov 5, 2025
… utilities to llm-api (NVIDIA#8415)

Signed-off-by: Superjomn <[email protected]>
Signed-off-by: FredricZ-2007 <[email protected]>