(additional-outputs)=

# Additional Outputs

TensorRT LLM provides several options to return additional outputs from the model during inference. These options can be specified in the `SamplingParams` object and control what extra information is returned for each generated sequence.
For an example showing how to set the parameters and how to access the results, see [examples/llm-api/quickstart_advanced.py](https://github.com/NVIDIA/TensorRT-LLM/blob/main/examples/llm-api/quickstart_advanced.py).

## Options

### `return_context_logits`

- **Description**: If set to `True`, the logits (raw model outputs before softmax) for the context (input prompt) tokens are returned for each sequence.
- **Usage**: Useful for tasks such as scoring the likelihood of the input prompt or for advanced post-processing; see the sketch below.
- **Default**: `False`
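
For illustration, a minimal sketch in the style of the quickstart example; the model name is only a placeholder, and the `output.context_logits` field follows the usage in `quickstart_advanced.py`:

```python
from tensorrt_llm import LLM, SamplingParams

llm = LLM(model="TinyLlama/TinyLlama-1.1B-Chat-v1.0")  # placeholder model
sampling_params = SamplingParams(max_tokens=8, return_context_logits=True)

for output in llm.generate(["The capital of France is"], sampling_params):
    # One logit vector per prompt token: shape [prompt_length, vocab_size].
    print(output.context_logits)
```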

### `return_generation_logits`

- **Description**: If set to `True`, the logits for the generated tokens (tokens produced during generation) are returned for each sequence.
- **Usage**: Enables advanced sampling, custom decoding, or analysis of the model's output probabilities for generated tokens; see the sketch below.
- **Default**: `False`
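
A corresponding sketch for generation logits, again with a placeholder model; the `generation_logits` field on each completion follows `quickstart_advanced.py`:

```python
from tensorrt_llm import LLM, SamplingParams

llm = LLM(model="TinyLlama/TinyLlama-1.1B-Chat-v1.0")  # placeholder model
sampling_params = SamplingParams(max_tokens=8, return_generation_logits=True)

for output in llm.generate(["The capital of France is"], sampling_params):
    # Logits for the generated tokens of the first (and here only) sequence:
    # shape [num_generated_tokens, vocab_size].
    print(output.outputs[0].generation_logits)
```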

### `prompt_logprobs`

- **Description**: If set to an integer value `N`, the top-`N` log probabilities for each prompt token are returned, along with the corresponding token IDs.
- **Usage**: Useful for analyzing how likely the model considers each input token, for scoring prompts, or for applications that need token-level log probabilities of the prompt; see the sketch below.
- **Default**: `None`
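
A hedged sketch; it assumes prompt log probabilities are exposed on the completion output as `prompt_logprobs`, with one mapping of token ID to log probability per prompt token:

```python
from tensorrt_llm import LLM, SamplingParams

llm = LLM(model="TinyLlama/TinyLlama-1.1B-Chat-v1.0")  # placeholder model
sampling_params = SamplingParams(max_tokens=8, prompt_logprobs=2)

for output in llm.generate(["The capital of France is"], sampling_params):
    # Assumption: one {token_id: logprob} mapping per prompt token,
    # holding the top-2 candidates at that position.
    for position, top_logprobs in enumerate(output.outputs[0].prompt_logprobs):
        print(position, top_logprobs)
```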

### `logprobs`

- **Description**: If set to an integer value `N`, the top-`N` log probabilities for each generated token are returned, along with the corresponding token IDs.
- **Usage**: Useful for uncertainty estimation, sampling analysis, or applications that need the probability distribution over tokens at each generation step; see the sketch below.
- **Default**: `None` (no log probabilities returned)
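
For example, a minimal sketch with a placeholder model; the `logprobs` and `token_ids` fields on each completion follow `quickstart_advanced.py`:

```python
from tensorrt_llm import LLM, SamplingParams

llm = LLM(model="TinyLlama/TinyLlama-1.1B-Chat-v1.0")  # placeholder model
sampling_params = SamplingParams(max_tokens=8, logprobs=2)

for output in llm.generate(["The capital of France is"], sampling_params):
    completion = output.outputs[0]
    # One entry per generated token: a mapping from token ID to log
    # probability, covering the top-2 candidates at that step.
    for step, top_logprobs in enumerate(completion.logprobs):
        print(step, completion.token_ids[step], top_logprobs)
```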

### `additional_model_outputs`

- **Description**: Specifies extra outputs to return from the model during inference. This should be a list of strings, where each string is the name of a supported additional output (such as `"hidden_states"` or `"attentions"`).
- **Usage**: Allows retrieval of intermediate model results such as hidden states, attentions, or other auxiliary outputs supported by the model. This can be useful for debugging, interpretability, or advanced research applications.
- **How to use** (a full sketch follows after this list):
  - Provide a list of supported output names, e.g.:

    ```python
    additional_model_outputs=["hidden_states", "attentions"]
    ```

  - Pass this list to the `additional_model_outputs` parameter of `SamplingParams`.
  - After generation, access the results per sequence via `sequence.additional_context_outputs` (for context outputs) and `sequence.additional_generation_outputs` (for generation outputs).
- **Default**: `None` (no additional outputs returned)

**Note:** The available output names depend on the model implementation. The model forward function is expected to return a dictionary of model outputs that includes `"logits"` and any additional outputs that should be attached to responses.
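
Putting the steps above together, a hedged end-to-end sketch; it assumes the loaded model actually implements `"hidden_states"` as an additional output and that the per-sequence accessors are named as described above:

```python
from tensorrt_llm import LLM, SamplingParams

llm = LLM(model="TinyLlama/TinyLlama-1.1B-Chat-v1.0")  # placeholder model
sampling_params = SamplingParams(
    max_tokens=8,
    # Assumption: this model's forward function returns "hidden_states"
    # in its output dictionary; unsupported names cannot be returned.
    additional_model_outputs=["hidden_states"],
)

for output in llm.generate(["The capital of France is"], sampling_params):
    sequence = output.outputs[0]
    print(sequence.additional_context_outputs)     # outputs for prompt tokens
    print(sequence.additional_generation_outputs)  # outputs for generated tokens
```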