(additional-outputs)=

# Additional Outputs

TensorRT LLM provides several options to return additional outputs from the model during inference. These options can be specified in the `SamplingParams` object and control what extra information is returned for each generated sequence.
For an example showing how to set the parameters and how to access the results, see [examples/llm-api/quickstart_advanced.py](https://github.com/NVIDIA/TensorRT-LLM/blob/main/examples/llm-api/quickstart_advanced.py).

## Options

### `return_context_logits`

- **Description**: If set to `True`, the logits (raw model outputs before softmax) for the context (input prompt) tokens are returned for each sequence.
- **Usage**: Useful for tasks such as scoring the likelihood of the input prompt or for advanced post-processing; see the sketch below.
- **Default**: `False`
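
For illustration, a minimal sketch in the style of the quickstart example; the model name is only a placeholder, and the `output.context_logits` field follows the usage in `quickstart_advanced.py`:

```python
from tensorrt_llm import LLM, SamplingParams

llm = LLM(model="TinyLlama/TinyLlama-1.1B-Chat-v1.0")  # placeholder model
sampling_params = SamplingParams(max_tokens=8, return_context_logits=True)

for output in llm.generate(["The capital of France is"], sampling_params):
    # One logit vector per prompt token: shape [prompt_length, vocab_size].
    print(output.context_logits)
```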

### `return_generation_logits`

- **Description**: If set to `True`, the logits for the generated tokens (tokens produced during generation) are returned for each sequence.
- **Usage**: Enables advanced sampling, custom decoding, or analysis of the model's output probabilities for generated tokens; see the sketch below.
- **Default**: `False`
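
A corresponding sketch for generation logits, again with a placeholder model; the `generation_logits` field on each completion follows `quickstart_advanced.py`:

```python
from tensorrt_llm import LLM, SamplingParams

llm = LLM(model="TinyLlama/TinyLlama-1.1B-Chat-v1.0")  # placeholder model
sampling_params = SamplingParams(max_tokens=8, return_generation_logits=True)

for output in llm.generate(["The capital of France is"], sampling_params):
    # Logits for the generated tokens of the first (and here only) sequence:
    # shape [num_generated_tokens, vocab_size].
    print(output.outputs[0].generation_logits)
```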

### `prompt_logprobs`

- **Description**: If set to an integer value `N`, the top-`N` log probabilities for each prompt token are returned, along with the corresponding token IDs.
- **Usage**: Useful for analyzing how likely the model considers each input token, for scoring prompts, or for applications that need token-level log probabilities of the prompt; see the sketch below.
- **Default**: `None`
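
A hedged sketch; it assumes prompt log probabilities are exposed on the completion output as `prompt_logprobs`, with one mapping of token ID to log probability per prompt token:

```python
from tensorrt_llm import LLM, SamplingParams

llm = LLM(model="TinyLlama/TinyLlama-1.1B-Chat-v1.0")  # placeholder model
sampling_params = SamplingParams(max_tokens=8, prompt_logprobs=2)

for output in llm.generate(["The capital of France is"], sampling_params):
    # Assumption: one {token_id: logprob} mapping per prompt token,
    # holding the top-2 candidates at that position.
    for position, top_logprobs in enumerate(output.outputs[0].prompt_logprobs):
        print(position, top_logprobs)
```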

### `logprobs`

- **Description**: If set to an integer value `N`, the top-`N` log probabilities for each generated token are returned, along with the corresponding token IDs.
- **Usage**: Useful for uncertainty estimation, sampling analysis, or applications that need the probability distribution over tokens at each generation step; see the sketch below.
- **Default**: `None` (no log probabilities returned)
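
For example, a minimal sketch with a placeholder model; the `logprobs` and `token_ids` fields on each completion follow `quickstart_advanced.py`:

```python
from tensorrt_llm import LLM, SamplingParams

llm = LLM(model="TinyLlama/TinyLlama-1.1B-Chat-v1.0")  # placeholder model
sampling_params = SamplingParams(max_tokens=8, logprobs=2)

for output in llm.generate(["The capital of France is"], sampling_params):
    completion = output.outputs[0]
    # One entry per generated token: a mapping from token ID to log
    # probability, covering the top-2 candidates at that step.
    for step, top_logprobs in enumerate(completion.logprobs):
        print(step, completion.token_ids[step], top_logprobs)
```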

### `additional_model_outputs`

- **Description**: Specifies extra outputs to return from the model during inference. This should be a list of strings, where each string is the name of a supported additional output (such as `"hidden_states"` or `"attentions"`).
- **Usage**: Allows retrieval of intermediate model results such as hidden states, attentions, or other auxiliary outputs supported by the model. This can be useful for debugging, interpretability, or advanced research applications.
- **How to use** (a full sketch follows after this list):
  - Provide a list of supported output names, e.g.:

    ```python
    additional_model_outputs=["hidden_states", "attentions"]
    ```

  - Pass this list to the `additional_model_outputs` parameter of `SamplingParams`.
  - After generation, access the results per sequence via `sequence.additional_context_outputs` (for context outputs) and `sequence.additional_generation_outputs` (for generation outputs).
- **Default**: `None` (no additional outputs returned)

**Note:** The available output names depend on the model implementation. The model forward function is expected to return a dictionary of model outputs that includes `"logits"` and any additional outputs that should be attached to responses.
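
Putting the steps above together, a hedged end-to-end sketch; it assumes the loaded model actually implements `"hidden_states"` as an additional output and that the per-sequence accessors are named as described above:

```python
from tensorrt_llm import LLM, SamplingParams

llm = LLM(model="TinyLlama/TinyLlama-1.1B-Chat-v1.0")  # placeholder model
sampling_params = SamplingParams(
    max_tokens=8,
    # Assumption: this model's forward function returns "hidden_states"
    # in its output dictionary; unsupported names cannot be returned.
    additional_model_outputs=["hidden_states"],
)

for output in llm.generate(["The capital of France is"], sampling_params):
    sequence = output.outputs[0]
    print(sequence.additional_context_outputs)     # outputs for prompt tokens
    print(sequence.additional_generation_outputs)  # outputs for generated tokens
```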