
Commit 990b0c0

[TRTLLM-7159][docs] Add documentation for additional outputs (#8325)
Signed-off-by: Robin Kobus <[email protected]>
1 parent 8090c96

3 files changed: +57 -0

docs/source/features/additional-outputs.md

Lines changed: 50 additions & 0 deletions
@@ -0,0 +1,50 @@
(additional-outputs)=

# Additional Outputs

TensorRT LLM provides several options to return additional outputs from the model during inference. These options can be specified in the `SamplingParams` object and control what extra information is returned for each generated sequence.

For an example showing how to set the parameters and how to access the results, see [examples/llm-api/quickstart_advanced.py](https://github.com/NVIDIA/TensorRT-LLM/blob/main/examples/llm-api/quickstart_advanced.py).
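As a minimal sketch of how these options fit together (the checkpoint below is a placeholder; the per-sequence accessors are the ones used in the quickstart example):

```python
from tensorrt_llm import LLM, SamplingParams

llm = LLM(model="TinyLlama/TinyLlama-1.1B-Chat-v1.0")  # placeholder checkpoint

sampling_params = SamplingParams(
    max_tokens=32,
    return_context_logits=True,     # logits for the prompt tokens
    return_generation_logits=True,  # logits for the generated tokens
    prompt_logprobs=2,              # top-2 logprobs per prompt token
    logprobs=2,                     # top-2 logprobs per generated token
)

for output in llm.generate(["The capital of France is"], sampling_params):
    for sequence in output.outputs:
        print(sequence.text)
        print(sequence.generation_logits)  # as printed in quickstart_advanced.py
        print(sequence.prompt_logprobs)
        print(sequence.logprobs)
```
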
## Options

### `return_context_logits`

- **Description**: If set to `True`, the logits (raw model outputs before softmax) for the context (input prompt) tokens are returned for each sequence.
- **Usage**: Useful for tasks such as scoring the likelihood of the input prompt or for advanced post-processing.
- **Default**: `False`
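To illustrate the prompt-scoring use case, here is a sketch that turns context logits into the prompt's log-likelihood. The `[prompt_len, vocab_size]` shape and the shift-by-one convention are assumptions; only the option itself is documented above.

```python
import torch
import torch.nn.functional as F


def prompt_log_likelihood(context_logits: torch.Tensor, prompt_ids: list[int]) -> float:
    """Sum of log P(token_t | tokens_<t) over the prompt.

    Assumes context_logits has shape [prompt_len, vocab_size] and that the
    logits at position t score the token at position t + 1.
    """
    log_probs = F.log_softmax(context_logits[:-1].float(), dim=-1)
    targets = torch.tensor(prompt_ids[1:]).unsqueeze(1)
    return log_probs.gather(1, targets).sum().item()
```
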
### `return_generation_logits`

- **Description**: If set to `True`, the logits for the generated tokens (tokens produced during generation) are returned for each sequence.
- **Usage**: Enables advanced sampling, custom decoding, or analysis of the model's output probabilities for generated tokens.
- **Default**: `False`
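For example, a sketch of post-hoc analysis on the returned tensor; the `[num_steps, vocab_size]` shape is an assumption, while `sequence.generation_logits` is the accessor used in the quickstart example.

```python
import torch


def top_alternatives(generation_logits: torch.Tensor, k: int = 5):
    """Per generation step, the k most probable token ids and their probabilities."""
    probs = torch.softmax(generation_logits.float(), dim=-1)  # [steps, vocab_size]
    top_probs, top_ids = probs.topk(k, dim=-1)
    return [
        list(zip(ids.tolist(), ps.tolist()))
        for ids, ps in zip(top_ids, top_probs)
    ]
```
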
### `prompt_logprobs`

- **Description**: If set to an integer value `N`, the top-`N` log probabilities for each prompt token are returned, along with the corresponding token IDs.
- **Usage**: Useful for analyzing how likely the model considers each input token, for scoring prompts, or for applications that require access to token-level log probabilities of the prompt.
- **Default**: `None` (no prompt log probabilities returned)
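A sketch of consuming the result, assuming one mapping from token id to a logprob record per prompt position; the exact container types are not specified above and may differ.

```python
def print_prompt_logprobs(prompt_logprobs) -> None:
    """Pretty-print top-N prompt logprobs.

    Assumes one {token_id: info} mapping per prompt position, where `info`
    carries the log probability in a `logprob` field (an assumption).
    """
    for position, candidates in enumerate(prompt_logprobs):
        if not candidates:  # the first position may have no logprobs
            continue
        for token_id, info in candidates.items():
            print(f"position {position}: token {token_id} -> {info.logprob:.3f}")
```
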
### `logprobs`

- **Description**: If set to an integer value `N`, the top-`N` log probabilities for each generated token are returned, along with the corresponding token IDs.
- **Usage**: Useful for uncertainty estimation, sampling analysis, or applications that require access to the probability distribution over tokens at each generation step.
- **Default**: `None` (no log probabilities returned)
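As an uncertainty-estimation sketch, perplexity of the generated tokens from their logprobs; `sequence.logprobs` appears in the quickstart example, but `sequence.token_ids` and the per-step mapping layout are assumptions.

```python
import math


def generation_perplexity(sequence) -> float:
    """Perplexity of a generated sequence from its top-N logprobs.

    Assumes sequence.logprobs yields one {token_id: info} mapping per step,
    that the sampled token is included in each mapping, and that
    sequence.token_ids holds the sampled ids (names not confirmed above).
    """
    log_probs = [
        candidates[token_id].logprob
        for token_id, candidates in zip(sequence.token_ids, sequence.logprobs)
    ]
    return math.exp(-sum(log_probs) / max(len(log_probs), 1))
```
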
### `additional_model_outputs`

- **Description**: Specifies extra outputs to return from the model during inference. This should be a list of strings, where each string is the name of a supported additional output (such as `"hidden_states"` or `"attentions"`).
- **Usage**: Allows retrieval of intermediate model results such as hidden states, attentions, or other auxiliary outputs supported by the model. This can be useful for debugging, interpretability, or advanced research applications.
- **How to use**:
  - Provide a list of supported output names, e.g.:

    ```python
    additional_model_outputs=["hidden_states", "attentions"]
    ```

  - Pass this list to the `additional_model_outputs` parameter of `SamplingParams`.
  - After generation, access the results per sequence via `sequence.additional_context_outputs` (for context outputs) and `sequence.additional_generation_outputs` (for generation outputs), as in the sketch below.
- **Default**: `None` (no additional outputs returned)

**Note:** The available output names depend on the model implementation. The model's forward function is expected to return a dictionary of model outputs that includes `"logits"` and any additional outputs that should be attached to responses.
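
Putting the pieces together, a usage sketch; the output name `"hidden_states"` is illustrative and must actually be produced by the model implementation:

```python
from tensorrt_llm import LLM, SamplingParams

llm = LLM(model="TinyLlama/TinyLlama-1.1B-Chat-v1.0")  # placeholder checkpoint

sampling_params = SamplingParams(
    max_tokens=16,
    additional_model_outputs=["hidden_states"],  # must match a name the model returns
)

for output in llm.generate(["Hello"], sampling_params):
    for sequence in output.outputs:
        print(sequence.additional_context_outputs)     # outputs for the prompt tokens
        print(sequence.additional_generation_outputs)  # outputs for the generated tokens
```

On the model side, the contract in the note above amounts to a forward function shaped like this hypothetical sketch:

```python
def forward(self, input_ids, **kwargs):
    hidden_states = self.backbone(input_ids)    # hypothetical model internals
    return {
        "logits": self.lm_head(hidden_states),  # required key
        "hidden_states": hidden_states,         # extra output attached to responses
    }
```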

docs/source/index.rst

Lines changed: 1 addition & 0 deletions
@@ -70,6 +70,7 @@ Welcome to TensorRT LLM's Documentation!
    features/parallel-strategy.md
    features/quantization.md
    features/sampling.md
+   features/additional-outputs.md
    features/speculative-decoding.md
    features/checkpoint-loading.md
    features/auto_deploy/auto-deploy.md

examples/llm-api/quickstart_advanced.py

Lines changed: 6 additions & 0 deletions
@@ -155,6 +155,7 @@ def add_llm_args(parser):
     parser.add_argument('--return_generation_logits',
                         default=False,
                         action='store_true')
+    parser.add_argument('--prompt_logprobs', default=False, action='store_true')
     parser.add_argument('--logprobs', default=False, action='store_true')

     parser.add_argument('--additional_model_outputs',
@@ -283,6 +284,7 @@ def setup_llm(args, **kwargs):
         return_context_logits=args.return_context_logits,
         return_generation_logits=args.return_generation_logits,
         logprobs=args.logprobs,
+        prompt_logprobs=args.prompt_logprobs,
         n=args.n,
         best_of=best_of,
         use_beam_search=use_beam_search,
@@ -323,6 +325,10 @@ def main():
             print(
                 f"[{i}]{sequence_id_text} Generation logits: {sequence.generation_logits}"
             )
+            if args.prompt_logprobs:
+                print(
+                    f"[{i}]{sequence_id_text} Prompt logprobs: {sequence.prompt_logprobs}"
+                )
             if args.logprobs:
                 print(f"[{i}]{sequence_id_text} Logprobs: {sequence.logprobs}")
