Skip to content

fix(redact): tighten sensitive-token list to stop matching LLM token-count flags#1006

Merged
matthewkotila merged 1 commit into
mainfrom
mkotila/redact-token-count-flags
May 28, 2026
Merged

fix(redact): tighten sensitive-token list to stop matching LLM token-count flags#1006
matthewkotila merged 1 commit into
mainfrom
mkotila/redact-token-count-flags

Conversation

@matthewkotila
Copy link
Copy Markdown
Contributor

@matthewkotila matthewkotila commented May 27, 2026

Summary

Fixes a bug where every aiperf flag with tokens (plural) in its name had its
value silently replaced with <redacted> in the canonical cli_command string,
breaking reproducibility of the recorded launch command.

Before:
CLI Command: aiperf profile --model 'Qwen/Qwen3-0.6B' ...
--synthetic-input-tokens-mean ''
--synthetic-input-tokens-stddev ''
--output-tokens-mean '' ...

After:
CLI Command: aiperf profile --model 'Qwen/Qwen3-0.6B' ...
--synthetic-input-tokens-mean 1024
--synthetic-input-tokens-stddev 0
--output-tokens-mean 128 ...

Root cause

src/aiperf/common/redact.py's _CLI_COMMAND_SENSITIVE_TOKENS included bare
"token" as a substring marker. Combined with the substring check in
_redact_cli_args (if any(tok in key for tok in ...)), this fired on any flag
containing the substring token — including LLM token-count flags like
--*-tokens-mean, --*-tokens-stddev.

Introduced alongside the v2 config refactor in 94a9102 (#912). Test coverage
at the time exercised --api-key only; the token-count family was uncovered.

Fix

Replaced bare "token" with the specific auth-token compound forms it was
meant to catch: api-token, auth-token, access-token, bearer-token,
id-token, refresh-token (each with - and _ variants). Every entry now
contains a - or _, so substring-matching can't fire on innocent plurals.

Tradeoff: bare --token <secret> is no longer caught. aiperf has no such
flag today; future additions can extend the list explicitly.

Test plan

  • uv run pytest tests/unit/common/test_redact.py — 271 passed
  • Token-count flag family preserves values (new parametrized regression
    cases): --input-tokens-mean, --input-tokens-stddev,
    --output-tokens-mean, --output-tokens-stddev,
    --synthetic-input-tokens-mean, --synthetic-input-tokens-stddev
  • Auth-token compound flags still redact (new parametrized positive cases):
    --api-token, --api_token, --auth-token, --access-token,
    --bearer-token, --id-token, --refresh-token
  • Original bug reproducer: aiperf profile --model 'Qwen/Qwen3-0.6B' ... --synthetic-input-tokens-mean 1024 ...
    shows literal 1024 (not <redacted>) in the printed CLI Command
  • Pre-existing --api-key sk-secret-XXX redaction still works

Related

🤖 Generated with Claude Code

Summary by CodeRabbit

  • Bug Fixes
    • Enhanced credential redaction to cover additional authentication token flag variants (--api-token, --auth-token, --access-token, --bearer-token, --id-token, --refresh-token), providing improved protection for sensitive information in CLI commands.

Review Change Stack

…count flags

_CLI_COMMAND_SENSITIVE_TOKENS previously included bare "token" as a
substring marker. Combined with the substring-match in
_redact_cli_args (`if any(tok in key for tok in ...)`), this caused
every aiperf flag with "tokens" (plural) in its name -- the entire
LLM token-count family -- to have its value silently replaced with
<redacted> in the canonical cli_command string.

Real-run output showed `--synthetic-input-tokens-mean '<redacted>'`,
`--output-tokens-mean '<redacted>'`, etc., breaking cli_command's
reproducibility purpose: you can't replay a benchmark whose configured
token counts have been scrubbed.

Replace the bare "token" with the specific auth-token compound forms
the redaction was intended to catch (api-token, auth-token,
access-token, bearer-token, id-token, refresh-token, with both - and
_ variants). Each entry now contains at least one - or _, so
substring-matching can't fire on innocent plurals.

Tradeoff: bare --token <secret> is no longer caught. Aiperf has no
such flag today; future additions can extend the list explicitly.

Tests:
- 6 parametrized regression cases for the token-count flag family
  (input/output/synthetic-input variants), asserting numeric values
  pass through unredacted.
- 7 parametrized positive cases for the auth-token compound family,
  asserting their secret values are still redacted.

The buggy substring match was introduced alongside the v2 config
refactor (commit 94a9102 / PR #912). Test coverage at the time
exercised --api-key only; the token-count family was uncovered.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@matthewkotila matthewkotila requested a review from FrankD412 May 27, 2026 21:54
@github-actions github-actions Bot added the fix label May 27, 2026
@github-actions
Copy link
Copy Markdown

Try out this PR

Quick install:

pip install --upgrade --force-reinstall git+https://github.com/ai-dynamo/aiperf.git@cf88051c5778c85d60e15542e0451e872db3fd3f

Recommended with virtual environment (using uv):

uv venv --python 3.12 && source .venv/bin/activate
uv pip install --upgrade --force-reinstall git+https://github.com/ai-dynamo/aiperf.git@cf88051c5778c85d60e15542e0451e872db3fd3f

Last updated for commit: cf88051Browse code

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 27, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: d94b4e30-e9e2-47b2-be3b-5ba8e963dcfb

📥 Commits

Reviewing files that changed from the base of the PR and between d797629 and cf88051.

📒 Files selected for processing (2)
  • src/aiperf/common/redact.py
  • tests/unit/common/test_redact.py

Walkthrough

This PR expands the CLI token redaction patterns to recognize additional credential-shaped flag variants and adds regression tests validating both non-matching (plural token flags) and matching (auth-token compound) behaviors.

Changes

Token Redaction Pattern Expansion

Layer / File(s) Summary
Token pattern expansion and validation
src/aiperf/common/redact.py, tests/unit/common/test_redact.py
_CLI_COMMAND_SENSITIVE_TOKENS expands from basic patterns (api-key, authorization, token) to include compound auth/credential variants (api-token, auth-token, access-token, bearer-token, id-token, refresh-token) with clarifying comments on matching constraints. Tests verify plural token-count flags preserve their values while auth-token compound variants are fully redacted.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Poem

🐰 A token by any other name is still a secret,
So we expanded the patterns to catch each culprit.
Auth-tokens, bearer-tokens, all must be concealed,
While token counts merrily stay revealed!

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title directly describes the main change: tightening the sensitive-token list to prevent matching LLM token-count flags, which aligns with the core bug fix in the changeset.
Docstring Coverage ✅ Passed Docstring coverage is 80.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@codecov
Copy link
Copy Markdown

codecov Bot commented May 27, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.

📢 Thoughts on this report? Let us know!

@matthewkotila matthewkotila merged commit f93b2ad into main May 28, 2026
29 of 30 checks passed
@matthewkotila matthewkotila deleted the mkotila/redact-token-count-flags branch May 28, 2026 22:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants