fix(redact): tighten sensitive-token list to stop matching LLM token-count flags#1006
Conversation
…count flags _CLI_COMMAND_SENSITIVE_TOKENS previously included bare "token" as a substring marker. Combined with the substring-match in _redact_cli_args (`if any(tok in key for tok in ...)`), this caused every aiperf flag with "tokens" (plural) in its name -- the entire LLM token-count family -- to have its value silently replaced with <redacted> in the canonical cli_command string. Real-run output showed `--synthetic-input-tokens-mean '<redacted>'`, `--output-tokens-mean '<redacted>'`, etc., breaking cli_command's reproducibility purpose: you can't replay a benchmark whose configured token counts have been scrubbed. Replace the bare "token" with the specific auth-token compound forms the redaction was intended to catch (api-token, auth-token, access-token, bearer-token, id-token, refresh-token, with both - and _ variants). Each entry now contains at least one - or _, so substring-matching can't fire on innocent plurals. Tradeoff: bare --token <secret> is no longer caught. Aiperf has no such flag today; future additions can extend the list explicitly. Tests: - 6 parametrized regression cases for the token-count flag family (input/output/synthetic-input variants), asserting numeric values pass through unredacted. - 7 parametrized positive cases for the auth-token compound family, asserting their secret values are still redacted. The buggy substring match was introduced alongside the v2 config refactor (commit 94a9102 / PR #912). Test coverage at the time exercised --api-key only; the token-count family was uncovered. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Try out this PRQuick install: pip install --upgrade --force-reinstall git+https://github.com/ai-dynamo/aiperf.git@cf88051c5778c85d60e15542e0451e872db3fd3fRecommended with virtual environment (using uv): uv venv --python 3.12 && source .venv/bin/activate
uv pip install --upgrade --force-reinstall git+https://github.com/ai-dynamo/aiperf.git@cf88051c5778c85d60e15542e0451e872db3fd3fLast updated for commit: |
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (2)
WalkthroughThis PR expands the CLI token redaction patterns to recognize additional credential-shaped flag variants and adds regression tests validating both non-matching (plural token flags) and matching (auth-token compound) behaviors. ChangesToken Redaction Pattern Expansion
Estimated code review effort🎯 2 (Simple) | ⏱️ ~10 minutes Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Codecov Report✅ All modified and coverable lines are covered by tests. 📢 Thoughts on this report? Let us know! |
Summary
Fixes a bug where every aiperf flag with
tokens(plural) in its name had itsvalue silently replaced with
<redacted>in the canonicalcli_commandstring,breaking reproducibility of the recorded launch command.
Before:
CLI Command: aiperf profile --model 'Qwen/Qwen3-0.6B' ...
--synthetic-input-tokens-mean ''
--synthetic-input-tokens-stddev ''
--output-tokens-mean '' ...
After:
CLI Command: aiperf profile --model 'Qwen/Qwen3-0.6B' ...
--synthetic-input-tokens-mean 1024
--synthetic-input-tokens-stddev 0
--output-tokens-mean 128 ...
Root cause
src/aiperf/common/redact.py's_CLI_COMMAND_SENSITIVE_TOKENSincluded bare"token"as a substring marker. Combined with the substring check in_redact_cli_args(if any(tok in key for tok in ...)), this fired on any flagcontaining the substring
token— including LLM token-count flags like--*-tokens-mean,--*-tokens-stddev.Introduced alongside the v2 config refactor in 94a9102 (#912). Test coverage
at the time exercised
--api-keyonly; the token-count family was uncovered.Fix
Replaced bare
"token"with the specific auth-token compound forms it wasmeant to catch:
api-token,auth-token,access-token,bearer-token,id-token,refresh-token(each with-and_variants). Every entry nowcontains a
-or_, so substring-matching can't fire on innocent plurals.Tradeoff: bare
--token <secret>is no longer caught. aiperf has no suchflag today; future additions can extend the list explicitly.
Test plan
uv run pytest tests/unit/common/test_redact.py— 271 passedcases):
--input-tokens-mean,--input-tokens-stddev,--output-tokens-mean,--output-tokens-stddev,--synthetic-input-tokens-mean,--synthetic-input-tokens-stddev--api-token,--api_token,--auth-token,--access-token,--bearer-token,--id-token,--refresh-tokenaiperf profile --model 'Qwen/Qwen3-0.6B' ... --synthetic-input-tokens-mean 1024 ...shows literal
1024(not<redacted>) in the printed CLI Command--api-key sk-secret-XXXredaction still worksRelated
🤖 Generated with Claude Code
Summary by CodeRabbit
--api-token,--auth-token,--access-token,--bearer-token,--id-token,--refresh-token), providing improved protection for sensitive information in CLI commands.