Conversation
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 247d06784d
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| b' ' | b'/' | ||
| | b':' | ||
| | b'.' | ||
| | b'@' |
There was a problem hiding this comment.
Detect dotted high-entropy tokens as secrets
The looks_like_secret heuristic unconditionally rejects any token containing . (b'.' in the denylist), which means JWT/OAuth bearer tokens like header.payload.signature are never redacted even when they are long and high-entropy. Because diagnostics export includes logs and session data where Authorization: Bearer ... values can appear, this leaves a common secret format exposed in the zip and undermines the masking goal.
Useful? React with 👍 / 👎.
| let name = path.file_name().unwrap().to_str().unwrap(); | ||
| zip.start_file(format!("logs/{}", name), options)?; | ||
| zip.write_all(&fs::read(&path)?)?; | ||
| let content = fs::read_to_string(&path).unwrap_or_default(); |
There was a problem hiding this comment.
Avoid writing empty files when reads fail
Using fs::read_to_string(&path).unwrap_or_default() after zip.start_file(...) silently turns I/O/decoding failures into empty diagnostic files, so support receives a zip with blank logs/*.jsonl (and similarly config.yaml) instead of either real content or a visible error. This is a regression from the prior behavior that surfaced read failures, and it can hide the root cause during incident debugging.
Useful? React with 👍 / 👎.
The diagnostics bundle (zip) included config.yaml, logs, and session data as raw text. If a user accidentally stored an API key in config.yaml (or if one appeared in log output), it would be shipped verbatim — and users often attach these bundles to public GitHub issues. Fix: run all text content through a redact_secrets() pass before writing to the zip. Detection uses Shannon entropy + character-composition heuristics — secrets (API keys, tokens) are long, high-entropy strings made of alphanumeric chars and hyphens, while normal values (URLs, model names, descriptions) contain structural characters like slashes, dots, spaces, and colons. JWT-shaped tokens (three dotted base64 segments) are also detected. The approach is intentionally format-agnostic: it handles YAML key-value lines, JSON quoted strings, and multi-word values (e.g. 'Bearer <token>') without needing to know specific key names. Signed-off-by: Michael Neale <[email protected]>
247d067 to
e10b323
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: e10b323023
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| publish: | ||
| name: Publish to npm | ||
| if: github.event_name != 'workflow_dispatch' | ||
| runs-on: ubuntu-latest | ||
| needs: [build] |
There was a problem hiding this comment.
Keep publish job gated from workflow_dispatch runs
Removing the if: github.event_name != 'workflow_dispatch' guard causes manual workflow_dispatch executions to run the production publish job (pnpm publish --provenance) instead of staying build-only. That changes dry-run behavior into a real publish path, so an operator-triggered verification run can now either publish packages unexpectedly or fail during trusted-publishing validation, which is a release-blocking regression for this workflow.
Useful? React with 👍 / 👎.
| if token.contains('.') { | ||
| let parts: Vec<&str> = token.split('.').collect(); | ||
| return parts.len() == 3 && parts.iter().all(|p| p.len() >= 4); |
There was a problem hiding this comment.
Redact dotted secrets beyond strict JWT shape
The new dotted-token branch returns true only when parts.len() == 3, so any high-entropy token with dots that is not exactly three segments is forced to false and will be exported unredacted. This still leaks common non-JWT bearer formats (for example opaque OAuth tokens containing a single dot), so diagnostics masking remains incomplete for real authorization headers. Fresh evidence in this patch is the newly added parts.len() == 3 gate.
Useful? React with 👍 / 👎.
Signed-off-by: Michael Neale <[email protected]>
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 811dae8d7b
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| | b'=' | ||
| | b'&' | ||
| | b'+' |
There was a problem hiding this comment.
Allow base64 secret characters in detector
looks_like_secret rejects any token containing = or +, so long high-entropy base64 secrets (for example Basic auth payloads ending in = or provider keys that include +) are forced to false and remain unredacted in exported diagnostics. Because this masking pass is the last protection before writing logs/session data to the zip, this character denylist creates a direct secret-leak path for common token formats.
Useful? React with 👍 / 👎.
|
do we know where this came from? we shouldn't be logging keys etc in the log files in the first place of course |
|
@DOsinga yeah I have no idea - but I think sometimes people paste in keys into configs (I have seen other agents recommend this TBH) |
Adds a preventative masking pass to the generated diagnostics zip. All text content (config.yaml, server logs, LLM request logs, session JSON) is now scanned before being written to the zip — any long, high-entropy tokens that look like they shouldn't be shared are replaced with
[REDACTED].Detection uses Shannon entropy combined with character-composition heuristics: long random-looking strings made of alphanumeric chars and hyphens get masked, while normal diagnostic values (URLs, model names, paths, descriptions, timestamps) pass through unchanged.
This only affects the diagnostics zip export — no changes to how config is stored or read at runtime.