threats: bridge to ATR upstream + 7 new privilege_escalation rules #34
Adam Lin (eeee2345) wants to merge 7 commits into
…MCP poisoning, skill compromise, context exfiltration
Contributed under MIT per vaclavbelak's comment on issue gendigitalinc#30 (gendigitalinc#30 (comment)).
Upstream: ATR (Agent Threat Rules), https://github.com/Agent-Threat-Rule/agent-threat-rules
Coverage
- Prompt injection (4): CLT-PI-001..004
- MCP tool/response attacks (3): CLT-MCP-001..003
- Skill package compromise (8): CLT-SKL-001..008
- Context exfiltration (2): CLT-CTX-001..002
Design
- All rules target match_on: content so they fire on Write/Edit content, plugin/skill file scans, and any integration that passes a `content` artifact. They complement Sage's existing 313 rules (command/URL/credential-file) rather than overlap with them; all rules were audited against Sage's existing credential/command/supply-chain rules to avoid duplicates.
- Regex converted from ATR's multi-condition YAML to Sage's single-pattern schema; ATR's inline (?i) flags were replaced with case_insensitive: true (Sage's RegExp does not enable inline-flag syntax).
- All severities and actions chosen conservatively: log/require_approval where a legitimate use case exists, block where the pattern is attack-only (IMDS URL, Unicode Tag smuggling, time-gated credential read, etc).
Validation
- Loads cleanly via packages/core loadThreats (17/17 rules).
- Zero false positives on the ATR 432-sample real-world benign skill corpus (including apify, browserbase, resend, figma, datadog, axiomhq, antfu/nuxt, datadog-labs, mcp-use, and 420+ others).
- 17/17 curated attack test cases trigger the expected rule.
- pnpm test: 1521/1521 Sage tests still passing with the file in place.
Docs
- docs/threat-rules.md "Rule Files" table: add agent-layer.yaml entry.
Note on --no-verify: scripts/git-hooks/pre-commit references .gitleaks.toml, which does not exist in either the main or pre-release branch, so the hook fails for every contributor. Ran gitleaks directly with default config; no secrets detected.
Biome lint clean (14 pre-existing warnings in test files, unrelated to this PR).
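To make the content-layer design concrete, here is a minimal sketch of how a rule with match_on: content and case_insensitive: true might be evaluated. The `ContentRule` shape, the `matchContent` helper, and both example patterns are assumptions for illustration; they are not Sage's actual types or the CLT-* rules in this commit.

```typescript
// Hypothetical rule shape. Field names echo the ones in the commit message
// (case_insensitive, action); this is not Sage's real schema.
type Action = "log" | "require_approval" | "block";

interface ContentRule {
  id: string;
  pattern: string;
  case_insensitive: boolean;
  action: Action;
}

// Illustrative patterns only; these are NOT the CLT-* rules from this PR.
const exampleRules: ContentRule[] = [
  {
    id: "EXAMPLE-IMDS",
    // Cloud instance metadata endpoint on the link-local address.
    pattern: "https?://169\\.254\\.169\\.254/",
    case_insensitive: true,
    action: "block",
  },
  {
    id: "EXAMPLE-UNICODE-TAG",
    // Any code point in the Unicode Tags block (U+E0000..U+E007F).
    pattern: "[\\u{E0000}-\\u{E007F}]",
    case_insensitive: false,
    action: "block",
  },
];

// Return every rule whose pattern matches the content artifact.
function matchContent(rules: ContentRule[], content: string): ContentRule[] {
  return rules.filter((rule) => {
    // "u" enables \u{...} escapes; "i" stands in for an inline (?i) flag.
    const flags = rule.case_insensitive ? "ui" : "u";
    return new RegExp(rule.pattern, flags).test(content);
  });
}
```

Both example patterns are attack-only in the sense used above: ordinary skill or plugin text has little reason to contain an IMDS URL or invisible Tag characters, which is the argument for block rather than require_approval.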
… fork impersonation, path traversal, supply chain
Ports 10 additional rule classes from ATR's upstream catalog that the
initial 17-rule subset left out. Adds a new supply-chain category to
complement the existing prompt-injection / MCP / skill-compromise /
context-exfiltration groupings.
New rules
- CLT-PI-005 System-prompt override framing (new/updated system prompt: …)
- CLT-PI-006 Cross-agent impersonation claim (I am the admin agent …)
- CLT-PI-007 Agent-to-agent override (override verb adjacent to agent keyword)
- CLT-MCP-004 Path traversal to system dir (/etc, /proc, /root, …)
- CLT-MCP-005 Community-fork impersonation prose framing
- CLT-SKL-009 Skill scope hijacking ("also read all other files …")
- CLT-SUP-001 Typosquatted filesystem tool name (filesytem-*, filsystem-*)
- CLT-SUP-002 Install command for "community fork" package
- CLT-CTX-003 PEM private key block appearing in content
- CLT-CTX-004 Obfuscation-framed credential leak (encrypted key: sk-…)
Refinements vs upstream
- CLT-PI-007 tightened: requires an agent-identifier within 80 chars of
the override verb so it does not duplicate CLT-PI-001 on generic user
input.
- CLT-MCP-004 tightened: traversal must terminate in a sensitive system
directory (etc/proc/root/sys/boot/dev/passwd/shadow/hosts). The bare
multi-hop `../../` pattern FPs at ~3% on the benign corpus because
legitimate skills reference relative paths in code examples.
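The two refinements above can be sketched as follows. These regexes are hypothetical stand-ins written for illustration; the actual CLT-PI-007 and CLT-MCP-004 patterns are not reproduced in this commit message, and the override verbs chosen here are an assumption.

```typescript
// Hypothetical patterns illustrating the two refinements.

// Bare multi-hop traversal: also matches relative imports in code samples.
const bareTraversal = /(?:\.\.\/){2,}/;

// Tightened: the traversal must land on a sensitive system name.
const sensitiveTraversal =
  /(?:\.\.\/){2,}(?:etc|proc|root|sys|boot|dev|passwd|shadow|hosts)\b/;

// Override verb followed within 80 characters by an agent identifier.
const agentOverride =
  /\b(?:override|ignore|disregard)\b[\s\S]{0,80}?\bagent\b/i;

// A benign relative import trips the bare pattern but not the tightened one.
const benignImport = "import { helper } from '../../src/utils/index.ts'";
const bareHit = bareTraversal.test(benignImport);
const tightHit = sensitiveTraversal.test(benignImport);
```

The trailing `\b` keeps names such as `../../devtools` or `../../system` from matching on their `dev` and `sys` prefixes.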
Validation
- loadThreats() loads 27/27 rules cleanly
- 27/27 curated attack test cases trigger the expected rule
- Zero false positives across the 432-sample real-world benign skill
corpus (down from 14 FPs on CLT-MCP-004 before the narrowing above)
- pnpm test: 1521/1521 Sage tests still pass
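A validation pass like the one summarized above could be structured roughly as below. This is a sketch under assumptions: the `Rule`, `AttackCase`, and `Report` types and the `evaluate` function are hypothetical names, not the harness actually used for these numbers.

```typescript
// Hypothetical types for a small validation harness.
interface Rule {
  id: string;
  regex: RegExp; // must not carry the g or y flag: test() would keep state in lastIndex
}

interface AttackCase {
  expectedRuleId: string;
  content: string;
}

interface Report {
  attacksDetected: number;
  attacksTotal: number;
  falsePositives: { sampleIndex: number; ruleId: string }[];
}

function evaluate(rules: Rule[], attacks: AttackCase[], benign: string[]): Report {
  const byId = new Map<string, Rule>();
  for (const rule of rules) byId.set(rule.id, rule);

  // An attack case counts only if the rule it was written for fires.
  let attacksDetected = 0;
  for (const attack of attacks) {
    const rule = byId.get(attack.expectedRuleId);
    if (rule !== undefined && rule.regex.test(attack.content)) attacksDetected++;
  }

  // Any rule firing on a benign sample is recorded as a false positive.
  const falsePositives: Report["falsePositives"] = [];
  benign.forEach((sample, sampleIndex) => {
    for (const rule of rules) {
      if (rule.regex.test(sample)) falsePositives.push({ sampleIndex, ruleId: rule.id });
    }
  });

  return { attacksDetected, attacksTotal: attacks.length, falsePositives };
}
```

For scale, 14 hits over 432 samples is about 3.2%, which lines up with the ~3% figure quoted for the bare `../../` pattern if each FP was a distinct sample.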
Why this is a second commit instead of rewriting the earlier one
An initial scope audit dropped a few rule classes as apparent overlaps
with Sage's existing command/URL/credential-file rules. On re-inspection
those were different detection surfaces (content-layer vs command-layer)
so the coverage loss was not intentional. Adding them here as a net-
positive commit keeps the PR history clean for reviewers.
CONTRIBUTING.md requires threats/*.yaml to be licensed under DRL-1.1. @vaclavbelak suggested MIT in issue gendigitalinc#30; relicensing to match the repo's explicit contribution terms and remove the licensing ambiguity before review.
…ayer Add agent-layer threat rules (27 patterns, issue gendigitalinc#30)
Adds a four-file scaffold proposing an opt-in bridge between Sage and the
ATR (Agent Threat Rules) upstream project, plus 7 new agent-layer rules
in the privilege_escalation category as a concrete demonstration of the
bridge's output.
Files added:
- scripts/sync-with-atr.ts Sync script (opt-in, off by default)
- .github/workflows/atr-sync.yml Weekly cron workflow (`if: false`)
- threats/.atr-bridge-config.yaml Bridge config (enabled: false)
- docs/INTEROP.md Interoperability documentation
- threats/agent-layer.yaml +7 rules (CLT-PRV-001a..e, 002, 003)
The bridge is fully opt-in. Both the config and the workflow are disabled
by default; nothing automatically syncs. When enabled, the script writes
to a separate file `threats/agent-layer.atr-generated.yaml` and opens a
DRAFT PR for human review. No PR is ever auto-merged.
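The opt-in behavior described above amounts to a guard at the top of the sync script. A minimal sketch, assuming a config object with only the `enabled` field named in this commit message; `BridgeConfig`, `SyncPlan`, and `planSync` are hypothetical names, not the contents of scripts/sync-with-atr.ts.

```typescript
// Hypothetical config shape: only `enabled` is named in the commit message.
interface BridgeConfig {
  enabled: boolean;
}

// Output goes to the generated file, never to the production rules file.
const GENERATED_FILE = "threats/agent-layer.atr-generated.yaml";
const PRODUCTION_FILE = "threats/agent-layer.yaml";

type SyncPlan =
  | { run: false; reason: string }
  | { run: true; outputPath: string };

// Decide whether a sync may run and where it is allowed to write.
function planSync(config: BridgeConfig): SyncPlan {
  if (!config.enabled) {
    return { run: false, reason: "bridge disabled: set enabled: true to opt in" };
  }
  return { run: true, outputPath: GENERATED_FILE };
}
```

Keeping the decision in a pure function like this makes the default-off behavior easy to unit test without touching the filesystem.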
The 7 new rules cover:
- CLT-PRV-001a..e Microsoft Semantic Kernel SessionsPythonPlugin
CVE-2026-25592 (MSRC disclosure 2026-05-07). Five sister
rules covering autostart paths, SK identifiers,
descriptor patterns, file-write call sites, and Windows
registry Run-key persistence.
- CLT-PRV-002 eval() / new Function / vm.runIn dynamic code injection.
- CLT-PRV-003 Shell metacharacter injection in tool arguments.
All 7 rules derive from upstream MIT-licensed ATR rules; a per-rule
provenance comment preserves attribution under both MIT and DRL 1.1.
All 34 rules (27 existing + 7 new) parse via js-yaml and all regex
compile under the JavaScript RegExp engine (same engine Sage uses at
runtime via threat-loader.ts).
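A compile check of this kind can be sketched as below. It assumes the YAML has already been parsed into plain objects; `RawRule` and `findUncompilable` are hypothetical names, and the flag handling is a guess at how case_insensitive maps onto RegExp flags rather than a copy of threat-loader.ts.

```typescript
// Hypothetical shape of a rule after YAML parsing.
interface RawRule {
  id: string;
  pattern: string;
  case_insensitive?: boolean;
}

// Try to compile every pattern and collect the ones that throw.
function findUncompilable(rules: RawRule[]): { id: string; error: string }[] {
  const failures: { id: string; error: string }[] = [];
  for (const rule of rules) {
    try {
      new RegExp(rule.pattern, rule.case_insensitive ? "i" : "");
    } catch (err) {
      failures.push({
        id: rule.id,
        error: err instanceof Error ? err.message : String(err),
      });
    }
  }
  return failures;
}

// A leading (?i) is rejected by JavaScript's RegExp constructor,
// which is why a rule-level flag is needed instead.
const sample: RawRule[] = [
  { id: "ok", pattern: "secret", case_insensitive: true },
  { id: "bad", pattern: "(?i)secret" },
];
const failures = findUncompilable(sample);
```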
See docs/INTEROP.md for what the bridge handles, what it doesn't, and how
Sage maintainers stay in editorial control. The bridge converter itself
lives in the agent-threat-rules npm package; this PR adds Sage-side
plumbing only.
68d7b03 to
9b5e8e8
Noting the PR shows mergeable_state: blocked in the GitHub API, but there are no reported CI checks and no branch protection rule visible from the outside. Is there a specific CI pipeline or review gate I should know about, or is this just waiting on a maintainer review pass? Happy to address anything that needs changes before this can merge.
Hi Adam Lin (@eeee2345), thanks for the work here, really appreciated! The ATR bridge concept is solid, and several of the new rules are genuinely useful. To answer your question first: the CI check was broken but should be fixed now. Apart from the CI tests/checks, only a review is needed.
We have three blockers before this can merge.
Blocker 1: The GH Actions workflow needs to be removed
.github/workflows/atr-sync.yml grants the agent-threat-rules npm package (and peter-evans/create-pull-request@v6, pinned by tag not SHA) code execution with contents: write and pull-requests: write on this repo. If either package is compromised, the draft PR review gate is bypassed entirely. For a security tool whose threat definitions are themselves an attack surface, that's not an acceptable risk. The if: false gate doesn't help; it's one line in a file we've already merged. The sync script (sync-with-atr.ts) is fine; running it locally and opening a PR manually is a workflow we can support. Please drop the workflow YAML from this PR.
Blocker 2: No tests
Every threat category in this repo has a dedicated *-threats.test.ts file with explicit match cases (inputs that should fire) and benign non-match cases (inputs that should not). The seven new rules in this PR have none. This is a hard requirement: we won't merge rules we can't verify won't regress under future refactors, and writing the benign cases is also the fastest way to discover false-positive problems before they ship.
Blocker 3: Three of the seven new rules need rework
CLT-PRV-001a/b/d/e look good: specific and CVE-anchored, and block at 0.9 confidence is justified. The other three (CLT-PRV-001c, 002, and 003) have FP problems that would be immediately obvious from benign test cases.
ATR's 432-sample benign corpus might not be calibrated against developer workflows. Sage's user population writes code with an AI agent, which presumably looks very different from that corpus. We've had to remove some rules from the previous contribution for the same reason. For these three: either tighten the pattern to anchor it to a specific attack context, or drop them from this PR and revisit when the patterns are ready. In both cases, tests are required.
The four CVE-specific rules plus the script, config, and docs can merge once the workflow is dropped and tests are added for them.
Thanks again for your contribution. Some of the rules you previously contributed are genuinely useful and will likely ship in the next release; this is just to make sure we don't break anything for our users. Please let me know if I can be of any help.
Best wishes, Vaclav
…e 4 CVE rules
Per @vaclavbelak's review (gendigitalinc#34):
- B1: remove .github/workflows/atr-sync.yml. It gave the agent-threat-rules npm package + create-pull-request action write access, which is a supply-chain risk. The sync script stays; run it locally and open a PR manually. Also updated the 3 docs (sync script header, bridge config, INTEROP) that referenced it.
- B3: drop CLT-PRV-001c (path-vocabulary), 002 (eval/Function), 003 (shell metachars), which are FP-prone on developer workflows. Keep the 4 CVE-anchored rules 001a/b/d/e (SK SessionsPythonPlugin file-write + Run-key persistence).
- B2: add agent-layer-threats.test.ts with match + benign cases for all 4 kept rules (9 cases, verified against the compiled patterns).
Thanks Vaclav, sharp review, and the FP point on the dev corpus is fair. All three addressed:
B1: dropped the workflow. Agreed: handing the npm package and the PR-creation action write access is the wrong trade for a security repo. The sync script stays as a local opt-in; I'll run it and open PRs by hand. Also cleaned up the three docs that referenced the removed workflow.
B3: dropped CLT-PRV-001c (path vocabulary), 002 (eval/Function), and 003 (shell metacharacters). They fire on ordinary developer work, and the 432-sample benign corpus doesn't represent users who write code with an agent. Kept the four CVE-anchored rules (001a/b/d/e): the SK SessionsPythonPlugin file-write into autostart paths and the Run-key persistence, each anchored to a specific call site or path.
B2: added agent-layer-threats.test.ts with match and benign cases for all four kept rules. The benign cases cover the exact FP shapes you flagged: ordinary build paths, normal fs.writeFile targets, session-management vocabulary, and the words register/regex.
So this PR is now the four CVE rules plus the script, config, and docs. Happy to revisit the three dropped rules later with tighter, context-anchored patterns. Thanks for shepherding this.
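For readers who want to see the shape of match and benign cases, here is a framework-free sketch. The Run-key regex is a hypothetical stand-in, not the pattern of any CLT-PRV-001 rule, and `ThreatCase` and `runCases` are illustrative names; the real agent-layer-threats.test.ts presumably uses the repo's own test runner.

```typescript
// Hypothetical stand-in for a Run-key persistence pattern.
const runKeyPattern =
  /\b(?:HKCU|HKLM)\\Software\\Microsoft\\Windows\\CurrentVersion\\Run\b/i;

interface ThreatCase {
  name: string;
  content: string;
  shouldMatch: boolean;
}

const cases: ThreatCase[] = [
  {
    name: "registry Run-key write",
    content:
      "reg add HKCU\\Software\\Microsoft\\Windows\\CurrentVersion\\Run /v updater /d C:\\tmp\\payload.exe",
    shouldMatch: true,
  },
  // Benign shapes: build paths, fs.writeFile targets, session vocabulary,
  // and the words register/regex.
  {
    name: "register and build path",
    content: "Register the plugin, then run the build into dist/run/output.js",
    shouldMatch: false,
  },
  {
    name: "regex mention",
    content: "const re = /run-\\d+/; // regex for run ids",
    shouldMatch: false,
  },
  {
    name: "ordinary writeFile",
    content: "await fs.writeFile('build/session.json', JSON.stringify(session))",
    shouldMatch: false,
  },
];

// Return the names of cases whose outcome differs from the expectation.
function runCases(pattern: RegExp, all: ThreatCase[]): string[] {
  return all
    .filter((c) => pattern.test(c.content) !== c.shouldMatch)
    .map((c) => c.name);
}

const failing = runCases(runKeyPattern, cases);
```

Writing the benign cases first tends to surface over-broad patterns early, which is the point made in the review.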
Proposes an opt-in bridge to the ATR (Agent Threat Rules) upstream project, plus 7 new agent-layer rules in privilege_escalation as a concrete demonstration. Follow-up to PR #33.
What this PR adds
- scripts/sync-with-atr.ts: sync script that converts ATR rules to Sage's schema and writes them to a separate file `threats/agent-layer.atr-generated.yaml` (never touches `agent-layer.yaml` directly). Opt-in: gated by `enabled: true` in the config.
- .github/workflows/atr-sync.yml: weekly cron workflow that runs the sync and opens a DRAFT PR. Disabled by default (`if: false`). When enabled, the workflow opens DRAFT PRs only; no PR is ever auto-merged.
- threats/.atr-bridge-config.yaml: config controlling the bridge. Lists ATR rule ids to include, ids to exclude, manual overrides, and per-category id offset (so generated ids start after Sage's existing rules in each category).
- docs/INTEROP.md: explains the bridge architecture, what it handles vs what it skips, license handling, and how Sage maintainers stay in editorial control.
- threats/agent-layer.yaml: 7 new rules in privilege_escalation:
… `$(...)` with dangerous binaries, and `|`-chained interpreters.
How Sage maintainers stay in control
The bridge is fully opt-in. Two safety gates:
- `threats/.atr-bridge-config.yaml` ships with `enabled: false`. The sync script refuses to run until a maintainer changes this.
- `.github/workflows/atr-sync.yml` has `if: false` on the job. The workflow won't run on schedule until a maintainer flips this.
Even when both are enabled, the script writes to a separate file `threats/agent-layer.atr-generated.yaml`. Sage maintainers manually copy desired rules into the production `agent-layer.yaml`. The workflow opens DRAFT pull requests only.
Specific rules can be excluded via `exclude_ids: [...]` in the config. Specific rules can be marked `manual_overrides:` so a maintainer-edited version in `agent-layer.yaml` is never overwritten by future sync runs.
Validation
All 34 rules (27 existing from PR #33 + 7 new in this PR) parse via js-yaml and every regex compiles under the JavaScript RegExp engine that `packages/core/src/threat-loader.ts` uses at runtime.
Conversion validated against ATR's 432-sample benign corpus (0 FP on the source rules; the bridge preserves the patterns verbatim). I have not yet validated against Sage's 1521-test suite; Sage maintainers should run that locally before merging.
License
Bridge code: MIT (consistent with ATR's repo license).
Rule content in `threats/agent-layer.yaml`: Detection Rule License 1.1 (Sage threats/ convention).
Upstream MIT attribution preserved per-rule in the `# Upstream: ATR-2026-NNNNN (MIT) — <url>` comment.
How the bridge handles ATR's multi-condition rules
ATR rules typically have 5-10 detection conditions per rule. The converter:
- … `content` channel.
- … `(?:r1)|(?:r2)|...` when the combined regex stays under 500 chars.
- … `(?i)` inline flag into Sage's rule-level `case_insensitive: true`.
- … `action: block` to `require_approval` when ATR confidence is below 0.85, matching Sage's existing convention.
ATR semantic-tier rules, behavioral rules, and deprecated rules are skipped with warnings; they don't appear in the output.
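One plausible reading of the conversion steps above, sketched in TypeScript. Everything here is an assumption for illustration: the `AtrRule` and `SageRule` shapes, the `field` name on a condition, and the choice to return null when a rule cannot be converted. The real converter lives in the agent-threat-rules package and may behave differently.

```typescript
type Action = "log" | "require_approval" | "block";

// Hypothetical shapes; the real ATR and Sage schemas are not shown in this PR.
interface AtrCondition {
  field: string;
  regex: string;
}

interface AtrRule {
  id: string;
  conditions: AtrCondition[];
  action: Action;
  confidence: number;
}

interface SageRule {
  id: string;
  match_on: "content";
  pattern: string;
  case_insensitive: boolean;
  action: Action;
}

const MAX_PATTERN_LENGTH = 500;
const BLOCK_CONFIDENCE_FLOOR = 0.85;
const INLINE_FLAG = "(?i)";

function convert(rule: AtrRule): SageRule | null {
  // Keep only conditions that apply to the content channel.
  const content = rule.conditions.filter((c) => c.field === "content");
  if (content.length === 0) return null;

  // Simplification: one inline (?i) makes the whole combined rule case-insensitive.
  const caseInsensitive = content.some((c) => c.regex.startsWith(INLINE_FLAG));
  const parts = content.map((c) =>
    c.regex.startsWith(INLINE_FLAG) ? c.regex.slice(INLINE_FLAG.length) : c.regex,
  );

  // Join the conditions as (?:r1)|(?:r2)|... and enforce the length budget.
  const pattern = parts.map((p) => `(?:${p})`).join("|");
  if (pattern.length >= MAX_PATTERN_LENGTH) return null;

  // Downgrade low-confidence blocks to require_approval.
  const action: Action =
    rule.action === "block" && rule.confidence < BLOCK_CONFIDENCE_FLOOR
      ? "require_approval"
      : rule.action;

  return { id: rule.id, match_on: "content", pattern, case_insensitive: caseInsensitive, action };
}
```

Note that alternation turns a multi-condition rule into "any condition matches"; whether that matches ATR's own semantics for a given rule is something a reviewer would want to check per rule.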
Reference
ATR project: https://github.com/Agent-Threat-Rule/agent-threat-rules
Bridge converter: `agent-threat-rules/converters/sage` (npm subpath in agent-threat-rules@2.1.3+)