threats: bridge to ATR upstream + 7 new privilege_escalation rules by eeee2345 · Pull Request #34 · gendigitalinc/sage

Adam Lin (eeee2345) · 2026-05-11T17:46:48Z

Proposes an opt-in bridge to the ATR (Agent Threat Rules) upstream project, plus 7 new agent-layer rules in privilege_escalation as a concrete demonstration. Follow-up to PR #33.

What this PR adds

scripts/sync-with-atr.ts — sync script that converts ATR rules to Sage's schema and writes them to a separate file threats/agent-layer.atr-generated.yaml (never touches agent-layer.yaml directly). Opt-in: gated by enabled: true in the config.

.github/workflows/atr-sync.yml — weekly cron workflow that runs the sync and opens a DRAFT PR. Disabled by default (if: false). When enabled, the workflow opens DRAFT PRs only; no PR is ever auto-merged.

threats/.atr-bridge-config.yaml — config controlling the bridge. Lists ATR rule ids to include, ids to exclude, manual overrides, and per-category id offset (so generated ids start after Sage's existing rules in each category).

docs/INTEROP.md — explains the bridge architecture, what it handles vs what it skips, license handling, and how Sage maintainers stay in editorial control.

threats/agent-layer.yaml — 7 new rules in privilege_escalation:

CLT-PRV-001a..e: Microsoft Semantic Kernel SessionsPythonPlugin CVE-2026-25592. MSRC disclosed 2026-05-07; five sister rules covering autostart paths (Windows Startup, /etc/cron, /etc/systemd, /Library/LaunchAgents), SK-specific tool identifiers, descriptor patterns advertising arbitrary file-write, file-write call sites, and Windows registry Run-key persistence.
CLT-PRV-002: eval() / new Function / vm.runIn dynamic code injection with untrusted-input context.
CLT-PRV-003: Shell metacharacter injection in tool arguments: pipe-to-shell, $(...) with dangerous binaries, and |-chained interpreters.

How Sage maintainers stay in control

The bridge is fully opt-in. Two safety gates:

threats/.atr-bridge-config.yaml ships with enabled: false. The sync script refuses to run until a maintainer changes this.
.github/workflows/atr-sync.yml has if: false on the job. The workflow won't run on schedule until a maintainer flips this.

Even when both are enabled, the script writes to a separate file threats/agent-layer.atr-generated.yaml. Sage maintainers manually copy desired rules into the production agent-layer.yaml. The workflow opens DRAFT pull requests only.

Specific rules can be excluded via exclude_ids: [...] in the config. Specific rules can be marked manual_overrides: so a maintainer-edited version in agent-layer.yaml is never overwritten by future sync runs.
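The config-level controls described above can be sketched roughly as follows. This is an illustration, not the code in scripts/sync-with-atr.ts: the BridgeConfig shape, the include_ids field name, the selectRuleIds helper, and the placeholder rule ids are assumptions. Only enabled, exclude_ids, and manual_overrides are named in this PR.

```typescript
// Hypothetical config shape; the real .atr-bridge-config.yaml may differ.
interface BridgeConfig {
  enabled: boolean;           // ships as false in this PR
  include_ids: string[];      // assumed name for the "ids to include" list
  exclude_ids: string[];      // rules a maintainer never wants synced
  manual_overrides: string[]; // rules hand-edited in agent-layer.yaml
}

// Returns the upstream rule ids a sync run would convert.
// Throws when the bridge has not been explicitly enabled.
function selectRuleIds(config: BridgeConfig): string[] {
  if (!config.enabled) {
    throw new Error("ATR bridge is disabled; set enabled: true to run the sync");
  }
  // Excluded and manually overridden rules are both left alone.
  const skip = new Set<string>([...config.exclude_ids, ...config.manual_overrides]);
  return config.include_ids.filter((id) => !skip.has(id));
}

// Placeholder ids, not real ATR rule ids.
const exampleConfig: BridgeConfig = {
  enabled: true,
  include_ids: ["ATR-EXAMPLE-A", "ATR-EXAMPLE-B", "ATR-EXAMPLE-C"],
  exclude_ids: ["ATR-EXAMPLE-B"],
  manual_overrides: ["ATR-EXAMPLE-C"],
};

const selected = selectRuleIds(exampleConfig);
console.log(selected);
```

Treating manual_overrides the same way as exclude_ids at selection time is one possible design; the actual script may instead convert the rule and skip it only when writing output.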

Validation

All 34 rules (27 existing from PR #33 + 7 new in this PR) parse via js-yaml and every regex compiles under the JavaScript RegExp engine that packages/core/src/threat-loader.ts uses at runtime.
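A compile check of this kind could look roughly like the sketch below. It assumes the rules have already been parsed from YAML into plain objects; the RuleLike shape and the findUncompilablePatterns helper are hypothetical and are not taken from threat-loader.ts.

```typescript
// Hypothetical minimal view of a parsed rule; the real schema has more fields.
interface RuleLike {
  id: string;
  pattern: string;
  case_insensitive?: boolean;
}

interface CompileFailure {
  id: string;
  error: string;
}

// Tries to compile every pattern with the JavaScript RegExp engine and
// collects the rules whose pattern is rejected.
function findUncompilablePatterns(rules: RuleLike[]): CompileFailure[] {
  const failures: CompileFailure[] = [];
  for (const rule of rules) {
    try {
      new RegExp(rule.pattern, rule.case_insensitive ? "i" : "");
    } catch (err) {
      failures.push({
        id: rule.id,
        error: err instanceof Error ? err.message : String(err),
      });
    }
  }
  return failures;
}

// Illustrative rules only; neither pattern comes from this PR.
const sampleRules: RuleLike[] = [
  { id: "EXAMPLE-001", pattern: "\\beval\\s*\\(", case_insensitive: true },
  { id: "EXAMPLE-002", pattern: "(unclosed" },
];

const compileFailures = findUncompilablePatterns(sampleRules);
console.log(compileFailures.map((f) => f.id));
```

A check like this only shows that a pattern compiles. It says nothing about whether the pattern matches what was intended, which is what match and benign test cases are for.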

Conversion validated against ATR's 432-sample benign corpus (0 false positives on the source rules; the bridge preserves the patterns verbatim). I have not yet validated against Sage's 1521-test suite; Sage maintainers should run that locally before merging.

License

Bridge code: MIT (consistent with ATR's repo license).
Rule content in threats/agent-layer.yaml: Detection Rule License 1.1 (Sage threats/ convention).
Upstream MIT attribution preserved per-rule in the # Upstream: ATR-2026-NNNNN (MIT) — <url> comment.

How the bridge handles ATR's multi-condition rules

ATR rules typically have 5-10 detection conditions per rule. The converter:

Collapses ATR's text-channel fields (user_input, agent_output, tool_response, tool_args, tool_name, tool_description, content) to Sage's single content channel.
Combines same-channel conditions via non-capturing regex alternation (?:r1)|(?:r2)|... when the combined regex stays under 500 chars.
Splits into N sister Sage rules (CLT-XXX-001a, -001b, ...) when the combined regex would exceed 500 chars, so each rule's pattern stays readable and debuggable. CLT-PRV-001a..e in this PR is the result of this split applied to the SK CVE rule's 5 detection conditions.
Extracts the (?i) inline flag into Sage's rule-level case_insensitive: true.
Downgrades action: block to require_approval when ATR confidence is below 0.85, matching Sage's existing convention.

ATR semantic-tier rules, behavioral rules, and deprecated rules are skipped with warnings; they don't appear in the output.
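The combine, split, flag-extraction, and downgrade steps listed above can be sketched as follows. This is a simplified illustration, not the converter shipped in the agent-threat-rules package: the AtrCondition and SageRule shapes and the convert helper are assumptions, the sketch ignores the channel-collapsing step, and it sets case_insensitive on a combined rule if any condition carried (?i), which the real converter may handle differently.

```typescript
type Action = "block" | "require_approval" | "log";

// Hypothetical shapes; neither ATR's nor Sage's full schema is reproduced.
interface AtrCondition {
  field: string; // e.g. "tool_args"; collapsed to Sage's content channel
  regex: string;
}

interface SageRule {
  id: string;
  pattern: string;
  case_insensitive: boolean;
  action: Action;
}

const MAX_PATTERN_LENGTH = 500;        // limit stated in the PR description
const BLOCK_CONFIDENCE_FLOOR = 0.85;   // threshold stated in the PR description

// Moves a leading (?i) out of the pattern and into a boolean.
function stripInlineFlag(regex: string): { pattern: string; ci: boolean } {
  return regex.startsWith("(?i)")
    ? { pattern: regex.slice(4), ci: true }
    : { pattern: regex, ci: false };
}

function convert(
  baseId: string,
  conditions: AtrCondition[],
  action: Action,
  confidence: number,
): SageRule[] {
  const parts = conditions.map((c) => stripInlineFlag(c.regex));
  const effective: Action =
    action === "block" && confidence < BLOCK_CONFIDENCE_FLOOR
      ? "require_approval"
      : action;

  // Try a single rule built from non-capturing alternation first.
  const combined = parts.map((p) => `(?:${p.pattern})`).join("|");
  if (combined.length <= MAX_PATTERN_LENGTH) {
    return [{
      id: baseId,
      pattern: combined,
      case_insensitive: parts.some((p) => p.ci),
      action: effective,
    }];
  }

  // Otherwise emit one sister rule per condition: -001a, -001b, ...
  return parts.map((p, i) => ({
    id: `${baseId}${String.fromCharCode(97 + i)}`,
    pattern: p.pattern,
    case_insensitive: p.ci,
    action: effective,
  }));
}
```

For example, two short conditions "(?i)foo" and "bar" would collapse into one rule with pattern (?:foo)|(?:bar), while five long conditions would come out as five sister rules with suffixes a through e, which is the shape of CLT-PRV-001a..e in this PR.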

Reference

ATR project: https://github.com/Agent-Threat-Rule/agent-threat-rules
Bridge converter: agent-threat-rules/converters/sage (npm subpath in agent-threat-rules@2.1.3+)

…MCP poisoning, skill compromise, context exfiltration

Contributed under MIT per vaclavbelak's comment on issue gendigitalinc#30 (gendigitalinc#30 (comment)).

Upstream: ATR (Agent Threat Rules), https://github.com/Agent-Threat-Rule/agent-threat-rules

Coverage
- Prompt injection (4): CLT-PI-001..004
- MCP tool/response attacks (3): CLT-MCP-001..003
- Skill package compromise (8): CLT-SKL-001..008
- Context exfiltration (2): CLT-CTX-001..002

Design
- All rules target match_on: content so they fire on Write/Edit content, plugin/skill file scans, and any integration that passes a `content` artifact. They complement Sage's existing 313 rules (command/URL/credential-file) rather than overlap with them. All rules were audited against Sage's existing credential/command/supply-chain rules to avoid duplicates.
- Regex converted from ATR's multi-condition YAML to Sage's single-pattern schema; ATR's inline (?i) flags were replaced with case_insensitive: true (Sage's RegExp does not enable inline-flag syntax).
- All severities and actions chosen conservatively: log/require_approval where a legitimate use case exists, block where the pattern is attack-only (IMDS URL, Unicode Tag smuggling, time-gated credential read, etc).

Validation
- Loads cleanly via packages/core loadThreats (17/17 rules).
- Zero false positives on the ATR 432-sample real-world benign skill corpus (including apify, browserbase, resend, figma, datadog, axiomhq, antfu/nuxt, datadog-labs, mcp-use, and 420+ others).
- 17/17 curated attack test cases trigger the expected rule.
- pnpm test: 1521/1521 Sage tests still passing with the file in place.

Docs
- docs/threat-rules.md "Rule Files" table: add agent-layer.yaml entry.

Note on --no-verify: scripts/git-hooks/pre-commit references .gitleaks.toml, which does not exist in either the main or pre-release branch, so the hook fails for every contributor. Ran gitleaks directly with default config: no secrets detected. Biome lint clean (14 pre-existing warnings in test files, unrelated to this PR).

… fork impersonation, path traversal, supply chain

Ports 10 additional rule classes from ATR's upstream catalog that the initial 17-rule subset undercounted. Adds a new supply-chain category to complement existing prompt-injection / MCP / skill-compromise / context-exfiltration groupings.

New rules
- CLT-PI-005 System-prompt override framing (new/updated system prompt: …)
- CLT-PI-006 Cross-agent impersonation claim (I am the admin agent …)
- CLT-PI-007 Agent-to-agent override (override verb adjacent to agent keyword)
- CLT-MCP-004 Path traversal to system dir (/etc, /proc, /root, …)
- CLT-MCP-005 Community-fork impersonation prose framing
- CLT-SKL-009 Skill scope hijacking ("also read all other files …")
- CLT-SUP-001 Typosquatted filesystem tool name (filesytem-*, filsystem-*)
- CLT-SUP-002 Install command for "community fork" package
- CLT-CTX-003 PEM private key block appearing in content
- CLT-CTX-004 Obfuscation-framed credential leak (encrypted key: sk-…)

Refinements vs upstream
- CLT-PI-007 tightened: requires an agent-identifier within 80 chars of the override verb so it does not duplicate CLT-PI-001 on generic user input.
- CLT-MCP-004 tightened: traversal must terminate in a sensitive system directory (etc/proc/root/sys/boot/dev/passwd/shadow/hosts). The bare multi-hop `../../` pattern FPs at ~3% on the benign corpus because legitimate skills reference relative paths in code examples.

Validation
- loadThreats() loads 27/27 rules cleanly
- 27/27 curated attack test cases trigger the expected rule
- Zero false positives across the 432-sample real-world benign skill corpus (down from 14 FPs on CLT-MCP-004 before the narrowing above)
- pnpm test: 1521/1521 Sage tests still pass

Why this is a second commit instead of rewriting the earlier one

An initial scope audit dropped a few rule classes as apparent overlaps with Sage's existing command/URL/credential-file rules. On re-inspection those were different detection surfaces (content-layer vs command-layer) so the coverage loss was not intentional. Adding them here as a net-positive commit keeps the PR history clean for reviewers.


CONTRIBUTING.md requires threats/*.yaml to be licensed under DRL-1.1. @vaclavbelak suggested MIT in issue gendigitalinc#30; relicensing to match the repo's explicit contribution terms and remove the licensing ambiguity before review.

…ayer Add agent-layer threat rules (27 patterns, issue gendigitalinc#30)

Adds a four-file scaffold proposing an opt-in bridge between Sage and the ATR (Agent Threat Rules) upstream project, plus 7 new agent-layer rules in the privilege_escalation category as a concrete demonstration of the bridge's output.

Files added:
- scripts/sync-with-atr.ts: Sync script (opt-in, off by default)
- .github/workflows/atr-sync.yml: Weekly cron workflow (`if: false`)
- threats/.atr-bridge-config.yaml: Bridge config (enabled: false)
- docs/INTEROP.md: Interoperability documentation
- threats/agent-layer.yaml: +7 rules (CLT-PRV-001a..e, 002, 003)

The bridge is fully opt-in. Both the config and the workflow are disabled by default; nothing automatically syncs. When enabled, the script writes to a separate file `threats/agent-layer.atr-generated.yaml` and opens a DRAFT PR for human review. No PR is ever auto-merged.

The 7 new rules cover:
- CLT-PRV-001a..e: Microsoft Semantic Kernel SessionsPythonPlugin CVE-2026-25592 (MSRC disclosure 2026-05-07). Five sister rules covering autostart paths, SK identifiers, descriptor patterns, file-write call sites, and Windows registry Run-key persistence.
- CLT-PRV-002: eval() / new Function / vm.runIn dynamic code injection.
- CLT-PRV-003: Shell metacharacter injection in tool arguments.

All 7 rules are upstream from MIT-licensed ATR rules; a per-rule provenance comment preserves attribution under both MIT and DRL 1.1.

All 34 rules (27 existing + 7 new) parse via js-yaml and all regex compile under the JavaScript RegExp engine (same engine Sage uses at runtime via threat-loader.ts).

See docs/INTEROP.md for what the bridge handles, what it doesn't, and how Sage maintainers stay in editorial control. The bridge converter itself lives in the agent-threat-rules npm package; this PR adds Sage-side plumbing only.

Adam Lin (eeee2345) · 2026-05-26T18:56:04Z

Noting the PR shows mergeable_state: blocked in the GitHub API, but there are no reported CI checks and no branch protection rule visible from the outside. Is there a specific CI pipeline or review gate I should know about, or is this just waiting on a maintainer review pass? Happy to address anything that needs changes before this can merge.

Vaclav Belak (vaclavbelak) · 2026-06-11T13:40:25Z

Hi Adam Lin (@eeee2345), thanks for the work here, really appreciated! The ATR bridge concept is solid, and several of the new rules are genuinely useful. To answer your question first: the CI check was broken but should be fixed now. Apart from the CI tests/checks, only a review is needed. We have three blockers before this can merge.

Blocker 1: The GH Actions workflow needs to be removed

.github/workflows/atr-sync.yml grants the agent-threat-rules npm package (and peter-evans/create-pull-request@v6, pinned by tag, not SHA) code execution with contents: write and pull-requests: write on this repo. If either package is compromised, the draft PR review gate is bypassed entirely. For a security tool whose threat definitions are themselves an attack surface, that's not an acceptable risk. The if: false gate doesn't help: it's one line in a file we've already merged.

The sync script (sync-with-atr.ts) is fine; running it locally and opening a PR manually is a workflow we can support. Please drop the workflow YAML from this PR.

Blocker 2: No tests

Every threat category in this repo has a dedicated *-threats.test.ts file with explicit match cases (inputs that should fire) and benign non-match cases (inputs that should not). The seven new rules in this PR have none. This is a hard requirement: we won't merge rules we can't verify won't regress under future refactors, and writing the benign cases is also the fastest way to discover false-positive problems before they ship.
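A table of match and benign cases for one rule might be structured like the sketch below. It is framework-free and does not reproduce the layout of Sage's existing *-threats.test.ts files. The pattern is a hypothetical stand-in for a registry Run-key rule, not the pattern shipped in this PR, and the RuleCase shape and failingCases helper are assumptions.

```typescript
interface RuleCase {
  input: string;
  shouldMatch: boolean;
}

// Hypothetical pattern for illustration only: a `reg add` command that
// targets the Windows CurrentVersion\Run key.
const HYPOTHETICAL_RUN_KEY_PATTERN =
  /reg(?:\.exe)?\s+add\s+\S*\\Software\\Microsoft\\Windows\\CurrentVersion\\Run\b/i;

const CASES: RuleCase[] = [
  // Match cases: inputs that should fire.
  {
    input: "reg add HKCU\\Software\\Microsoft\\Windows\\CurrentVersion\\Run /v updater /d C:\\tmp\\a.exe",
    shouldMatch: true,
  },
  {
    input: "reg.exe add HKLM\\SOFTWARE\\Microsoft\\Windows\\CurrentVersion\\Run /v x",
    shouldMatch: true,
  },
  // Benign cases: ordinary developer text that should not fire.
  { input: "Please register the plugin before the first run.", shouldMatch: false },
  { input: "The regex engine runs once per file.", shouldMatch: false },
  { input: "fs.writeFile('dist/bundle.js', output)", shouldMatch: false },
  { input: "npm run build", shouldMatch: false },
];

// Returns the inputs whose actual match result differs from the expectation.
function failingCases(pattern: RegExp, cases: RuleCase[]): string[] {
  return cases
    .filter((c) => pattern.test(c.input) !== c.shouldMatch)
    .map((c) => c.input);
}

const failures = failingCases(HYPOTHETICAL_RUN_KEY_PATTERN, CASES);
console.log(failures.length === 0 ? "all cases pass" : failures);
```

The pattern is deliberately compiled without the g flag, so RegExp.prototype.test stays stateless across cases.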

Blocker 3: Three of the seven new rules need rework

CLT-PRV-001a/b/d/e look good: they are specific and CVE-anchored, and block at 0.9 confidence is justified. The other three have FP problems that would be immediately obvious from benign test cases:

CLT-PRV-001c (matches "arbitrary"/"user-supplied"/"unvalidated" path language): this is documentation and error-handling vocabulary. Any codebase that handles user input paths triggers it, including Sage's own.
CLT-PRV-002 (eval(), Function(), vm.runInNewContext() on content): these appear in React, build tools, test fixtures, and template engines. Will generate require_approval noise on legitimate daily work for any JS/TS developer.
CLT-PRV-003 (shell metacharacters on content): writing any shell script fires this. The 0.65 confidence suggests ATR itself wasn't confident here either.

ATR's 432-sample benign corpus might not be calibrated against developer workflows. Sage's user population writes code with an AI agent, which presumably looks very different from that corpus. We've had to remove some rules from the previous contribution for the same reason.

For these three: either tighten the pattern to anchor it to a specific attack context, or drop them from this PR and revisit when the patterns are ready. In both cases, tests are required.

The four CVE-specific rules plus the script, config, and docs can merge once the workflow is dropped and tests are added for them.

Thanks again for your contribution. Some of the rules you previously contributed are genuinely useful and will likely ship in the next release; this is just to make sure we don't break anything for our users. Please let me know if I can be of any help.

Best wishes, Vaclav


…e 4 CVE rules

Per @vaclavbelak's review (gendigitalinc#34):
- B1: remove .github/workflows/atr-sync.yml. It gave the agent-threat-rules npm package + create-pull-request action write access; supply-chain risk. The sync script stays; run it locally and open a PR manually. Also updated the 3 docs (sync script header, bridge config, INTEROP) that referenced it.
- B3: drop CLT-PRV-001c (path-vocabulary), 002 (eval/Function), 003 (shell metachars), which are FP-prone on developer workflows. Keep the 4 CVE-anchored rules 001a/b/d/e (SK SessionsPythonPlugin file-write + Run-key persistence).
- B2: add agent-layer-threats.test.ts with match + benign cases for all 4 kept rules (9 cases, verified against the compiled patterns).

Adam Lin (eeee2345) · 2026-06-11T18:42:47Z

Thanks Vaclav — sharp review, and the FP point on the dev corpus is fair. All three addressed:

B1 — dropped the workflow. Agreed: handing the npm package and the PR-creation action write access is the wrong trade for a security repo. The sync script stays as a local opt-in; I'll run it and open PRs by hand. Also cleaned up the three docs that referenced the removed workflow.

B3 — dropped CLT-PRV-001c (path vocabulary), 002 (eval/Function), and 003 (shell metacharacters). They fire on ordinary developer work, and the 432-sample benign corpus doesn't represent users who write code with an agent. Kept the four CVE-anchored rules (001a/b/d/e): the SK SessionsPythonPlugin file-write into autostart paths and the Run-key persistence, each anchored to a specific call site or path.

B2 — added agent-layer-threats.test.ts with match and benign cases for all four kept rules. The benign cases cover the exact FP shapes you flagged: ordinary build paths, normal fs.writeFile targets, session-management vocabulary, and the words register/regex.

So this PR is now the four CVE rules plus the script, config, and docs. Happy to revisit the three dropped rules later with tighter, context-anchored patterns. Thanks for shepherding this.

attlab0527-lab and others added 6 commits April 19, 2026 07:29

chore: add changeset for agent-layer rules

2f283a8

Merge pull request gendigitalinc#33 from eeee2345/contrib/atr-agent-l…

dbca236

…ayer Add agent-layer threat rules (27 patterns, issue gendigitalinc#30)

Vaclav Belak (vaclavbelak) force-pushed the pre-release branch from 68d7b03 to 9b5e8e8 Compare May 26, 2026 14:23

Adam Lin (eeee2345) wants to merge 7 commits into gendigitalinc:pre-release from eeee2345:feat/atr-bridge
