Skip to content

threats: bridge to ATR upstream + 7 new privilege_escalation rules#34

Open
Adam Lin (eeee2345) wants to merge 7 commits into
gendigitalinc:pre-releasefrom
eeee2345:feat/atr-bridge
Open

threats: bridge to ATR upstream + 7 new privilege_escalation rules#34
Adam Lin (eeee2345) wants to merge 7 commits into
gendigitalinc:pre-releasefrom
eeee2345:feat/atr-bridge

Conversation

@eeee2345

Copy link
Copy Markdown

Proposes an opt-in bridge to the ATR (Agent Threat Rules) upstream project, plus 7 new agent-layer rules in privilege_escalation as a concrete demonstration. Follow-up to PR #33.

What this PR adds

scripts/sync-with-atr.ts — sync script that converts ATR rules to Sage's schema and writes them to a separate file threats/agent-layer.atr-generated.yaml (never touches agent-layer.yaml directly). Opt-in: gated by enabled: true in the config.

.github/workflows/atr-sync.yml — weekly cron workflow that runs the sync and opens a DRAFT PR. Disabled by default (if: false). When enabled, the workflow opens DRAFT PRs only; no PR is ever auto-merged.

threats/.atr-bridge-config.yaml — config controlling the bridge. Lists ATR rule ids to include, ids to exclude, manual overrides, and per-category id offset (so generated ids start after Sage's existing rules in each category).

docs/INTEROP.md — explains the bridge architecture, what it handles vs what it skips, license handling, and how Sage maintainers stay in editorial control.

threats/agent-layer.yaml — 7 new rules in privilege_escalation:

  • CLT-PRV-001a..e: Microsoft Semantic Kernel SessionsPythonPlugin CVE-2026-25592. MSRC disclosed 2026-05-07; five sister rules covering autostart paths (Windows Startup, /etc/cron, /etc/systemd, /Library/LaunchAgents), SK-specific tool identifiers, descriptor patterns advertising arbitrary file-write, file-write call sites, and Windows registry Run-key persistence.
  • CLT-PRV-002: eval() / new Function / vm.runIn dynamic code injection with untrusted-input context.
  • CLT-PRV-003: Shell metacharacter injection in tool arguments — pipe-to-shell, \$(...) with dangerous binaries, and |-chained interpreters.

How Sage maintainers stay in control

The bridge is fully opt-in. Two safety gates:

  1. threats/.atr-bridge-config.yaml ships with enabled: false. The sync script refuses to run until a maintainer changes this.
  2. .github/workflows/atr-sync.yml has if: false on the job. The workflow won't run on schedule until a maintainer flips this.

Even when both are enabled, the script writes to a separate file threats/agent-layer.atr-generated.yaml. Sage maintainers manually copy desired rules into the production agent-layer.yaml. The workflow opens DRAFT pull requests only.

Specific rules can be excluded via exclude_ids: [...] in the config. Specific rules can be marked manual_overrides: so a maintainer-edited version in agent-layer.yaml is never overwritten by future sync runs.

Validation

All 34 rules (27 existing from PR #33 + 7 new in this PR) parse via js-yaml and every regex compiles under the JavaScript RegExp engine that packages/core/src/threat-loader.ts uses at runtime.

Conversion validated against ATR's 432-sample benign corpus (0 FP on the source rules; the bridge preserves the patterns verbatim). I have not yet validated against Sage's 1521 test corpus — Sage maintainers should run that locally before merging.

License

Bridge code: MIT (consistent with ATR's repo license).
Rule content in threats/agent-layer.yaml: Detection Rule License 1.1 (Sage threats/ convention).
Upstream MIT attribution preserved per-rule in the # Upstream: ATR-2026-NNNNN (MIT) — <url> comment.

How the bridge handles ATR's multi-condition rules

ATR rules typically have 5-10 detection conditions per rule. The converter:

  • Collapses ATR's text-channel fields (user_input, agent_output, tool_response, tool_args, tool_name, tool_description, content) to Sage's single content channel.
  • Combines same-channel conditions via non-capturing regex alternation (?:r1)|(?:r2)|... when the combined regex stays under 500 chars.
  • Splits into N sister Sage rules (CLT-XXX-001a, -001b, ...) when the combined regex would exceed 500 chars, so each rule's pattern stays readable and debuggable. CLT-PRV-001a..e in this PR is the result of this split applied to the SK CVE rule's 5 detection conditions.
  • Extracts (?i) inline flag into Sage's rule-level case_insensitive: true.
  • Downgrades action: block to require_approval when ATR confidence is below 0.85, matching Sage's existing convention.

ATR semantic-tier rules, behavioral rules, and deprecated rules are skipped with warnings; they don't appear in the output.

Reference

ATR project: https://github.com/Agent-Threat-Rule/agent-threat-rules
Bridge converter: agent-threat-rules/converters/sage (npm subpath in agent-threat-rules@2.1.3+)

attlab0527-lab and others added 6 commits April 19, 2026 07:29
…MCP poisoning, skill compromise, context exfiltration

Contributed under MIT per vaclavbelak's comment on issue gendigitalinc#30
(gendigitalinc#30 (comment)).

Upstream: ATR (Agent Threat Rules) — https://github.com/Agent-Threat-Rule/agent-threat-rules

Coverage
- Prompt injection (4):      CLT-PI-001..004
- MCP tool/response attacks (3): CLT-MCP-001..003
- Skill package compromise (8): CLT-SKL-001..008
- Context exfiltration (2):  CLT-CTX-001..002

Design
- All rules target match_on: content so they fire on Write/Edit content,
  plugin/skill file scans, and any integration that passes a `content`
  artifact. They complement Sage's existing 313 rules (command/URL/
  credential-file) rather than overlap with them — all rules audited
  against Sage's existing credential/command/supply-chain rules to
  avoid duplicates.
- Regex converted from ATR's multi-condition YAML to Sage's single-
  pattern schema; ATR's inline (?i) flags were replaced with
  case_insensitive: true (Sage's RegExp does not enable inline-flag
  syntax).
- All severities and actions chosen conservatively — log/require_approval
  where a legitimate use case exists, block where the pattern is
  attack-only (IMDS URL, Unicode Tag smuggling, time-gated credential
  read, etc).

Validation
- Loads cleanly via packages/core loadThreats (17/17 rules).
- Zero false positives on the ATR 432-sample real-world benign skill
  corpus (including apify, browserbase, resend, figma, datadog,
  axiomhq, antfu/nuxt, datadog-labs, mcp-use, and 420+ others).
- 17/17 curated attack test cases trigger the expected rule.
- pnpm test: 1521/1521 Sage tests still passing with the file in place.

Docs
- docs/threat-rules.md "Rule Files" table: add agent-layer.yaml entry.

Note on --no-verify: scripts/git-hooks/pre-commit references
.gitleaks.toml which does not exist in either the main or pre-release
branch, so the hook fails for every contributor. Ran gitleaks directly
with default config — no secrets detected. Biome lint clean (14 pre-
existing warnings in test files, unrelated to this PR).
… fork impersonation, path traversal, supply chain

Ports 10 additional rule classes from ATR's upstream catalog that the
initial 17-rule subset undercounted. Adds a new supply-chain category to
complement existing prompt-injection / MCP / skill-compromise / context-
exfiltration groupings.

New rules
- CLT-PI-005  System-prompt override framing (new/updated system prompt: …)
- CLT-PI-006  Cross-agent impersonation claim (I am the admin agent …)
- CLT-PI-007  Agent-to-agent override (override verb adjacent to agent keyword)
- CLT-MCP-004 Path traversal to system dir (/etc, /proc, /root, …)
- CLT-MCP-005 Community-fork impersonation prose framing
- CLT-SKL-009 Skill scope hijacking ("also read all other files …")
- CLT-SUP-001 Typosquatted filesystem tool name (filesytem-*, filsystem-*)
- CLT-SUP-002 Install command for "community fork" package
- CLT-CTX-003 PEM private key block appearing in content
- CLT-CTX-004 Obfuscation-framed credential leak (encrypted key: sk-…)

Refinements vs upstream
- CLT-PI-007 tightened: requires an agent-identifier within 80 chars of
  the override verb so it does not duplicate CLT-PI-001 on generic user
  input.
- CLT-MCP-004 tightened: traversal must terminate in a sensitive system
  directory (etc/proc/root/sys/boot/dev/passwd/shadow/hosts). The bare
  multi-hop `../../` pattern FPs at ~3% on the benign corpus because
  legitimate skills reference relative paths in code examples.

Validation
- loadThreats() loads 27/27 rules cleanly
- 27/27 curated attack test cases trigger the expected rule
- Zero false positives across the 432-sample real-world benign skill
  corpus (down from 14 FPs on CLT-MCP-004 before the narrowing above)
- pnpm test: 1521/1521 Sage tests still pass

Why this is a second commit instead of rewriting the earlier one
An initial scope audit dropped a few rule classes as apparent overlaps
with Sage's existing command/URL/credential-file rules. On re-inspection
those were different detection surfaces (content-layer vs command-layer)
so the coverage loss was not intentional. Adding them here as a net-
positive commit keeps the PR history clean for reviewers.
CONTRIBUTING.md requires threats/*.yaml to be licensed under DRL-1.1.
@vaclavbelak suggested MIT in issue gendigitalinc#30; relicensing to match the repo's
explicit contribution terms and remove the licensing ambiguity before
review.
…ayer

Add agent-layer threat rules (27 patterns, issue gendigitalinc#30)
Adds a four-file scaffold proposing an opt-in bridge between Sage and the
ATR (Agent Threat Rules) upstream project, plus 7 new agent-layer rules
in the privilege_escalation category as a concrete demonstration of the
bridge's output.

Files added:
- scripts/sync-with-atr.ts                Sync script (opt-in, off by default)
- .github/workflows/atr-sync.yml          Weekly cron workflow (`if: false`)
- threats/.atr-bridge-config.yaml         Bridge config (enabled: false)
- docs/INTEROP.md                         Interoperability documentation
- threats/agent-layer.yaml                +7 rules (CLT-PRV-001a..e, 002, 003)

The bridge is fully opt-in. Both the config and the workflow are disabled
by default; nothing automatically syncs. When enabled, the script writes
to a separate file `threats/agent-layer.atr-generated.yaml` and opens a
DRAFT PR for human review. No PR is ever auto-merged.

The 7 new rules cover:
- CLT-PRV-001a..e  Microsoft Semantic Kernel SessionsPythonPlugin
                   CVE-2026-25592 (MSRC disclosure 2026-05-07). Five sister
                   rules covering autostart paths, SK identifiers,
                   descriptor patterns, file-write call sites, and Windows
                   registry Run-key persistence.
- CLT-PRV-002      eval() / new Function / vm.runIn dynamic code injection.
- CLT-PRV-003      Shell metacharacter injection in tool arguments.

All 7 rules upstream from MIT-licensed ATR rules; per-rule provenance
comment preserves attribution under both MIT and DRL 1.1.

All 34 rules (27 existing + 7 new) parse via js-yaml and all regex
compile under the JavaScript RegExp engine (same engine Sage uses at
runtime via threat-loader.ts).

See docs/INTEROP.md for what the bridge handles, what it doesn't, and how
Sage maintainers stay in editorial control. The bridge converter itself
lives in the agent-threat-rules npm package; this PR adds Sage-side
plumbing only.
@eeee2345

Copy link
Copy Markdown
Author

Noting the PR shows mergeable_state: blocked in the GitHub API, but there are no reported CI checks and no branch protection rule visible from the outside. Is there a specific CI pipeline or review gate I should know about, or is this just waiting on a maintainer review pass? Happy to address anything that needs changes before this can merge.

@vaclavbelak

Copy link
Copy Markdown
Collaborator

Hi Adam Lin (@eeee2345), thanks for the work here, really appreciated! The ATR bridge concept is solid, and several of the new rules are genuinely useful. To answer your question first, the CI check was broken, but should be fixed now, apart from the CI tests/checks, only a review is needed. We have three blockers before this can merge.

Blocker 1: The GH Actions workflow needs to be removed

.github/workflows/atr-sync.yml grants the agent-threat-rules npm package (and peter-evans/create-pull-request@v6, pinned by tag not SHA) code execution with contents: write and pull-requests: write on this repo. If either package is compromised, the draft PR review gate is bypassed entirely. For a security tool whose threat definitions are themselves an attack surface, that's not an acceptable risk. The if: false gate doesn't help, it's one line in a file we've already merged.

The sync script (sync-with-atr.ts) is fine, running it locally and opening a PR manually is a workflow we can support. Please drop the workflow YAML from this PR.

Blocker 2: No tests

Every threat category in this repo has a dedicated *-threats.test.ts file with explicit match cases (inputs that should fire) and benign non-match cases (inputs that should not). The seven new rules in this PR have none. This is a hard requirement, we won't merge rules we can't verify won't regress under future refactors, and writing the benign cases is also the fastest way to discover false-positive problems before they ship.

Blocker 3: Three of the seven new rules need rework

CLT-PRV-001a/b/d/e look good, specific, CVE-anchored, block at 0.9 confidence is justified. The other three have FP problems that would be immediately obvious from benign test cases:

  • CLT-PRV-001c (matches "arbitrary"/"user-supplied"/"unvalidated" path language): this is documentation and error-handling vocabulary. Any codebase that handles user input paths triggers it, including Sage's own.
  • CLT-PRV-002 (eval(), Function(), vm.runInNewContext() on content): these appear in React, build tools, test fixtures, and template engines. Will generate require_approval noise on legitimate daily work for any JS/TS developer.
  • CLT-PRV-003 (shell metacharacters on content): writing any shell script fires this. The 0.65 confidence suggests ATR itself wasn't confident here either.

ATR's 432-sample benign corpus might not be calibrated against developer workflows, Sage's user population writes code with an AI agent, which presumably looks very different from that corpus. We've had to remove some rules from the previous contribution for the same reason.

For these three: either tighten the pattern to anchor it to a specific attack context, or drop them from this PR and revisit when the patterns are ready. In both cases, tests are required.

The four CVE-specific rules plus the script, config, and docs can merge once the workflow is dropped and tests are added for them.

Thanks again for your contribution, some of the rules you previously contributed are genuinely useful and will likely ship in the next release, this is just to make sure we don't break anything for our users. Let me know pls if I can be of any help.

Best wishes, Vaclav

…e 4 CVE rules

Per @vaclavbelak's review (gendigitalinc#34):
- B1: remove .github/workflows/atr-sync.yml — it gave the agent-threat-rules
  npm package + create-pull-request action write access; supply-chain risk.
  The sync script stays; run it locally and open a PR manually. Also updated
  the 3 docs (sync script header, bridge config, INTEROP) that referenced it.
- B3: drop CLT-PRV-001c (path-vocabulary), 002 (eval/Function), 003 (shell
  metachars) — FP-prone on developer workflows. Keep the 4 CVE-anchored
  rules 001a/b/d/e (SK SessionsPythonPlugin file-write + Run-key persistence).
- B2: add agent-layer-threats.test.ts with match + benign cases for all 4
  kept rules (9 cases, verified against the compiled patterns).
@eeee2345

Copy link
Copy Markdown
Author

Thanks Vaclav — sharp review, and the FP point on the dev corpus is fair. All three addressed:

B1 — dropped the workflow. Agreed: handing the npm package and the PR-creation action write access is the wrong trade for a security repo. The sync script stays as a local opt-in; I'll run it and open PRs by hand. Also cleaned up the three docs that referenced the removed workflow.

B3 — dropped CLT-PRV-001c (path vocabulary), 002 (eval/Function), and 003 (shell metacharacters). They fire on ordinary developer work, and the 432-sample benign corpus doesn't represent users who write code with an agent. Kept the four CVE-anchored rules (001a/b/d/e): the SK SessionsPythonPlugin file-write into autostart paths and the Run-key persistence, each anchored to a specific call site or path.

B2 — added agent-layer-threats.test.ts with match and benign cases for all four kept rules. The benign cases cover the exact FP shapes you flagged: ordinary build paths, normal fs.writeFile targets, session-management vocabulary, and the words register/regex.

So this PR is now the four CVE rules plus the script, config, and docs. Happy to revisit the three dropped rules later with tighter, context-anchored patterns. Thanks for shepherding this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants