
feat(flue): port skill-drift system to Flue framework (side-by-side)#127

Draft
HazAT wants to merge 24 commits into main from flue/skill-drift-port

@HazAT HazAT commented May 12, 2026

Caution

This PR still depends on the @mistralai/mistralai supply-chain mitigation work; keep the PR in Draft until the advisory requirement is fully satisfied.

Summary

Ports Flue skill-drift from a centralized weekly scheduler in this repo to an inverted architecture: each SDK repo runs its own per-PR detector trigger and invokes a shared reusable workflow in sentry-for-ai.

What's in this PR

  • Reusable Detector workflow in this repo: .github/workflows/flue-skill-drift-detector-reusable.yml
  • 19 SDK-repo wrapper templates in docs/agent-port/sdk-repo-wrappers/
  • Updater + Creator as local CLI tools only:
    • ./scripts/test-flue-updater.sh
    • ./scripts/test-flue-creator.sh
  • Full Flue project scaffold and supply-chain mitigation (@mistralai/mistralai) remain intact

Architecture

┌───────────────────────────────────────────────────────────────┐
│ Per-SDK-repo workflow (e.g. getsentry/sentry-android)         │
│ on: pull_request: types: [closed]                             │
│ if: pull_request.merged == true                               │
│ uses: getsentry/sentry-for-ai/.github/workflows/                │
│   flue-skill-drift-detector-reusable.yml@main                 │
└───────────────────┬───────────────────────────────────────────┘
                    │ workflow_call with skill_name, sdk_repo,
                    │ pr_number, pr_url
                    ▼
┌───────────────────────────────────────────────────────────────┐
│ Reusable workflow in getsentry/sentry-for-ai                  │
│ • detect job: checkout SDK + skills repo, run Flue agent,     │
│   output JSON actions array                                    │
│ • actuate job: apply patches, open PRs/issues in              │
│   getsentry/sentry-for-ai via GitHub App token                │
└───────────────────────────────────────────────────────────────┘
                    │ (skill-drift labeled PR opens)
                    ▼
┌───────────────────────────────────────────────────────────────┐
│ skill-drift-assign-reviewers.yml (unchanged)                  │
│ Routes the PR to the right SDK team based on changed paths    │
└───────────────────────────────────────────────────────────────┘

Separately (local-only, no CI trigger):
┌───────────────────────────────────────────────────────────────┐
│ Updater & Skill Creator                                        │
│ Invoked via ./scripts/test-flue-updater.sh and                │
│ ./scripts/test-flue-creator.sh                                │
│ Edits files locally; human reviews and opens PR manually       │
└───────────────────────────────────────────────────────────────┘

How to test locally

  • Updater: ./scripts/test-flue-updater.sh [--issue <N>|--fixture]
  • Creator: ./scripts/test-flue-creator.sh <platform> [prompt]
  • Detector:
    • ./scripts/test-flue-detector.sh <skill_name> <sdk_repo> <pr_number> [sdk_repo_path]
    • or run a normal SDK PR merge when wrappers are onboarded

Pending follow-ups

  • Complete per-repo detector rollout by adding the wrapper workflow to the 19 SDK repos and validating output.
  • Remove old gh-aw-specific docs and assets only after all per-repo detectors are stable.

Review findings

Type    File     Notes
commit  3c0d201  removed Updater + Creator GH-Actions workflows; kept tools local only
commit  e47071b  redesigned Detector for single-PR, single-skill invocation
commit  27367df  converted Detector to reusable workflow + added per-repo wrappers

HazAT added 9 commits May 12, 2026 11:19
Scaffolds the Node.js/Flue project structure at repo root as the first
step of porting the skill-drift agents from GitHub Agentic Workflows to
Flue. See .pi/plans/2026-05-12-flue-skill-drift/plan.md for the full plan.

Dependencies (pinned to exact versions):
  @flue/sdk   0.5.3
  @flue/cli   0.5.3
  valibot     1.4.0

Files created:
  package.json       — type:module, Node 22 engine, pinned deps
  tsconfig.json      — ES2022 target, NodeNext module resolution
  flue.config.ts     — defineConfig({ target: 'node' })
  .flue/agents/      — placeholder for agent handlers (T02-T04)
  .flue/roles/       — placeholder for role markdown (T02-T04)
  .gitignore         — added node_modules/ and dist/

Verification:
  - npm ci: clean install, 361 packages, no errors
  - npx tsc --noEmit: passes (flue.config.ts compiles cleanly)
  - npx flue --help: CLI responds correctly

.agents/skills/ auto-discovery check:
  Flue discovers .agents/skills/ at runtime only when agent code
  explicitly calls session.skill(). The 30 SDK skills in this repo
  are NOT auto-injected into agent context — they are only loaded
  on-demand by name. The disable-model-invocation frontmatter flag
  is irrelevant to Flue's runtime (Flue does not parse it). No
  mitigation needed in T02-T04; the skills won't interfere with
  the skill-drift agents unless explicitly invoked.

Co-Authored-By: Claude (claude-sonnet-4-6 via Pi)
Ports the SDK Skill Drift Detector agent from the existing gh-aw workflow
(.github/workflows/skill-drift-check.md) to the Flue harness.

- .flue/roles/detector.md — carries the full ported prompt verbatim:
  the SDK-to-repo-to-team mapping table, Steps 1-5 (gather PRs, filter,
  compare, decide, return), and the decision rules for create_pr vs
  create_issue vs skip. Instructs the agent to use the `gh` CLI for
  GitHub access (no MCP) and to never run git write commands — patches
  are computed as unified diffs and returned in the `patch` field.

- .flue/agents/skill-drift-detector.ts — thin handler (~50 lines).
  Accepts an optional `{ since?: string }` payload for overriding the
  7-day window. Initialises a local sandbox session with
  `anthropic/claude-opus-4-6`, delegates to the detector role, and
  returns Valibot-typed JSON: `{ actions: Action[], summary: string }`
  where Action is `create_pr | create_issue | skip`.

The output schema is the contract for T05 (actuator). The handler itself
does no GitHub writes — it only computes and returns the action list.

This runs side-by-side with the existing gh-aw detector; no changes to
.github/workflows/skill-drift-check.md or its lock file.

Plan: .pi/plans/2026-05-12-flue-skill-drift/plan.md

Co-Authored-By: Claude (claude-sonnet-4-6 via Pi)
Mirror of T02 pattern: role carries the full prompt, handler is a thin
orchestration shim (52 lines).

- `.flue/roles/updater.md` — ported from `.github/agents/skill-updater.agent.md`
  with the 8-step drift-fix flow, 5-file knowledge-base loading instruction,
  targeted-updates guardrail, and verification block
  (`./scripts/build-skill-tree.sh --check`)
- `.flue/agents/skill-drift-updater.ts` — accepts issue payload, runs with
  `anthropic/claude-opus-4-6`, returns structured UpdaterOutput metadata only
- Output schema is the contract for T06 (actuator): skill, summary,
  files_changed, sdk_pr_references, optional skipped
- Knowledge base loaded at runtime via the agent's read tool — not inlined
- No git operations in handler or role; actuator handles commit/push/PR
- Runs side-by-side with the existing Copilot custom-agent (gh-aw path)

Co-Authored-By: Claude (claude-sonnet-4-6 via Pi)
Mirrors the T02/T03 pattern (detector + updater):
- .flue/agents/skill-creator.ts — workflow_dispatch handler with
  platform + prompt inputs; validates output with valibot CreatorOutput schema
- .flue/roles/creator.md — 6-phase creator workflow ported from
  .github/agents/skill-creator.agent.md

Output schema returns metadata only (files_created, files_modified,
router_updated, skill, platform, summary, skipped). No git operations in
the handler or role — the actuator (T07) handles commit/push/PR.

Key behaviours in the role:
- Existence check first (skips if SDK or skill already exists)
- Loads 5 knowledge-base files at the start of every run
- Requires updating the sentry-sdk-setup router table before validation
- ./scripts/build-skill-tree.sh --check must pass; failure sets skipped
- SDK-to-repo mapping table carried over from the Copilot agent

Parallel run: .github/agents/skill-creator.agent.md stays untouched.

Co-Authored-By: Claude (claude-sonnet-4-6 via Pi)
Adds .github/workflows/flue-skill-drift-detector.yml — a two-job
GitHub Actions workflow that runs the Flue skill-drift-detector agent
and applies its output actions (create_pr / create_issue / skip).

Key design points:
- Cron trigger: Monday 22:42 UTC (42 22 * * 1), matching gh-aw cadence,
  plus workflow_dispatch with optional `since` date input
- Two-job split: `detect` is read-only (contents: read) and runs the
  agent; `actuate` has write permissions (contents, pull-requests,
  issues) to apply the results
- Protected-files enforcement: if a proposed patch touches package.json,
  lockfiles, tsconfig.json, flue.config.ts, AGENTS.md, CLAUDE.md,
  SKILL_TREE.md, scripts/build-skill-tree.sh, .github/, .agents/, or
  .flue/, the actuator downgrades the action from PR to issue
- Patch apply uses `git apply --check` first; failures are counted and
  logged without aborting the run
- Runs side-by-side with existing gh-aw detector (different name +
  concurrency group: flue-skill-drift-detector)
- Does not touch skill-drift-check.md/.lock.yml or
  skill-drift-assign-reviewers.yml (which already handles label-based
  reviewer assignment for any PR with skill-drift label)

Co-Authored-By: Claude (claude-sonnet-4-6 via Pi)
Adds .github/workflows/flue-skill-drift-updater.yml — the GitHub Actions
workflow that processes skill-drift issues opened by the Detector.

Key design points:
- Triggers on issues.labeled/opened with 'skill-drift' label, replacing the
  gh-aw 'assignees: [copilot]' mechanism; also supports workflow_dispatch
- Two-job split: read-only 'update' job runs the Flue agent and captures a
  git patch artifact; write-permissioned 'actuate' job applies it and opens a PR
- Patch-based artifact handoff (git diff --cached > changes.patch) between jobs,
  mirroring T05's detector pattern
- Protected-files gate in actuator blocks commits to lock files, config, scripts,
  and .github/** — same regex as flue-skill-drift-detector.yml
- Skill-tree validator: regenerates SKILL_TREE.md then runs --check; bails with
  an issue comment on real validation failures
- Commit message includes 'Closes #N' for auto-close on PR merge
- Skipped agent results post a comment on the originating issue
- Concurrency keyed on issue number so parallel issues don't race

Runs side-by-side with the existing Copilot custom-agent workflow.

Co-Authored-By: Claude (claude-sonnet-4-6 via Pi)
workflow_dispatch trigger (manual-only) with platform + prompt inputs.
Two-job split mirroring Detector/Updater:
- `create` job (read-only, 90min timeout): runs the skill-creator Flue
  agent, captures result.json + changes.patch as artifacts
- `actuate` job (write, 15min): downloads artifacts, applies patch,
  runs protected-files gate and skill-tree validator, commits and opens PR

Key design decisions:
- Concurrency group keyed on platform to prevent parallel runs for the
  same platform
- Protected-files violations open an issue instead of silently failing
- Skill-tree validator failure also opens an issue with stderr output
- `skill-drift` label applied so the reviewer-assign workflow fires
- PR title uses feat(<scope>) (no [skill-drift] prefix — creator action)
- Branch sanitizes platform input (lowercase, non-alnum -> dash)

Co-Authored-By: Claude (claude-sonnet-4-6 via Pi)
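
The branch sanitizing rule in the commit message above (lowercase, non-alphanumeric to dash) could look roughly like the sketch below. The function name `sanitize_platform` is hypothetical, and mapping each character individually (rather than collapsing runs) is an assumption; the workflow's actual step is not shown in this PR text.

```shell
# Hypothetical helper; the real workflow step is not reproduced in this PR.
# Lowercases the input, then maps every character outside [a-z0-9] to a dash.
sanitize_platform() {
  printf '%s' "$1" \
    | tr '[:upper:]' '[:lower:]' \
    | tr -c 'a-z0-9' '-'
}

sanitize_platform "React Native"   # prints: react-native
```

Using `printf '%s'` instead of `echo` avoids a trailing newline that would otherwise become a trailing dash.
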
Three interactive shell scripts for local development testing of the three Flue agents:

- scripts/test-flue-detector.sh — runs skill-drift-detector with a configurable 'since' window
- scripts/test-flue-updater.sh — runs skill-drift-updater from a fixture file or a real GH issue (--issue N)
- scripts/test-flue-creator.sh — runs skill-creator with a platform arg and optional prompt

Each script:
- Checks ANTHROPIC_API_KEY and GH_TOKEN/GITHUB_TOKEN before proceeding
- Prominently warns about API costs (Detector: $0.20-$1.00, Creator: $2-$10)
- Prompts for confirmation before invoking the live model
- Saves output to /tmp/flue-<agent>-result.json and pretty-prints via jq
- Validates the result against the expected output schema (PASS/FAIL)

Also adds scripts/fixtures/flue-updater-issue.json — a realistic but clearly fake drift issue (issue #9999, PR #99999) for offline testing of the Updater.

Co-Authored-By: Claude (claude-sonnet-4-6 via Pi)
Verified that skill-drift-assign-reviewers.yml fires correctly on PRs
opened by all Flue agents (Detector, Updater, Creator):

- Trigger: on.pull_request.types[opened] + paths[skills/sentry-*-sdk/**] matches
  all Flue-opened PRs since they all modify skill files.
- Label filter: all three Flue workflows apply --label "skill-drift" to their
  PRs (Creator uses it at line 263 of flue-skill-creator.yml).
- SKILL_TEAMS map: covers all 19 current skills in skills/sentry-*-sdk/ — 100%
  match, no gaps for existing platforms.
- No-op path: script logs and exits cleanly when no matching skill dir found
  (safe for brand-new platforms created by Flue Creator).
- Permissions: pull-requests:write is sufficient for requestReviewers.

No code changes needed. Added a top-of-file comment documenting the
source-agnostic behavior and the brand-new-platform no-op case.

Co-Authored-By: Claude (claude-sonnet-4-6 via Pi)
Comment thread .github/workflows/flue-skill-creator.yml Outdated
Comment thread .github/workflows/flue-skill-drift-detector.yml Outdated
HazAT and others added 5 commits May 12, 2026 17:36
Adds a short Flue Subproject section describing the file layout
(.flue/agents, .flue/roles, package.json, etc.) and how to run the
agents locally with npx flue run. Placed before the Skill Tree
Navigation section.

Co-Authored-By: Claude (claude-sonnet-4-6 via Pi)
…cfcf-g5fm)

Advisory: GHSA-3q49-cfcf-g5fm (Critical, all versions, no patched version exists)

The dep chain @flue/sdk@0.5.3 → @mariozechner/pi-ai@0.73.1 → @mistralai/mistralai@2.2.1
pulls in a malicious package. All versions of @mistralai/mistralai are flagged with no
upstream fix available.

The package has no install hooks so npm ci itself is safe — the risk is dormant and only
triggered if pi-ai's lazy import('./mistral.js') fires (i.e., if a Mistral model is invoked).
Our three agents all hardcode anthropic/claude-opus-4-6, so Mistral never loads under current
code paths. This mitigation eliminates the latent risk entirely.

A postinstall script now physically removes the @mistralai directory from node_modules after
every npm install or npm ci. Both the specific package dir and the @mistralai scope dir are
removed in case other packages from that scope are pulled in later.

Note: npm audit will continue to report this advisory because it reads the lockfile, not disk.
This is expected and documents the upstream issue. The fix is a workaround pending upstream
Flue/pi-ai dropping the dependency — track at:
- https://github.com/mariozechner/pi-ai/issues (drop @mistralai/mistralai dep)
- https://github.com/badlogic/flue/issues (upgrade to patched pi-ai once available)

Co-Authored-By: claude-sonnet-4-5 <claude-sonnet-4-5@anthropic.com>
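
A minimal sketch of what the postinstall cleanup described above might do, assuming it runs as a shell step. The actual script wired into package.json is not shown here; `remove_mistral_scope` and its root-directory argument are illustrative names.

```shell
# Illustrative only: the PR's real postinstall script is not reproduced here.
# Removes the specific package directory first, then the whole @mistralai scope
# in case other packages from that scope are pulled in later.
remove_mistral_scope() {
  local root="${1:-.}"   # project root containing node_modules (assumed layout)
  rm -rf "$root/node_modules/@mistralai/mistralai"
  rm -rf "$root/node_modules/@mistralai"
}
```

Because `rm -rf` succeeds when the target is already absent, the step is safe to run after every install.
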
All three role files (.flue/roles/detector.md, updater.md, creator.md) contained
the literal substring 'MCP' in constraints that prohibit using MCP servers. While
the intent was correct, the presence of the substring caused the plan's grep
contract to fail.

Rewrote each constraint positively, replacing 'Do NOT use any MCP server or
external GitHub integration' with 'Use the gh CLI for all GitHub access. Do not
connect to external services for GitHub operations.' The semantic meaning is
preserved — agents are still instructed to use gh CLI only.

The grep contract (grep -ri "mcp" .flue/roles/ returns zero matches) is now
satisfied.

Addresses P0 #3 from the review at .pi/plans/2026-05-12-flue-skill-drift/review.md.

Co-Authored-By: claude-sonnet-4-5 <claude-sonnet-4-5@anthropic.com>
The original schemas marked `skipped` as an optional field while requiring
all success fields (skill, summary, files_changed, etc.). This meant Valibot
rejected legitimate skip-only responses because the required fields were absent.

Changes:
- Replaced UpdaterOutput flat object with v.union([UpdaterSuccess, UpdaterSkipped])
  discriminated by status: 'success' | 'skipped' literals
- Replaced CreatorOutput flat object with v.union([CreatorSuccess, CreatorSkipped])
  with the same discriminant pattern
- Updated flue-skill-drift-updater.yml: skip detection now checks .status == 'skipped'
  and reads .reason; actuate job if: clause branches on .status == 'success'
- Updated flue-skill-creator.yml: removed the skipped step output (multiline hazard);
  now emits status= only; actuate if: clause uses needs.create.outputs.status == 'success'
- Lightly reworded Output sections in updater.md and creator.md to describe the new
  discriminated-union shape instead of the optional skipped field

Fixes P0 #2 from the review at .pi/plans/2026-05-12-flue-skill-drift/review.md.

Co-Authored-By: Claude claude-sonnet-4-5 via pi worker agent
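
On the workflow side, branching on the new discriminant might look like the sketch below. The `.status` and `.reason` fields come from the commit message above; the function name, the result file argument, and the printed format are hypothetical.

```shell
# Sketch of the consumer side of the discriminated union.
# handle_result is a hypothetical name; it expects a JSON result file path.
handle_result() {
  local result_file="$1" status
  status=$(jq -r '.status' "$result_file")
  case "$status" in
    success) echo "status=success" ;;
    skipped) echo "status=skipped reason=$(jq -r '.reason' "$result_file")" ;;
    *)       echo "unexpected status: $status" >&2; return 1 ;;
  esac
}
```

Rejecting any other value keeps a malformed agent response from being treated as a success.
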
The happy-path `gh pr create` call had `--label` flags placed after the
heredoc EOF terminator. In bash, anything after the heredoc terminator is
parsed as a separate command, so the flags were silently dropped — the PR
opened with no labels, which meant the reviewer-assign workflow never fired.

Fixed by writing the PR body to a temp file (/tmp/pr-body.md) with a
standalone `cat > ... <<EOF` block, then calling `gh pr create` as a
normal argument-style command with all flags on the same logical line.

Also removed three references to the non-existent `skill-creator` label:
- one in the happy-path `gh pr create` call
- one in the protected-files violation `gh issue create` call
- one in the skill-tree validation failure `gh issue create` call

These would have caused `gh` to exit non-zero when the workflows ran.
Replaced the two issue-create labels with `skill-drift` (already in use
by the other Flue workflows) to keep labelling consistent.

Addresses the P1 finding in .pi/plans/2026-05-12-flue-skill-drift/review.md.

Co-Authored-By: Claude (claude-sonnet-4-5 via Pi worker agent)
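
The heredoc fix described above can be sketched as follows. This is an illustration rather than the workflow's actual step: `open_skill_pr`, the `GH_BIN` override (added so the command can be dry-run with `echo`), and the body text are assumptions, and `mktemp` stands in for the fixed /tmp/pr-body.md path.

```shell
# Sketch under assumptions: GH_BIN is a hypothetical override for dry runs;
# title and body text are placeholders.
open_skill_pr() {
  local title="$1" summary="$2" body_file
  body_file=$(mktemp)

  # The heredoc is a complete command of its own, ending at the EOF line.
  cat > "$body_file" <<EOF
## Summary

$summary
EOF

  # Every flag stays on the same logical line as `gh pr create`.
  "${GH_BIN:-gh}" pr create \
    --title "$title" \
    --body-file "$body_file" \
    --label "skill-drift"
}
```

Running `GH_BIN=echo open_skill_pr "title" "summary"` prints the argument list instead of calling GitHub, which makes a dropped `--label` flag easy to spot.
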
@HazAT HazAT force-pushed the flue/skill-drift-port branch from a009515 to 4ec438c Compare May 12, 2026 15:36
@HazAT HazAT marked this pull request as ready for review May 12, 2026 15:45
Comment thread .github/workflows/flue-skill-drift-updater.yml Outdated
Comment thread .github/workflows/flue-skill-drift-detector.yml Outdated
Comment thread .github/workflows/flue-skill-drift-updater.yml Outdated
Comment thread .flue/roles/creator.md Outdated
Comment thread .flue/roles/detector.md Outdated
HazAT added 3 commits May 12, 2026 18:03
…c edge cases

- Move inputs.* interpolations from bash run: blocks into env: blocks across all three
  workflows — Creator uses PLATFORM/PROMPT env vars, Updater uses GH_EVENT_NAME/
  ISSUE_NUMBER_INPUT/ISSUE_NUMBER_EVENT/ISSUE_NUMBER. This is the canonical fix for
  GitHub Actions script-injection sinks (Warden FGH-435): template substitution now
  happens at the env layer, not inside bash, so shell metacharacters in user-supplied
  input are never executed.

- Bump Updater's update job from issues: read to issues: write so the 'Post skip
  comment if agent skipped' step can call gh issue comment without a 403. The
  actuate job's issues: write was already correct and is unchanged. Flagged by
  Seer Bug Prediction and Cursor code review on PR #127.

- Replace fixed EOF heredoc delimiters with random 32-hex delimiters via
  openssl rand -hex 16 when writing multi-line JSON to $GITHUB_OUTPUT across all
  three workflows. A bare EOF line in LLM-generated output (e.g. inside a summary
  field) would otherwise truncate the heredoc early and corrupt fromJSON() parsing
  in downstream jobs. Flagged by Seer on PR #127.

Co-Authored-By: Claude (claude-sonnet-4-6 via Pi)
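
A minimal sketch of the random-delimiter technique from the last bullet above. `write_multiline_output` is a hypothetical helper name; it assumes `$GITHUB_OUTPUT` points at the step's output file, as it does on GitHub Actions runners.

```shell
# Sketch: write a multi-line value to $GITHUB_OUTPUT using a random delimiter,
# so a bare EOF line inside the value cannot end the block early.
write_multiline_output() {
  local name="$1" value="$2" delim
  delim=$(openssl rand -hex 16)   # 32 hex characters, as in the commit message
  {
    printf '%s<<%s\n' "$name" "$delim"
    printf '%s\n' "$value"
    printf '%s\n' "$delim"
  } >> "$GITHUB_OUTPUT"
}
```

A call such as `write_multiline_output result "$(cat result.json)"` would then be safe even if the JSON contains a line reading `EOF`.
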
…n BRT-8PC)

Per Warden security review BRT-8PC: deny-lists are structurally weak for
LLM-emitted patches — any path the agent emits that isn't explicitly listed
slips through. The old regex also missed common sensitive paths: .husky/,
.npmrc, Dockerfile, .env*, renovate.json, .changeset/, .devcontainer/,
top-level *.sh, commitlint.config.*, vitest.config.*, eslint.config.*.

New allow-list: only paths matching ^skills/ are accepted; everything else
triggers the existing downgrade-to-issue path. This captures the invariant
that agents are only supposed to edit skill files — protecting current paths,
future paths, and paths nobody thought to enumerate.

Also strips leading ./ prefix defensively before the pattern match, eliminating
any doubt about ^ anchor bypass via ./prefixed paths.

Updated docs/agent-port/04-flue-implementation.md §6 to describe the new
allow-list approach and §12.5 noting the source of the change (PR #127 Warden
review BRT-8PC).

Co-Authored-By: Claude (claude-sonnet-4-6 via Pi)
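
The allow-list gate described above could be sketched like this. Only the `^skills/` rule and the leading `./` stripping come from the commit message; `first_disallowed_path` and its stdin-based interface are hypothetical.

```shell
# Sketch of an allow-list gate. Reads one path per line on stdin, prints the
# first path outside skills/ and returns 0; returns 1 when every path is allowed.
first_disallowed_path() {
  local path
  while IFS= read -r path; do
    [ -n "$path" ] || continue
    path="${path#./}"            # strip a leading ./ before matching
    case "$path" in
      skills/*) ;;               # allowed: agents may only edit skill files
      *) printf '%s\n' "$path"; return 0 ;;
    esac
  done
  return 1                       # no violation found
}
```

A caller might write `if violation=$(printf '%s\n' "$touched" | first_disallowed_path); then ...` and take the downgrade-to-issue path inside the branch.
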
…g protected files

- Added `sentry-cloudflare-sdk` and `sentry-elixir-sdk` rows to mapping tables in all
  three role files (Cursor bot flagged 17/19 — `SKILL_TEAMS` has 19 entries). Cloudflare
  lives in the JavaScript monorepo (packages/cloudflare/, packages/core/); Elixir is its
  own repo (getsentry/sentry-elixir), no path filter.

- Removed Creator role's instructions to run `build-skill-tree.sh` and modify
  `AGENTS.md`/`SKILL_TREE.md`. The workflow's actuator regenerates `SKILL_TREE.md`
  after the allowlist check, so the agent's regeneration was both redundant and harmful:
  any successful Creator run would be downgraded to an issue because its patch touched
  a protected file outside `skills/`. Updated the output schema example to omit
  `SKILL_TREE.md` from `files_modified`.

- Replaced the Phase 5 `build-skill-tree.sh --check` verification in creator.md with
  a safe `grep` sanity-check against the router table. Updater's `--check` reference
  (read-only mode) is left intact. Detector has no `build-skill-tree` reference.

- Net effect: Creator can now successfully open PRs end-to-end; Detector/Updater can
  identify drift on Cloudflare and Elixir SDKs.

Co-Authored-By: Claude (claude-sonnet-4-6 via Pi)
Comment on lines +189 to +192
if ! ./scripts/build-skill-tree.sh --check 2>/tmp/skill-tree-err; then
echo "::error::Skill tree validation failed after regeneration"
ERR=$(cat /tmp/skill-tree-err)
gh issue comment "$ISSUE_NUMBER" \

Bug: The workflow captures stderr from build-skill-tree.sh, but the script writes its errors to stdout, resulting in empty error reports on validation failure.
Severity: MEDIUM

Suggested Fix

Modify the command in the workflow to capture both stdout and stderr. Change ./scripts/build-skill-tree.sh --check 2>/tmp/skill-tree-err to ./scripts/build-skill-tree.sh --check &> /tmp/skill-tree-err. This will redirect both streams to the file, ensuring the $ERR variable contains the actual error messages.

Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent. Verify if this is a real issue. If it is, propose a fix; if not, explain why it's
not valid.

Location: .github/workflows/flue-skill-drift-updater.yml#L189-L192

Potential issue: The `build-skill-tree.sh` script writes its validation error messages
to `stdout`. However, the calling GitHub workflow only captures `stderr` by using the
redirection `2>/tmp/skill-tree-err`. When the script fails due to validation errors, the
captured `$ERR` variable is empty. Consequently, the GitHub issue comment created to
report the failure contains an empty code block, providing no actionable information for
developers to debug why the skill tree validation failed.

Also affects:

  • .github/workflows/flue-skill-creator.yml:188~191
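
One portable way to apply the suggested fix is `> file 2>&1`, which captures both streams without relying on the bash-only `&>` form. In the sketch below, `run_check` and its arguments are hypothetical; the command to validate is passed in by the caller.

```shell
# Sketch: capture stdout and stderr together so the failure report is never empty.
# Usage: run_check <log-file> <command> [args...]
run_check() {
  local log="$1"; shift
  if ! "$@" > "$log" 2>&1; then
    echo "::error::Skill tree validation failed after regeneration"
    return 1
  fi
}
```

For example, `run_check /tmp/skill-tree-err ./scripts/build-skill-tree.sh --check || ERR=$(cat /tmp/skill-tree-err)` would leave the script's stdout messages in `$ERR`.
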


# Protected files check
local touched
touched=$(git diff --name-only)

Bug: The allowlist check uses git diff --name-only, which does not detect new untracked files. This could allow a patch to create and commit files outside the allowed directory.
Severity: MEDIUM

Suggested Fix

Replace git diff --name-only with a command that can detect all changes, including untracked files. Use git status --porcelain and parse its output to get a list of all modified, staged, and untracked files to ensure the allowlist check is comprehensive.

Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent. Verify if this is a real issue. If it is, propose a fix; if not, explain why it's
not valid.

Location: .github/workflows/flue-skill-drift-detector.yml#L123

Potential issue: The workflow's security check uses `git diff --name-only` to identify
modified files and verify they are within an allowed directory. However, this command
does not list new, untracked files. A patch applied via `git apply` can create new
files. If a patch creates a new, untracked file outside the allowed `skills/` directory,
it will bypass this security check. The subsequent `git add -A` command will then stage
and commit this untracked file, potentially introducing unauthorized code.
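
A sketch of a listing that also sees new files, assuming it runs inside the skills repo work tree after the patch is applied. `list_touched_paths` is a hypothetical helper name; both git commands are standard.

```shell
# Sketch: list every path a patch touched, including brand-new files.
list_touched_paths() {
  {
    git diff --name-only HEAD                 # tracked changes, staged or not
    git ls-files --others --exclude-standard  # untracked, non-ignored files
  } | sort -u
}
```

Feeding this combined list into the allow-list check means a file created outside `skills/` is caught before `git add -A` can stage it.
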

Comment thread .github/workflows/flue-skill-creator.yml Outdated
HazAT added 4 commits May 20, 2026 10:59
…LI tools

Removed the Updater and Skill Creator GitHub Action workflow files because these agents are invoked locally by humans via smoke scripts and manual PR flow.

.flue TypeScript handlers and role markdowns remain unchanged and continue to be the authoritative implementations. Invocation now goes through ./scripts/test-flue-updater.sh and ./scripts/test-flue-creator.sh, while CI will only keep detector-driven automation handled separately.

This commit only addresses Solo todo #360. Re-architecture of detector workflow behavior is deferred to follow-up todos #361 and #362.

Co-Authored-By: Claude (claude-sonnet-4-6 via Pi)
- Updated Detector input payload to: { skill_name, sdk_repo, pr_number, pr_url, sdk_repo_path }.
- Removed per-action skill field from DetectorOutput and simplified output contracts to full-run single-skill scope.
- Rewrote detector role for one merged PR flow: removed 19-row mapping table, 7-day date-window logic, and monorepo path-filter framing.
- Kept duplicate-check guidance tied to open skill-drift PRs/issues in getsentry/sentry-for-ai and lowered action caps to 5 create_pr / 5 create_issue.
- Updated flue detector smoke test to accept <skill_name> <sdk_repo> <pr_number> [+sdk_repo_path], added --fixture mode, and added new fixture at scripts/fixtures/flue-detector-pr.json.
- This is U02 of the Detector single-PR rearchitecture; U03 (reusable workflow and repo wrappers) lands next.
- Co-Authored-By: Claude (claude-sonnet-4-6 via Pi)
Removed standalone flue-skill-drift-detector.yml (cron-driven, centralized).

Added reusable workflow flue-skill-drift-detector-reusable.yml with workflow_call inputs for skill+SDK PR context.

Used GitHub App token for cross-repo write operations in getsentry/sentry-for-ai; no GITHUB_TOKEN writes.

Added 19 example caller wrappers under docs/agent-port/sdk-repo-wrappers/ and onboarding README.

Wrappers use PR closed/merged trigger on target SDK repos with noise filtering.

Per-PR flow references PR metadata in generated PR/issue titles and bodies.

Reference Solo todo #362.

Co-Authored-By: Claude (claude-sonnet-4-6 via Pi)
…r inverted architecture

Rewrite 04-flue-implementation.md architecture sections for the inverted Flue flow (per-SDK wrapper -> reusable workflow), including diagram, mapping table, file layout, detector schema, local run guidance, cutover plan, risks, open questions, and new SDK repo onboarding section.

Update AGENTS.md Flue subproject section to describe the new architecture: reusable detector workflow in this repo, local-only Updater/Creator CLI invocation, and onboarding via 19 wrapper templates under docs/agent-port/sdk-repo-wrappers/.

Update PR #127 body via gh pr edit 127 to mirror the inverted architecture and local-first Updater/Creator path.

Closes #363.

Co-Authored-By: Claude (claude-sonnet-4-6 via Pi)
@HazAT HazAT marked this pull request as draft May 20, 2026 09:09
Comment thread scripts/test-flue-detector.sh Outdated
Comment on lines +47 to +53
PAYLOAD=$(jq -c \
--arg skill_name "$SKILL_NAME" \
--arg sdk_repo "$SDK_REPO" \
--argjson pr_number "$PR_NUMBER" \
--arg pr_url "https://github.com/${SDK_REPO}/pull/${PR_NUMBER}" \
--arg sdk_repo_path "$SDK_REPO_PATH" \
'{skill_name:$skill_name,sdk_repo:$sdk_repo,pr_number:$pr_number,pr_url:$pr_url,sdk_repo_path:$sdk_repo_path}')

Bug: The jq command in test-flue-detector.sh is missing the -n flag, causing the script to hang indefinitely when run without the --fixture option.
Severity: HIGH

Suggested Fix

Add the -n flag to the jq command on line 47 to prevent it from reading from standard input. Change jq -c \ to jq -c -n \.

Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent. Verify if this is a real issue. If it is, propose a fix; if not, explain why it's
not valid.

Location: scripts/test-flue-detector.sh#L47-L53

Potential issue: The `scripts/test-flue-detector.sh` script will hang indefinitely when
invoked in its primary, documented, non-fixture mode. The `jq` command on line 47 is
called without the `-n` flag and without any piped input or input file. This causes `jq`
to wait for input from stdin, which is never provided within the `$(...)` command
substitution. As a result, the script execution stalls and never completes.
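
The suggested fix can be sketched as below. `build_payload` is a hypothetical wrapper around the snippet under review, and `sdk_repo_path` is left out for brevity. With `-n` (`--null-input`), jq builds the object from the `--arg` values alone and never reads stdin.

```shell
# Sketch mirroring the snippet under review, with -n added so jq does not
# wait for input inside the $(...) substitution.
build_payload() {
  local skill_name="$1" sdk_repo="$2" pr_number="$3"
  jq -c -n \
    --arg skill_name "$skill_name" \
    --arg sdk_repo "$sdk_repo" \
    --argjson pr_number "$pr_number" \
    --arg pr_url "https://github.com/${sdk_repo}/pull/${pr_number}" \
    '{skill_name:$skill_name,sdk_repo:$sdk_repo,pr_number:$pr_number,pr_url:$pr_url}'
}
```

`--argjson` keeps `pr_number` a JSON number rather than a string, so a non-numeric argument makes jq fail instead of producing a malformed payload.
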

@cursor cursor Bot left a comment

Cursor Bugbot has reviewed your changes and found 3 potential issues.

Reviewed by Cursor Bugbot for commit ec41145. Configure here.

run: |
set -euo pipefail

RESULT="artifact/result.json"

Actuate job reads artifact from wrong path

High Severity

The download-artifact step (no working-directory) places result.json at $GITHUB_WORKSPACE/artifact/result.json. The "Apply actions" step runs with working-directory: skills-repo and sets RESULT="artifact/result.json", resolving to $GITHUB_WORKSPACE/skills-repo/artifact/result.json — a path that never exists. Under set -euo pipefail, jq will fail immediately and no actions are ever applied.


git switch main
git branch -D "$branch_full"
local issue_body
issue_body="The Detector proposed a change to a protected path (\`$violation\`) for skill \`$SKILL_NAME\`.\n\nOriginal title: $title\n\nOriginal body:\n\n$body\n\nTouched paths:\n\n\`\`\`\n$touched\n\`\`\`\n\nDetected during merge of [${SDK_REPO}#${PR_NUMBER}](${PR_URL})."

Downgrade issue body contains literal \n instead of newlines

Low Severity

The issue_body variable for the protected-path downgrade uses \n inside regular double quotes. Bash doesn't interpret \n as newlines in "..." strings — they stay literal. Use $'...\n...' quoting or printf (as add_reference_footer already does) to produce actual line breaks in the created GitHub issue.
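
A small sketch of the difference, using shortened hypothetical stand-ins for the real message and variables:

```shell
# Hypothetical stand-ins for the variables used in the actuator script.
violation="SKILL.md"
skill_name="sentry-go-sdk"

# Double quotes keep \n as two literal characters (backslash, n).
literal_body="Protected path: $violation\n\nSkill: $skill_name"

# printf interprets \n in its format string, producing real line breaks.
fixed_body="$(printf 'Protected path: %s\n\nSkill: %s' "$violation" "$skill_name")"

# Count lines in each variant; %s prints the value without interpreting escapes.
literal_lines="$(printf '%s\n' "$literal_body" | wc -l | tr -d ' ')"
fixed_lines="$(printf '%s\n' "$fixed_body" | wc -l | tr -d ' ')"
echo "literal: $literal_lines line(s), fixed: $fixed_lines line(s)"
```

In Bash, `$'...'` quoting also yields real newlines, but it does not expand variables inside the quotes, so the `printf` form tends to be easier to read for a body with this many interpolations.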


- '**/__tests__/**'
paths:
- "packages/browser/**"
- "packages/core/**"

JavaScript wrappers use mutually exclusive path filters

High Severity

All 8 sentry-javascript-*.yml wrapper templates specify both paths-ignore and paths on the same pull_request event. GitHub Actions rejects this combination — these filters are mutually exclusive per event. Workflows copied from these templates will fail validation and never trigger. Use a single paths list with ! negation patterns instead (e.g., !**/*.md).


Comment thread .flue/agents/skill-drift-detector.ts Outdated
HazAT added 3 commits May 21, 2026 15:24
…ppers, untracked allowlist, prompt-injection surface)

Bug A: actuator now reads detector output from "${GITHUB_WORKSPACE}/artifact/result.json" to match download-artifact output location under workspace root.
Bug B: local smoke script adds `jq -c -n` for non-fixture payload construction to avoid stdin blocking.
Bug C: all sentry-javascript wrappers now use a single `paths:` list with GitHub Actions negation patterns (no `paths-ignore`) to satisfy workflow constraints.
Bug D: allow-list check now evaluates `git diff --name-only HEAD` plus untracked files from `git ls-files --others --exclude-standard` to prevent missing staged new files.
Bug E: dropped the `sdk_repo_path` input/payload path and removed local SDK checkout from the detector flow, reducing PR-controlled-path prompt-injection surface.

Co-Authored-By: Claude (claude-opus-4-6 via Pi)
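
The Bug D change can be illustrated with a throwaway repository. This is a sketch under assumptions: the file names are invented, and the actual allow-list comparison in the actuator is not shown here. It only demonstrates that `git diff --name-only HEAD` alone does not report files that were never added.

```shell
# Throwaway repository with one committed file (names are illustrative).
repo="$(mktemp -d)"
git -C "$repo" init -q
printf 'v1\n' > "$repo/tracked.md"
git -C "$repo" add tracked.md
git -C "$repo" -c user.name=demo -c user.email=demo@example.invalid \
  -c commit.gpgsign=false commit -q -m "init"

# Simulate an agent run: modify a tracked file and create a brand-new one.
printf 'v2\n' > "$repo/tracked.md"
printf 'new\n' > "$repo/untracked.md"

# Diff against HEAD only sees the tracked modification.
diff_only="$(git -C "$repo" diff --name-only HEAD)"

# Union with untracked, non-ignored files gives the full set of touched paths.
touched="$({ git -C "$repo" diff --name-only HEAD
             git -C "$repo" ls-files --others --exclude-standard; } | sort -u)"

printf 'diff only:\n%s\n\ncombined:\n%s\n' "$diff_only" "$touched"
```

Checking the combined list against the allow-list means a newly created file outside the permitted paths cannot slip past the check.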
Adds on: workflow_dispatch alongside workflow_call with the same detector inputs for pre-production manual runs.
Moves app-token creation out of detect so manual dispatch can run with only ANTHROPIC_API_KEY.
Skips actuate on workflow_dispatch (only detect runs + result artifact), and adds visible result summarization for manual inspection.
No behavior change for the production workflow_call path, which still performs actuator-based PR/issue creation.
Reference: get started with the pilot in getsentry/sentry-go#1308; this test hook is for manual validation before App secrets are fully in place.

Co-Authored-By: Claude (claude-sonnet-4-6 via Pi)

The Flue CLI emits build progress messages ('[flue] Building:',
'[flue] Source root:', etc.) to stdout BEFORE the agent's JSON result,
not to stderr as the docs imply. The smoke script and the reusable
workflow both captured raw stdout to result.json and then ran jq on
it, which failed with 'parse error: Invalid literal at line 1, column 6'.

Fix: capture the raw output to flue-output.log, then extract the
trailing JSON object (everything from the first line starting with
'{') into result.json via 'sed -n /^{/,$p'. The raw log is now
uploaded as a separate artifact alongside result.json so we can debug
build issues post-run.

Verified locally: smoke run against getsentry/sentry-go#1302 now parses
cleanly. The agent correctly emitted a single 'skip' action recognising
that the removed ContextifyFrames integration was never user-facing
and so doesn't create drift in the sentry-go-sdk skill.

Co-Authored-By: Claude (claude-sonnet-4-6 via Pi)
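
The extraction step described in the commit above can be sketched as follows. The log contents are illustrative: only the `[flue] Building:` and `[flue] Source root:` prefixes come from the commit message, and the values after them, the file names, and the JSON shape are assumptions.

```shell
# Stand-in for the captured CLI output: progress lines followed by the JSON result.
raw_log="$(mktemp)"
result_json="$(mktemp)"
cat > "$raw_log" <<'EOF'
[flue] Building: skill-drift-detector
[flue] Source root: .flue
{
  "actions": [
    { "type": "skip" }
  ]
}
EOF

# Print everything from the first line that starts with '{' through end of file.
sed -n '/^{/,$p' "$raw_log" > "$result_json"

cat "$result_json"
```

This relies on the JSON object starting at column 0 and on nothing being printed after it; a progress line that itself began with `{` would start the range too early.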
Comment thread package.json
"node": ">=22"
},
"scripts": {
"postinstall": "rm -rf node_modules/@mistralai/mistralai && rm -rf node_modules/@mistralai && echo 'Removed @mistralai/mistralai per GHSA-3q49-cfcf-g5fm (malware advisory)'"

Known-malware package executes before postinstall cleanup in CI, exposing GITHUB_TOKEN

The postinstall hook removing @mistralai/mistralai (GHSA-3q49-cfcf-g5fm) runs only after npm has already installed all packages and executed their own lifecycle scripts, so any malicious install hooks in that package fire first — during npm ci in the reusable workflow where GITHUB_TOKEN is available in the runner environment.

Evidence
  • package-lock.json confirms @mistralai/mistralai@2.2.1 as a resolved transitive dependency via node_modules/@mariozechner/pi-ai (line 1378 lists "@mistralai/mistralai": "^2.2.0" under its deps; package-lock line 1394 resolves it to 2.2.1).
  • npm's lifecycle execution order guarantees all dependency install scripts run before the root package's postinstall, so the malware can execute before rm -rf node_modules/@mistralai/mistralai removes it.
  • The reusable workflow flue-skill-drift-detector-reusable.yml runs npm ci (Install Flue step, detect job) without suppressing install scripts; GITHUB_TOKEN is available as an environment variable to all steps in that job, including during package installation.
  • The PR description explicitly acknowledges the mitigation is incomplete: "This PR still depends on the @mistralai/mistralai supply-chain mitigation work; keep the PR in Draft until the advisory requirement is fully satisfied."
  • The postinstall script does nothing to prevent execution of @mistralai/mistralai's own lifecycle hooks and provides a false sense of mitigation.
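
One possible direction, sketched under assumptions: fail the job while the lockfile still references the package, and install with lifecycle scripts disabled. The helper name `lockfile_references` and the demo lockfile fragment are hypothetical. `npm ci --ignore-scripts` is an existing npm flag, but it also skips the root `postinstall`, and whether the Flue toolchain needs any install script to work is not established in this thread.

```shell
# Hypothetical guard: exits 0 when the lockfile still mentions the package name.
lockfile_references() {
  grep -Fq "$1" "$2"
}

# Demo against a throwaway lockfile fragment (contents are illustrative only).
demo_lock="$(mktemp)"
printf '%s\n' '"node_modules/@mistralai/mistralai": { "version": "2.2.1" }' > "$demo_lock"

if lockfile_references "@mistralai/mistralai" "$demo_lock"; then
  guard_result="blocked"
else
  guard_result="clean"
fi
echo "guard: $guard_result"

# In the workflow, the install step could then become (not executed here):
#   lockfile_references "@mistralai/mistralai" package-lock.json && exit 1
#   npm ci --ignore-scripts
```

The guard is a plain substring match, so it errs on the side of blocking; it would also trip on a parent package that merely lists the dependency range.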

Identified by Warden security-review · 238-9D5
