feat: add template red team dashboard CLI#16

Open
claytonlin1110 wants to merge 1 commit into AffineFoundation:main from claytonlin1110:feat/redteam-dashboard

Conversation

@claytonlin1110
Contributor

Summary

This PR adds a Template Red Team Dashboard as a lightweight CLI (python -m liveweb_arena.redteam) that probes templates without running the browser agent. It executes a deterministic “API semantic probe” by calling each plugin’s fetch_api_data() for a minimal set of inferred URLs, feeding those snapshots through the real GTCollector + template get_ground_truth(). It then emits actionable template-quality metrics and artifacts (report.json, report.md) to support red-team review, anti-memorization checks, and quick regressions in CI.

Motivation

Template quality issues are easy to ship unintentionally:

  • Memorizable templates (collapsed parameter space or small answer space)
  • Semantic drift (the question’s meaning doesn’t match what the API actually returns)
  • Instability (GT changes across close repeats due to volatile sources)
  • Solvability/GT binding issues (GT depends on data that isn’t collected via the intended navigation path)

What’s included

  1. New liveweb_arena.redteam CLI
    Entry point: liveweb_arena/redteam/main.py

Key capabilities:

  • Targeted runs via --templates plugin/template[/variant]
  • Bulk runs via --all-templates (auto-resolves registered templates with a known plugin/cache source)
  • Plugin filtering via --plugins coingecko stooq ...
  • Template discovery without probing via --list-templates

Artifacts:

  • Writes report.json and report.md to ./redteam/&lt;timestamp&gt;/ (or --output-dir).
  2. Deterministic API probe pipeline (no browser, no LLM)
    Core logic: liveweb_arena/redteam/probe.py

How it works:

  • Generates tasks through the real TaskManager.generate_composite_task(...) for each (seed, template) pair.
  • For each generated question, infers a minimal set of probe URLs (conservative heuristics per plugin where needed).
  • Calls plugin fetch_api_data(url) for each probe URL and feeds results into GTCollector.on_page_visit(...).
  • Calls GTCollector.fetch_remaining_api_gt() which triggers the template’s real get_ground_truth(validation_info) logic.
  • Captures success/failure, GT values, and probe URLs per sample.
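The pipeline above can be sketched with stand-in objects. The StubPlugin and StubGTCollector classes and the probe_sample helper below are hypothetical simplifications, not the real liveweb_arena interfaces; they only mirror the method names the PR description mentions:

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class StubPlugin:
    """Hypothetical plugin: maps probe URLs to canned API payloads."""
    name: str
    responses: Dict[str, dict]

    def fetch_api_data(self, url: str) -> dict:
        return self.responses[url]


@dataclass
class StubGTCollector:
    """Hypothetical collector mirroring on_page_visit / fetch_remaining_api_gt."""
    get_ground_truth: Callable[[dict], object]
    validation_info: dict
    snapshots: List[tuple] = field(default_factory=list)

    def on_page_visit(self, url: str, api_data: dict) -> None:
        self.snapshots.append((url, api_data))

    def fetch_remaining_api_gt(self):
        # Merge all collected snapshots, then run the template's GT logic.
        merged: dict = {}
        for _, data in self.snapshots:
            merged.update(data)
        return self.get_ground_truth({**self.validation_info, "api": merged})


def probe_sample(plugin, collector, probe_urls):
    """Feed an API snapshot for each probe URL, then attempt GT collection."""
    for url in probe_urls:
        collector.on_page_visit(url, plugin.fetch_api_data(url))
    try:
        gt = collector.fetch_remaining_api_gt()
        return {"success": gt is not None, "gt": gt, "urls": list(probe_urls)}
    except Exception as exc:  # a GT failure is recorded per sample, not raised
        return {"success": False, "gt": None, "error": str(exc), "urls": list(probe_urls)}
```

Because every step is deterministic given the API responses, the same seed and template should produce the same per-sample record on repeat runs.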
  3. Metrics: collapse, baseline, GT success, stability
    Metrics: liveweb_arena/redteam/metrics.py

Computed per template:

  • GT success rate: fraction of samples where GT could be collected from probed data
  • Unique questions / unique GT values: simple diversity indicators
  • Cross-parameter collapse rate: detects whether distinct validation_info configurations collapse to identical GT outputs
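A minimal sketch of how such metrics might be computed from per-sample records. The field names (config, question, gt, success) are assumptions for illustration, not the actual report schema:

```python
from collections import defaultdict


def compute_metrics(samples):
    """
    samples: list of dicts with keys
      'config'   - hashable fingerprint of the validation_info configuration
      'question' - generated question text
      'gt'       - collected ground-truth value (hashable)
      'success'  - whether GT collection succeeded
    """
    n = len(samples)
    ok = [s for s in samples if s["success"]]
    gt_success_rate = len(ok) / n if n else 0.0

    unique_questions = len({s["question"] for s in samples})
    unique_gt_values = len({s["gt"] for s in ok})

    # Collapse: distinct configs whose GT output is shared with another config.
    configs_by_gt = defaultdict(set)
    for s in ok:
        configs_by_gt[s["gt"]].add(s["config"])
    distinct_configs = {s["config"] for s in ok}
    collapsed = {c for cs in configs_by_gt.values() if len(cs) > 1 for c in cs}
    collapse_rate = len(collapsed) / len(distinct_configs) if distinct_configs else 0.0

    return {
        "gt_success_rate": gt_success_rate,
        "unique_questions": unique_questions,
        "unique_gt_values": unique_gt_values,
        "collapse_rate": collapse_rate,
    }
```

Under this definition, a template where every parameter setting yields the same answer would score a collapse rate of 1.0.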
  4. CI gating (threshold enforcement)
    Flags (in CLI):

  • --fail-on-violation (exit code 2 if any violation)
  • --min-gt-success 0..1
  • --max-collapse 0..1
  • --max-baseline 0..1
  • --min-stability 0..1 (requires --repeat >= 2)
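One plausible shape for the gating logic behind these flags. Function names and the metrics-dict keys are illustrative, not the real CLI internals; only the exit-code-2 behavior and the --repeat >= 2 constraint come from the flags listed above:

```python
def check_thresholds(metrics, min_gt_success=None, max_collapse=None,
                     max_baseline=None, min_stability=None, repeat=1):
    """Return a list of violation strings; an empty list means the gate passes."""
    violations = []
    if min_gt_success is not None and metrics["gt_success_rate"] < min_gt_success:
        violations.append(
            f"gt_success_rate {metrics['gt_success_rate']:.2f} < {min_gt_success}")
    if max_collapse is not None and metrics["collapse_rate"] > max_collapse:
        violations.append(
            f"collapse_rate {metrics['collapse_rate']:.2f} > {max_collapse}")
    if max_baseline is not None and metrics.get("baseline_rate", 0.0) > max_baseline:
        violations.append("baseline_rate above threshold")
    if min_stability is not None:
        if repeat < 2:
            raise ValueError("--min-stability requires --repeat >= 2")
        if metrics.get("stability", 1.0) < min_stability:
            violations.append("stability below threshold")
    return violations


def exit_code(violations, fail_on_violation):
    # Mirrors the documented behavior: exit code 2 if any violation is found.
    return 2 if (violations and fail_on_violation) else 0
```

This keeps threshold evaluation separate from reporting, so CI can both print the violations and fail the job.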

  5. Windows importability fixes (unblocks running tooling on Windows)
    Two issues prevented running python -m liveweb_arena.redteam on Windows:
  • liveweb_arena/__init__.py eagerly imported the browser layer.
  • liveweb_arena/core/cache.py imported POSIX-only fcntl.
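A common shape for the fcntl portability fix, assuming the cache uses simple exclusive file locks. This is a sketch of the usual pattern, not the PR's actual diff:

```python
import os

# POSIX systems have fcntl; Windows does not, so fall back to msvcrt there.
if os.name == "nt":
    import msvcrt

    def lock_file(f) -> None:
        # Lock one byte at the current file position (Windows-style region lock).
        msvcrt.locking(f.fileno(), msvcrt.LK_LOCK, 1)

    def unlock_file(f) -> None:
        msvcrt.locking(f.fileno(), msvcrt.LK_UNLCK, 1)
else:
    import fcntl

    def lock_file(f) -> None:
        # Whole-file exclusive lock, released on unlock or file close.
        fcntl.flock(f.fileno(), fcntl.LOCK_EX)

    def unlock_file(f) -> None:
        fcntl.flock(f.fileno(), fcntl.LOCK_UN)
```

The import-time branch means fcntl is never imported on Windows, so the module becomes importable there without changing POSIX behavior.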

@claytonlin1110
Contributor Author

@angosr please check and give me feedback for this

Contributor

@angosr left a comment


Review: PR #16 — feat: add template red team dashboard CLI

Significance Gate: CONDITIONAL PASS

A quick automated probe for template GT correctness is useful as a first pass. However, this tool does NOT replace the CLAUDE.md Red Team Review (6 mandatory checks require human judgment: world knowledge attack, memorization space analysis, etc.). The tool should be positioned as a complement, not a substitute.


BLOCKING: 5 generated report files committed to the repository

redteam/20260326_123231/report.md
redteam/20260326_123324/report.md
redteam/20260326_123403/report.md
redteam/20260326_123651/report.md
redteam/20260326_123828/report.md

Generated output must not be committed to source control. Add redteam/ to .gitignore and remove these files from the PR.

BLOCKING: Unrelated changes bundled

  1. liveweb_arena/__init__.py: Lazy-loading BrowserEngine/BrowserSession — identical to the change in PR #12 (rejected). This is an infrastructure change that should be its own PR with proper justification.

  2. liveweb_arena/core/cache.py: Windows fcntl portability fix — unrelated to the red team CLI. Should be a separate fix PR.

These bundled changes make review harder and risk sneaking unrelated modifications through a tooling PR.

CONCERN: _infer_probe_urls is fragile and hard to maintain

The URL inference uses per-plugin heuristics:

if plugin_name == "openlibrary":
    a = vi.get("book_a_query")
    ...
if plugin_name == "arxiv":
    category = vi.get("category")
    ...
if plugin_name == "stooq":
    symbol = vi.get("symbol") or vi.get("symbol_a")
    ...

This creates a parallel code path that must be manually kept in sync with template validation_info keys. When a new template changes key names or adds new URL patterns, this function silently breaks. The probe gives false negatives (GT fails because the URL wasn't inferred) that look like template bugs.

Better approach: let each template declare its probe URLs via a method like get_probe_urls(validation_info) -> List[str], keeping the knowledge co-located with the template code.
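A minimal sketch of that suggestion, with a hypothetical base class and an illustrative Stooq template. The key names symbol and symbol_a are taken from the heuristic snippet above; everything else (class names, the price key, the URL shape) is assumed for illustration:

```python
from abc import ABC, abstractmethod
from typing import List


class Template(ABC):
    """Hypothetical base class; the real one lives elsewhere in liveweb_arena."""

    @abstractmethod
    def get_ground_truth(self, validation_info: dict): ...

    def get_probe_urls(self, validation_info: dict) -> List[str]:
        """Default: no declared probe URLs; templates override as needed."""
        return []


class StooqPriceTemplate(Template):
    """Illustrative template that declares its own probe URLs."""

    def get_ground_truth(self, validation_info: dict):
        return validation_info.get("price")

    def get_probe_urls(self, validation_info: dict) -> List[str]:
        # Knowledge of validation_info key names stays next to the template
        # that defines them, so a renamed key breaks loudly in one place.
        symbol = validation_info.get("symbol") or validation_info.get("symbol_a")
        return [f"https://stooq.com/q/l/?s={symbol}"] if symbol else []
```

The probe CLI would then call template.get_probe_urls(validation_info) instead of maintaining its own per-plugin if-chain.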

CONCERN: Probe bypasses page cache pipeline

The real GT collection uses GTCollector.on_page_visit() which is triggered by the browser's page cache. This probe calls plugin.fetch_api_data(url) directly and feeds it into on_page_visit. But:

  • The probe's api_data timestamp differs from what a real browser visit would produce
  • The probe doesn't respect GT data priority rules (CLAUDE.md §6: detail page > list page)
  • The probe visits URLs in a fixed order, while real agents visit pages in unpredictable order

A probe that passes doesn't guarantee the real eval pipeline works. This should be documented clearly.

Required Actions

  1. Remove the 5 committed report files; add redteam/ to .gitignore
  2. Split out __init__.py lazy-loading and cache.py fcntl fix into separate PRs
  3. Document clearly that this tool is a supplement to (not replacement for) CLAUDE.md Red Team Review and eval.py testing
