Add e2e workflow for [QE] story test automation#28
Conversation
Add a new e2e workflow that takes [QE] Jira stories through an 8-phase pipeline (ingest, plan, revise, code, validate, publish, respond) to produce end-to-end tests. The workflow discovers the target project's test infrastructure at runtime — framework, harness, auxiliary services, conventions, and reference suites — rather than hardcoding assumptions. Key e2e-specific adaptations from the implement workflow: - Scenario-driven planning (ACs map to Describe/Context/It blocks) - Deep infrastructure discovery (10 sub-steps covering harness, services, utilities, conventions, reference suites) - Anti-pattern detection during validation (hardcoded sleeps, brittle selectors, missing cleanup, harness bypass, etc.) - Feature-defect-vs-test-bug distinction in deviation handling - No TDD cycle (tests are the deliverable) Also fixes issues inherited by both workflows, identified during skill-reviewer audit: - Replace fragile cross-file step-number references with descriptive names in validate.md and respond.md (both workflows) - Add shared config documentation for .artifacts/prd/config.json in ingest.md (both workflows) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
|
Warning Rate limit exceeded
Your organization is not enrolled in usage-based pricing. Contact your admin to enable usage-based pricing to continue reviews beyond the rate limit, or try again in 44 minutes and 4 seconds. ⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (3)
WalkthroughAdds a documented end-to-end (e2e) workflow driven by Jira [QE] stories: defines phases (/ingest → /plan → /revise → /code → /validate → /publish → /respond), artifact layout under Changes
Sequence Diagram(s)sequenceDiagram
participant User
participant Agent
participant Jira
participant Repo as "Repo / Local"
participant GH as "GitHub (gh)"
participant E2EInfra as "Project E2E Infra"
User->>Agent: invoke /ingest (QE story)
Agent->>Jira: fetch story, ACs, deps
Agent->>Repo: create .artifacts/e2e/{jira-key}/01-context.md
Agent->>E2EInfra: discover framework, commands, ref-suite
Agent->>Agent: produce 02-plan.md (AC→scenarios)
User->>Agent: run /code (implement tests)
Agent->>Repo: commit branch, run scoped tests against E2EInfra
Agent->>Agent: run /validate (pre-PR checks + anti-patterns)
Agent->>GH: run /publish (git push, gh pr create)
GH->>Agent: return PR URL/number
Agent->>Agent: run /respond (fetch reviews, propose replies)
User->>Agent: approve responses /apply changes
Estimated code review effort🎯 2 (Simple) | ⏱️ ~10 minutes Suggested reviewers
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (1)
e2e/skills/code.md (1)
61-71: Add explicit confirmation before remote fork sync operations.Line 66 and Line 302 run
gh repo sync, which can update shared remote branch state. Add an explicit user confirmation gate immediately before this operation (both in Step 2 and Step 3g), similar to other safety stops in the workflow.Based on learnings: “For git operations that could affect shared systems, always verify with git status before destructive operations.”
Also applies to: 297-309
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@e2e/skills/code.md` around lines 61 - 71, Add an explicit user confirmation prompt immediately before executing the "gh repo sync {owner}/{repo} --branch {base}" command in both Step 2 and Step 3g of e2e/skills/code.md: require the user to inspect git status (e.g., run and review "git status") and type a clear affirmative response before proceeding, and abort the sync if the confirmation is not received; update the text around the existing gh repo sync invocation to include this safety gate and a short note that the operation will modify remote branch state.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@AGENTS.md`:
- Line 201: Update the sentence "Requires a Jira [QE] Story (typically created
by the design workflow's `/decompose` phase) as input" to reference the correct
phase: replace `/decompose` with `/sync` so it reads "...created by the design
workflow's `/sync` phase", ensuring the doc correctly states that `/sync` is
responsible for Jira issue creation.
In `@e2e/skills/validate.md`:
- Around line 103-109: Update the guidance in Step 3 of validate.md to cover
feature defects discovered by e2e tests: add a clause that when a test fails
because the implementation behaves differently from the acceptance criteria (a
feature/DEV defect, not a test bug), the e2e workflow should not attempt to fix
the implementation; instead mark the test as xfail or skip, record the defect in
the implementation/validation report, and continue validation. Locate the Step 3
paragraph in the e2e/skills/validate.md document and insert this explicit
instruction alongside the existing workflow steps so readers know to adjust
tests (xfail/skip) and document the defect rather than changing DEV code during
e2e validation.
---
Nitpick comments:
In `@e2e/skills/code.md`:
- Around line 61-71: Add an explicit user confirmation prompt immediately before
executing the "gh repo sync {owner}/{repo} --branch {base}" command in both Step
2 and Step 3g of e2e/skills/code.md: require the user to inspect git status
(e.g., run and review "git status") and type a clear affirmative response before
proceeding, and abort the sync if the confirmation is not received; update the
text around the existing gh repo sync invocation to include this safety gate and
a short note that the operation will modify remote branch state.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: efad8978-984e-496f-8a94-d72f9270ad37
📒 Files selected for processing (23)
AGENTS.mdREADME.mde2e/README.mde2e/SKILL.mde2e/commands/code.mde2e/commands/ingest.mde2e/commands/plan.mde2e/commands/publish.mde2e/commands/respond.mde2e/commands/revise.mde2e/commands/validate.mde2e/guidelines.mde2e/skills/code.mde2e/skills/controller.mde2e/skills/ingest.mde2e/skills/plan.mde2e/skills/publish.mde2e/skills/respond.mde2e/skills/revise.mde2e/skills/validate.mdimplement/skills/ingest.mdimplement/skills/respond.mdimplement/skills/validate.md
…t handling - AGENTS.md: Correct `/decompose` to `/sync` — the `/sync` phase is what creates Jira issues, not `/decompose` which produces the story breakdown - validate.md: Add explicit handling for feature defects discovered during e2e test execution — mark tests as xfail/skip and note the defect rather than attempting to fix the feature implementation Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
|
Re: user confirmation before Declining this one. The implement workflow uses the same pattern without a confirmation gate, and it has been working well in practice. |
There was a problem hiding this comment.
We need to clear this topic first before we continue.
"Harness" appears 73 times across the workflow. Ginkgo/Go-specific constructs (BeforeSuite, AfterEach, Describe/Context/It, Eventually, By(), Label(), Gomega) appear ~49 times. The workflow says it's discovery-based and framework-agnostic, but the actual instructions are saturated with Ginkgo vocabulary baked into the process steps themselves ? not just the examples.
Where the flightctl bias shows up concretely?
-
Harness as a first-class concept throughout. The workflow treats "harness" as a universal concept that every project has ? it's in the guidelines, the anti-pattern checklist ("harness bypass"), the code phase ("Use the project's harness"), the plan template ("Harness Usage" section), the ingest discovery steps (6e is entirely about harness exploration). Many projects don't have a test harness at all. A Python project might just use pytest fixtures. A JS project might use Playwright's built-in page objects. The workflow should discover what abstraction the project uses for test infrastructure (if any) rather than assuming "harness" exists.
-
Ginkgo terminology is structural, not just illustrative. The plan.md template has a disclaimer saying "This template uses Ginkgo terminology as illustrative shorthand" - but the terminology isn't confined to examples. It's embedded in process instructions:
- code.md Step 3b: "Match the block structure: Use the project's Describe/Context/It nesting" ? then proceeds to give 8 sub-instructions all using Ginkgo concepts (By(), Eventually, AfterEach, Label())
- The anti-pattern table in validate.md references time.Sleep(), BeforeEach, AfterEach, Eventually as detection/fix patterns
- plan.md critical rules: "Name the test files, Describe/Context/It blocks, harness methods"
- The ingest context template has a hardcoded "Setup/Teardown Patterns" section with BeforeSuite/BeforeEach/AfterEach/AfterSuite fields
For a pytest project, none of these terms apply. For Playwright, the lifecycle model is different. The workflow should say things like "use the project's test lifecycle hooks" and "use the project's test organization vocabulary" - then let the ingest phase discover what those actually are.
-
Auxiliary services as a baked-in concept. Steps 6g, the context template, and the plan template all have dedicated "Auxiliary Services" sections. This is a flightctl pattern (testcontainers, registries, git servers). Many e2e test suites don't manage auxiliary services ? they test against a deployed environment, or the framework handles service lifecycle internally. This should be conditional - discovered if present, omitted if not.
-
Reference suite selection assumes a specific project structure. Step 6j assumes the project has multiple test suites organized by feature area, with suite files + test files. This fits flightctl's test/e2e/{feature}/ structure. A project with flat test files, or a single test directory, or a page-object pattern would need a different discovery approach.
What should change?
The workflow should be restructured around a principle it already claims to follow but doesn't fully deliver: discover, don't assume. Specifically:
- Replace "harness" with a generic concept like "test infrastructure" or "test abstractions" - then let ingest discover whether the project uses a harness, fixtures, page objects, helper modules, or nothing at all
- Make the Ginkgo vocabulary truly illustrative - process instructions should use generic terms ("test lifecycle hooks", "test grouping blocks", "async waiting mechanisms") with Ginkgo/pytest/Playwright examples in parentheses, not Ginkgo terms with "(or the project's equivalent)" tacked on
- Make the context template flexible ? instead of hardcoded BeforeSuite/BeforeEach/AfterEach/AfterSuite fields, use a generic "Test Lifecycle" section that captures whatever hooks the project uses
- Make auxiliary services conditional ? the template should note "if the project manages test services, document them here" rather than having a mandatory section
- Let project docs drive the vocabulary - the workflow already reads AGENTS.md and CLAUDE.md during ingest. Those files should be the source of truth for how tests are structured, what terminology to use, and what conventions to follow. The workflow instructions should defer to those rather than prescribing Ginkgo patterns
The e2e-specific innovations (scenario-driven planning, anti-pattern checklist, feature-defect distinction, reference suite pattern) are genuinely good and framework-agnostic in concept. They just need to be expressed in framework-agnostic language in practice.
| (e.g., `*_suite_test.go` + `*_test.go` in Go/Ginkgo) | ||
| 4. **Naming conventions:** How are test files and directories named? | ||
|
|
||
| #### 6e: Test Harness |
There was a problem hiding this comment.
This is very specific to flightctl.
Trying to stay on the generic side, are all projects expected to have a harness?
The answer obviously is no. I would expect that these definitions will come from inside the project (Claude.md, AGENTS.md. etc...).
There was a problem hiding this comment.
Addressed in 2c4783e — this section is now titled "Test Infrastructure Abstractions" and lists harness, fixtures, page objects, and helpers as equally valid options with "(it may use one, several, or none)." The discovery instructions are generic: find whatever abstractions the project uses.
| domain (e.g., `harness_device.go`, `harness_fleet.go`). Identify which | ||
| domain files are relevant to the story. | ||
|
|
||
| #### 6f: Setup and Teardown Patterns |
There was a problem hiding this comment.
Again, specific to a runtime and framework...
There was a problem hiding this comment.
Addressed in 2c4783e — the discovery steps (Location, Initialization, Key methods, Domain-specific files) are framework-agnostic. The parenthetical examples (pytest fixtures, Playwright fixtures, etc.) are illustrative, not prescriptive.
| |--------|---------|-------------| | ||
| | `{method}` | {what it does} | {file} | | ||
|
|
||
| ### Setup/Teardown Patterns |
There was a problem hiding this comment.
Specific, even though very popular
There was a problem hiding this comment.
This section is a generic template field — it captures whatever testing approach the story or design document specifies. The template text says "If none: derive scenarios from acceptance criteria." Nothing framework-specific here.
Replace Ginkgo-specific vocabulary with generic terms throughout all e2e skill files. Generic terms now appear in process instructions with framework-specific examples in parentheses (e.g., "test grouping blocks (Describe/Context/It in Ginkgo, test classes in pytest, describe/it in Playwright)"). Key changes: - "harness" as structural term → "test infrastructure abstractions" with harness/fixtures/page objects/helpers as discovered examples - Ginkgo lifecycle hooks → generic "test lifecycle hooks" with project-specific discovery - Auxiliary services made explicitly conditional (not all projects manage test services) - Anti-pattern detection table rewritten with multi-framework examples (Go, Python, Playwright) - Plan and code templates use generic vocabulary with notes to substitute the project's discovered terminology Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
amir-yogev-gh
left a comment
There was a problem hiding this comment.
Single nit comment
…ples to e2e workflow Enhance /ingest to discover the project's parallelism model (mechanism, isolation strategy, lifecycle interaction) and extract a sanitized lifecycle skeleton from the reference suite as a copy-paste starting point for /code. Expand the test infrastructure method table to include full signatures (parameters and return types). Add concrete framework examples to anti-pattern detection rules in /validate (e.g., bare Expect without Eventually in Ginkgo, bare assert without polling in pytest). Update AGENTS.md summary to match. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@e2e/skills/code.md`:
- Around line 332-342: Step 3g currently runs the branch sync rebase/merge
without verifying the working tree; add a clean-tree check that mirrors Step 2
by running a git status --porcelain right before the rebase/merge block in Step
3g, and if the output is non-empty abort the flow and ask the user to resolve or
confirm before continuing; ensure this check surrounds both the rebase (git
rebase origin/{base}) and merge (git merge origin/{base}) paths so the logic in
Step 3g enforces the same safety pattern as Step 2.
In `@e2e/skills/validate.md`:
- Around line 73-89: Add a clean-working-tree gate before the Step 2
rebase/merge suggestions: run and check the output of git status --porcelain
and, if non-empty, stop and report the changed files to the user and prompt for
how to proceed (stash, commit, or abort) before offering git rebase
origin/{base} or git merge origin/{base}; ensure the flow for the commands git
rebase origin/{base} and git merge origin/{base} only continues after explicit
user confirmation that the working tree is clean or after the user chooses and
completes one of the remediation options.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 20109f03-90f7-4565-920b-c0729818f81d
📒 Files selected for processing (10)
AGENTS.mde2e/README.mde2e/guidelines.mde2e/skills/code.mde2e/skills/controller.mde2e/skills/ingest.mde2e/skills/plan.mde2e/skills/publish.mde2e/skills/revise.mde2e/skills/validate.md
✅ Files skipped from review due to trivial changes (4)
- e2e/skills/revise.md
- e2e/skills/controller.md
- e2e/README.md
- e2e/skills/publish.md
🚧 Files skipped from review as they are similar to previous changes (1)
- AGENTS.md
Reinforce in /ingest that this workflow is exclusively for e2e tests — unit and integration tests are handled by the implement workflow. Add git status --porcelain checks before rebase/merge operations in /code Step 3g and /validate Step 2, matching the safety pattern already used in the implement workflow. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Summary
e2e/workflow that takes[QE]Jira stories through an 8-phase pipeline (ingest → plan → revise → code → validate → publish → respond) to produce end-to-end testsimplement/identified during a skill-reviewer audit: fragile cross-file step-number references replaced with descriptive names, shared config documentation addedKey design decisions
/ingest, not assumed/code, ensuring new tests match project conventions/validatechecks for hardcoded sleeps, brittle selectors, order-dependent tests, shared mutable state, missing cleanup, harness bypass, missing labels, hardcoded values, missing async polling, and missing failure diagnosticsTest plan
bash .github/scripts/validate-structure.sh— passesnpx markdownlint-cli2— no errors in e2e/ or modified implement/ filesimplement/skill changes don't alter behavior (cross-file references now use descriptive names instead of step numbers, shared config note added)🤖 Generated with Claude Code
Summary by CodeRabbit
New Features
Documentation