
fix(rules): reconcile Units Planning ghost stage with canonical Units Generation #156

Open
scottschreckengaust wants to merge 7 commits into main from prompt-eng/review-aidlc-rules

@scottschreckengaust commented Mar 30, 2026

Summary

Fixes the highest-priority finding from a comprehensive prompt engineering review of the aidlc-rules/ directory.

Problem: "Units Planning" was referenced as a separate top-level workflow stage in 5 files, but core-workflow.md only defines a single "Units Generation" stage (with planning as an internal Part 1 sub-step). This inconsistency causes state tracking confusion, session continuity breakage, and incorrect Mermaid diagrams.

Fix: Reconcile all references to use the canonical "Units Generation" stage name, clarifying that planning is an internal sub-step.

Files changed

| File | Change |
|---|---|
| workflow-planning.md | Merged UP/UG Mermaid nodes into one; removed duplicate checklist items from both templates |
| units-generation.md | Fixed Step 11 to reference "Units Generation Part 1 (Planning)" instead of ghost stage |
| terminology.md | Updated stage list, definition, usage guidance, and annotated planning/generation examples |
| workflow-changes.md | Fixed impact assessments to reference Units Generation instead of ghost stage |
| error-handling.md | Fixed recovery guidance to reference Units Generation |

Full Prompt Engineering Review

This PR addresses finding C1 (the highest priority) from a comprehensive review of all 30 files in aidlc-rules/. The complete list of 25 findings (3 Critical, 8 High, 9 Medium, 5 Low) with detailed descriptions, affected files, and recommended fixes is attached as a PR comment.

| Priority | ID | Severity | Finding | Status |
|---|---|---|---|---|
| 1 | C1 | Critical | "Units Planning" ghost stage: state tracking and session continuity breakage | This PR |
| 2 | C2 | Critical | "Never ask in chat" conflicts with inline approval prompts | Backlog |
| 3 | C3 | Critical | Terminology glossary uses nonexistent stage names ("Context Assessment") | Backlog |
| 4 | H1 | High | Reverse Engineering artifact lists inconsistent across 3 files | Backlog |
| 5 | H2 | High | 3-option completion messages violate NO EMERGENT BEHAVIOR rule | Backlog |
| 6 | H3 | High | overconfidence-prevention.md is a changelog, not an actionable rule | Backlog |
| 7 | H4 | High | Most common rule files have no defined loading trigger | Backlog |
| 8 | H5 | High | "Assume the role" promotes overconfidence | Backlog |
| 9 | H6 | High | "No fixed sequences" claim is factually wrong | Backlog |
| 10 | H7 | High | Extension enforcement default contradicts opt-in model | Backlog |
| 11 | H8 | High | OWASP mapping uses fabricated 2025 edition | Backlog |
| 12-25 | | Medium/Low | 14 additional findings (see review comment) | Backlog |

Test plan

  • Verify no remaining standalone "Units Planning" stage references in aidlc-rules/ (only the annotated terminology example should remain)
  • Confirm core-workflow.md stage names match all checklist templates in workflow-planning.md
  • Confirm Mermaid diagrams show a single Units Generation node
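The first check in the test plan can be automated. The following is a minimal sketch, not part of this PR: the function name `find_phrase` and the `allowed` parameter are hypothetical, and it assumes the rule files are Markdown files under a single root directory.

```python
from pathlib import Path


def find_phrase(root, phrase="Units Planning", allowed=()):
    """Return (relative_path, line_number, line) for each match.

    `allowed` lists relative POSIX paths that may keep the phrase,
    for example the annotated terminology example (assumed path).
    """
    root = Path(root)
    hits = []
    for path in sorted(root.rglob("*.md")):
        rel = path.relative_to(root).as_posix()
        if rel in allowed:
            continue
        text = path.read_text(encoding="utf-8")
        for number, line in enumerate(text.splitlines(), start=1):
            if phrase in line:
                hits.append((rel, number, line.strip()))
    return hits
```

An empty result means no standalone references remain outside the allowed files. Matching is a plain substring test, so each hit still needs a manual look to tell a stage reference from an annotated example.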

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of the project license.

… Generation

Units Planning was referenced as a separate workflow stage in 5 files
(workflow-planning.md, units-generation.md, terminology.md,
workflow-changes.md, error-handling.md) but was never defined as a
top-level stage in core-workflow.md. It is actually Part 1 (an internal
sub-step) of the Units Generation stage.

This inconsistency caused:
- State tracking confusion (aidlc-state.md would have two entries for
  one stage)
- Session continuity breakage (stage names in state file wouldn't match
  what the rules expect)
- Mermaid diagram showing 2 nodes for a single stage

Changes:
- Merge UP/UG Mermaid nodes into single Units Generation node
- Remove duplicate Units Planning checklist items from execution plans
- Clarify that planning/generation are internal sub-steps, not stages
- Update all impact assessments and error recovery references
@scottschreckengaust
Member Author

Prompt Engineering Review — aidlc-rules/ (2026-03-30)

Comprehensive review of all 30 files in aidlc-rules/ by severity and priority.


Critical (3)

C1. "Units Planning" Ghost Stage ✅ Fixed in this PR

Files: workflow-planning.md, units-generation.md, terminology.md, workflow-changes.md, error-handling.md

"Units Planning" was referenced as a separate top-level workflow stage in 5 files, but core-workflow.md only defines a single "Units Generation" stage with planning as an internal Part 1 sub-step. This causes state tracking confusion in aidlc-state.md, session continuity breakage on resume, and incorrect Mermaid diagrams showing two nodes for one stage.

Fix: Reconcile all references to use the canonical "Units Generation" stage name, clarifying that planning is an internal sub-step.


C2. "Never Ask Questions in Chat" Conflicts with Inline Approval Prompts

Files: common/question-format-guide.md, core-workflow.md, inception/units-generation.md, all construction stage files, common/session-continuity.md

question-format-guide.md states: "CRITICAL: You must NEVER ask questions directly in the chat. ALL questions must be placed in dedicated question files." Yet core-workflow.md instructs the agent to ask "Build and test instructions complete. Ready to proceed to Operations stage?", which is a direct chat question. Every stage completion message presents "Request Changes" / "Continue to Next Stage" choices directly in chat. session-continuity.md also presents inline choices while separately stating: "ALWAYS ask clarification or user feedback questions by placing them in .md files."

Fix: Scope the "never ask in chat" rule to requirements-gathering questions only. Restate as: "Requirements-gathering and clarification questions MUST be placed in dedicated .md files with [Answer]: tags. Stage completion prompts and approval requests MAY be presented inline in chat."


C3. Terminology Glossary Uses Nonexistent Stage Names

Files: common/terminology.md (line 13)

Usage examples reference "Context Assessment stage" and "Requirements Assessment stage." Neither exists in the workflow. Actual names are "Workspace Detection" and "Requirements Analysis." These appear to be leftover names from a previous version.

Fix: Replace "Context Assessment" with "Workspace Detection" and "Requirements Assessment" with "Requirements Analysis." Audit the entire terminology file for other stale references.


High (8)

H1. Reverse Engineering Artifact List Inconsistent Across 3 Files

Files: core-workflow.md (lines 126-133), inception/reverse-engineering.md, common/session-continuity.md (lines 28-29)

Three files list different artifact sets. core-workflow.md mentions "Interaction Diagrams" (not in reverse-engineering.md). reverse-engineering.md generates code-quality-assessment.md and reverse-engineering-timestamp.md (not in core-workflow.md). session-continuity.md only lists 3 of 9 artifacts for context loading on resume.

Fix: Establish a single canonical artifact list in reverse-engineering.md. Ensure both core-workflow.md and session-continuity.md reference it accurately. Either add "Interaction Diagrams" to reverse-engineering.md or remove from core-workflow.md. Add all missing artifacts to session-continuity.md's load list.
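One way to keep the three lists in sync is to diff each file's list against the canonical one. This is a rough sketch under stated assumptions: `compare_to_canonical` is a hypothetical helper, the lists are assumed to have been extracted by hand or by a separate parser, and `example-artifact.md` is a placeholder rather than a real artifact name.

```python
def compare_to_canonical(canonical, others):
    """Report, per file, which artifacts are missing from or extra to
    that file's list relative to the canonical list."""
    canonical_set = set(canonical)
    report = {}
    for name, artifacts in others.items():
        current = set(artifacts)
        report[name] = {
            "missing": sorted(canonical_set - current),
            "extra": sorted(current - canonical_set),
        }
    return report


# Illustrative data only; "example-artifact.md" is a placeholder name.
canonical = [
    "code-quality-assessment.md",
    "reverse-engineering-timestamp.md",
    "example-artifact.md",
]
others = {
    "core-workflow.md": ["example-artifact.md", "Interaction Diagrams"],
    "session-continuity.md": ["example-artifact.md"],
}
report = compare_to_canonical(canonical, others)
```

A file is consistent with the canonical list when both its `missing` and `extra` entries are empty.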


H2. 3-Option Completion Messages Violate NO EMERGENT BEHAVIOR Rule

Files: inception/application-design.md (lines 116-132), core-workflow.md (line 460), construction/build-and-test.md

core-workflow.md states: "NO EMERGENT BEHAVIOR: Construction phases MUST use standardized 2-option completion messages." Yet application-design.md uses a 3-option message (Request Changes, Add Units Generation, Approve & Continue). build-and-test.md also uses 3 options. The rule says "Construction phases" but Application Design is in Inception — the scope is unclear.

Fix: Either extend 2-option standardization to all phases (handle "add skipped stage" through workflow-changes.md instead), or explicitly scope the rule and document which stages may present more than 2 options.


H3. overconfidence-prevention.md Is a Changelog, Not an Actionable Rule

Files: common/overconfidence-prevention.md, inception/application-design.md (line 41)

This file is a human-readable rationale document ("The overconfidence issue was caused by..."). It describes what was wrong and what changed, not what an agent should do. The actual behavior is already baked into stage files, so loading it wastes context. It is referenced once in application-design.md, which may confuse agents.

Fix: Move outside rule-details/ to a docs/ or design-decisions/ folder, or convert into an actionable rule file with clear directives.


H4. Most Common Rule Files Have No Defined Loading Trigger

Files: core-workflow.md (lines 21-26), all 11 files in common/

core-workflow.md mandates loading 4 common files at start: process-overview.md, session-continuity.md, content-validation.md, question-format-guide.md. The remaining 7 (ascii-diagram-standards.md, depth-levels.md, error-handling.md, overconfidence-prevention.md, terminology.md, welcome-message.md, workflow-changes.md) have no loading trigger. error-handling.md is 375 lines of critical recovery procedures that may never be loaded.

Fix: For each common file, define when it should be loaded — either add to the mandatory list, specify conditional triggers, or create a lightweight index.


H5. "Assume the Role of a Product Owner" Promotes Overconfidence

Files: inception/requirements-analysis.md (line 3), inception/user-stories.md (line 119)

Two files tell the agent to "assume the role of a product owner." This conflicts with overconfidence prevention — a product owner makes business decisions, but the agent is instructed elsewhere to defer to the user. Role assumption causes the agent to prioritize features and define acceptance criteria instead of asking.

Fix: Replace role-assumption directives with behavioral instructions: "Apply product ownership analysis techniques to evaluate completeness of requirements and quality of user stories. Defer all business decisions to the user."


H6. "No Fixed Sequences" Claim Is Factually Wrong

Files: common/process-overview.md (line 22), core-workflow.md

process-overview.md states: "No fixed sequences: Stages execute in the order that makes sense for your specific task." But core-workflow.md defines a completely fixed sequence. Conditional stages can be skipped but cannot be reordered.

Fix: Replace with: "Stages execute in a defined sequence, but conditional stages can be skipped when they do not add value."


H7. Extension Enforcement Default Contradicts Opt-In Model

Files: core-workflow.md (line 49)

Line 49 states: "Default to enforced if no configuration exists." But extensions are opt-in: *.opt-in.md files are loaded, users are asked during Requirements Analysis, answers are recorded. If an extension has an opt-in prompt the user hasn't answered yet, "default to enforced" would enforce rules the user never agreed to.

Fix: Change to: "If an extension has an opt-in file and the user has not yet been asked, defer enforcement until the opt-in question is presented and answered. Extensions without opt-in files are always enforced."
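The proposed wording amounts to a small decision table, sketched below. The names `Enforcement` and `resolve_enforcement` are hypothetical, and treating a declined opt-in as "disabled" is an assumption, since the proposed text covers only the unanswered case and extensions without opt-in files.

```python
from enum import Enum
from typing import Optional


class Enforcement(Enum):
    ENFORCED = "enforced"
    DEFERRED = "deferred"
    DISABLED = "disabled"  # assumption: a declined opt-in disables the extension


def resolve_enforcement(has_opt_in_file: bool,
                        user_opted_in: Optional[bool]) -> Enforcement:
    """Decide enforcement for one extension.

    user_opted_in is None while the opt-in question is still unanswered.
    """
    if not has_opt_in_file:
        # Extensions without opt-in files are always enforced.
        return Enforcement.ENFORCED
    if user_opted_in is None:
        # Defer until the opt-in question is presented and answered.
        return Enforcement.DEFERRED
    return Enforcement.ENFORCED if user_opted_in else Enforcement.DISABLED
```

Modelling the unanswered state explicitly as `None` avoids the current ambiguity, where a missing configuration is silently read as consent.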


H8. OWASP Reference Mapping Uses Fabricated "2025" Edition

Files: extensions/security/baseline/security-baseline.md (lines 295-308)

The appendix references "OWASP Top 10 (2025)" which does not exist. The file contains a TODO comment acknowledging this. Category IDs (A01:2025, etc.) are fabricated. Only 8 of 15 SECURITY rules are mapped.

Fix: Update to OWASP Top 10 (2021) with correct category IDs, or remove the mapping table until verified. Map all 15 rules.


Medium (9)

M1. depth-levels.md Is Redundant with Stage-Level Guidance

Files: common/depth-levels.md, inception/requirements-analysis.md, inception/workflow-planning.md

Provides no concrete thresholds or decision criteria. Says "Model decides." Depth guidance is already embedded in stage files. Not in the mandatory loading list.

Fix: Integrate into workflow-planning.md or add concrete decision thresholds. Currently redundant.


M2. No Open-Ended Question Format Allowed

Files: common/question-format-guide.md

Mandates ALL questions use multiple-choice A/B/C/D format. Many requirements questions are inherently open-ended ("Describe the primary business process"). Forcing these into multiple-choice produces artificial options.

Fix: Add an open-response format option with [Answer]: tag. Document when each format is appropriate.


M3. Heavy Emoji Usage with No Configuration Option

Files: common/welcome-message.md, all completion message templates

Extensive emoji usage throughout interactive messages. No way to disable for enterprise environments. Inconsistent application.

Fix: Consider a tone/formatting config, or ensure consistent application. Alternatively, limit emoji to chat messages only, not generated artifacts.


M4. Build-and-Test Generates Documentation But Template Implies Execution

Files: construction/build-and-test.md

Generates instruction documents but never runs build/test commands. Summary template has fields for "Build Status: [Success/Failed]" and "Total Tests: [X], Passed: [X]" that imply actual execution.

Fix: Rename to "Build and Test Planning," add execution instructions, or clarify template values are "expected" not "actual."


M5. "Interaction Diagrams" Requirement Has No Template or Guidance

Files: core-workflow.md (line 131), inception/reverse-engineering.md

core-workflow.md lists "Generate Interaction Diagrams" as a Reverse Engineering output. reverse-engineering.md has no corresponding step, template, or target file.

Fix: Add a step in reverse-engineering.md for generating Interaction Diagrams, or remove the requirement from core-workflow.md.


M6. Session Continuity Misses Application Design Artifacts

Files: common/session-continuity.md (line 31), inception/application-design.md

session-continuity.md loads components.md, component-methods.md, services.md for Application Design. Missing: component-dependency.md and the consolidated application-design.md (which contains everything).

Fix: Add all Application Design artifacts, or reference the consolidated application-design.md.


M7. Error Handling References Nonexistent Operations Stage

Files: common/error-handling.md (lines 163-172)

Has an "Operations Errors" section covering build tool detection and deployment errors. Operations is a placeholder stage with no defined steps. These errors belong in Build and Test.

Fix: Rename to "Build and Test Errors."


M8. Functional Design Prerequisite Chain Unclear

Files: construction/functional-design.md (lines 16-19)

States Application Design is "recommended" but Units Generation is "required." However, Step 1 reads artifacts from Application Design. Can Functional Design run without Application Design?

Fix: Make Application Design a hard prerequisite if its artifacts are needed, or document what to do when they're absent.


M9. Content Validation Has Broken Markdown in Its Own Mermaid Example

Files: common/content-validation.md (lines 42-55)

Nested triple-backtick code blocks are not properly escaped, causing the outer code block to close prematurely. This is exactly the kind of error the file is supposed to prevent.

Fix: Use four backticks for the outer block or HTML entities for inner backticks.
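The four-backtick fix generalizes: an outer fence stays open as long as it is longer than any backtick run in its content. A minimal sketch of that rule follows; `wrap_in_fence` is a hypothetical helper name, not something in the repository.

```python
import re


def wrap_in_fence(content: str, info: str = "") -> str:
    """Wrap content in a backtick fence longer than any backtick run inside it."""
    longest = max((len(run) for run in re.findall(r"`+", content)), default=0)
    # At least three backticks, and one more than the longest inner run.
    fence = "`" * max(3, longest + 1)
    return f"{fence}{info}\n{content}\n{fence}"
```

Content that already contains a three-backtick Mermaid block gets a four-backtick outer fence, while content with no backticks keeps the usual three. Counting every backtick run, including inline code spans, is deliberately conservative.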


Low (5)

L1. Answer-Analysis Guidance Duplicated in 7+ Files

Files: common/overconfidence-prevention.md, common/question-format-guide.md, inception/application-design.md, inception/user-stories.md, inception/units-generation.md, construction/functional-design.md, construction/nfr-requirements.md

Same examples ("You mentioned 'mix of A and B'") repeated word-for-word across 7+ files.

Fix: Consolidate into a single common file and reference it.


L2. ASCII Diagram Examples May Not Comply with Their Own Width Rule

Files: common/ascii-diagram-standards.md (lines 35-44, 47-59)

"Every line in a box MUST have EXACTLY the same character count." Examples may have inconsistent widths.

Fix: Verify and correct examples.
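The width rule is mechanical enough to check with a script. The sketch below uses a hypothetical helper, `uneven_lines`, and assumes each box has been extracted as its own string and that the first line sets the expected width.

```python
def uneven_lines(box: str):
    """Return (line_number, length) for each line whose character count
    differs from the first line of the box."""
    lines = box.splitlines()
    if not lines:
        return []
    expected = len(lines[0])
    return [
        (number, len(line))
        for number, line in enumerate(lines, start=1)
        if len(line) != expected
    ]
```

An empty list means every line in the box has the same character count. `len` counts code points, which matches the rule for pure ASCII diagrams but not necessarily for wide or combining characters.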


L3. Mermaid Template Splits Units into Two Nodes ✅ Fixed as part of C1

Files: inception/workflow-planning.md (lines 264-265)


L4. "Other" Option Letter Inconsistent (X vs Sequential)

Files: common/question-format-guide.md, extension opt-in files

Template says use "X)" but examples use "E)", "D)", "C)".

Fix: Standardize on "X)" for the "Other" option everywhere.


L5. Acknowledged Duplication Between process-overview.md and welcome-message.md Has Diverged

Files: common/process-overview.md, common/welcome-message.md

process-overview.md says duplication with welcome-message.md is "INTENTIONAL" but the content has diverged (different diagrams, different claims about sequencing).

Fix: Ensure factual consistency even if format differs.

@scottschreckengaust added the codebuild label (a label to signal a request for the "CodeBuild" workflow) Mar 30, 2026
@scottschreckengaust added codebuild and removed codebuild labels Mar 30, 2026
@awslabs deleted a comment from github-actions bot Mar 31, 2026
@scottschreckengaust added rules and removed codebuild labels Mar 31, 2026
@scottschreckengaust marked this pull request as ready for review April 7, 2026 05:15
@scottschreckengaust requested review from a team as code owners April 7, 2026 05:15
@github-actions
Contributor

github-actions bot commented Apr 7, 2026

A. Executive Summary

Latest release: PR #156

High-level snapshot comparing the latest release against the golden baseline (the reference evaluation used as the quality target).

| Metric | What it measures |
|---|---|
| Unit tests passed | Number of generated unit tests that pass. Higher means the rules produce broader, more complete test suites. |
| Contract tests | API compliance checks against the OpenAPI spec (passed/total). 88/88 = full compliance. |
| Lint findings | Static analysis warnings in generated code. Lower is better; 0 means clean code. |
| Qualitative score | AI-graded quality of generated documentation on a 0–1 scale (higher is better). |
| Execution time | Wall-clock time for the full evaluation run. Lower means faster generation. |
| Total tokens | Total LLM tokens consumed (input + output). Lower means more cost-efficient. |

| Metric | Golden | Latest (PR #156) | vs Golden | Trend |
|---|---|---|---|---|
| Unit tests passed | 180 | 132 | -48 | █▄▄▁▂▃▃▁▁ |
| Contract tests | 88/88 | 76/88 | -12 | ███▆████▁ |
| Lint findings | 0 | 0 | | ▅▅▅▅▅▅▅▅▅ |
| Qualitative score | 0.854 | 0.773 | -0.081 | ▅▇▇▆▇█▁▁▁ |
| Execution time | 23.8m | 17.6m | -6.3m | ▁▅▁▅▂▄█▆▃ |
| Total tokens | 18.39M | 4.67M | -13.73M | ▄▇▃▆▆█▄▂▁ |

Full trend report available in the workflow artifacts.
