Update eval README and add unit test README (#22)

jrenaldi79 · claude · web-flow · commit 67944baa9283 · 2026-03-25T08:12:48.000+09:00
Eval README: added Python fixture, updated setup config docs (claude_md_must_mention, auto_doc_pipeline, docs_index_generated, Husky hook support), added streaming guidance, fixed artifact list.

Unit test README: new file documenting all 9 test files, conventions, and how to run.

CLAUDE.md: added tests/scripts/README.md to Docs Map.

https://claude.ai/code/session_01Hbxy31TkbujzukGFSxLcPw

Co-authored-by: Claude <noreply@anthropic.com>
diff --git a/CLAUDE.md b/CLAUDE.md
@@ -111,6 +111,7 @@ tests/
     ├── init-project.test.js  # Tests for skills/setup/scripts/init-project.js
     ├── install-enforcement.test.js  # Tests for skills/setup/scripts/install-enforcement.js
     ├── marketplace-schema.test.js  # Tests for .claude-plugin/marketplace.json schema validity.
+    ├── README.md
     ├── release.test.js  # Tests for scripts/release.sh — validates version bumping, changelog
     └── repo-generate-docs.test.js  # Tests for scripts/repo-generate-docs.js — the repo-level CLAUDE.md
 <!-- /AUTO:tree -->
@@ -210,3 +211,4 @@ Before merging:
 | Enforcement script patterns | `skills/setup/references/enforcement-scripts.md` |
 | Node/TypeScript stack reference | `skills/setup/references/stack-node-typescript.md` |
 | Eval suite documentation | `tests/evals/README.md` |
+| Unit test documentation | `tests/scripts/README.md` |
diff --git a/tests/evals/README.md b/tests/evals/README.md
@@ -27,6 +27,8 @@ each skill against fixture repos and validates the output.
 ./tests/evals/run-evals.sh --config setup-eval-config.json --dry-run
 ```
 
+**Run evals as Bash shell commands** so output streams live and you can monitor progress. Evals can take 5-15 minutes per fixture.
+
 ## How It Works
 
 1. **Fixtures** (`fixtures/`) — Self-contained project directories representing test scenarios
@@ -46,10 +48,11 @@ The runner accepts `--config <file>` to select which eval suite to run. Each con
 
 ## Setup Fixtures
 
-| Fixture | What It Tests |
-|---|---|
-| `setup-bare` | Empty directory — full greenfield Node/TS Express setup path (scaffolding, enforcement, hooks, docs). |
-| `setup-existing-node` | Existing Express project — enhancement path that adds enforcement without destroying existing files. |
+| Fixture | Stack | What It Tests |
+|---|---|---|
+| `setup-bare` | Node/TS Express (new) | Full greenfield setup via fast path (scripts). |
+| `setup-existing-node` | Node/TS Express (existing) | Enhancement without destroying existing files. |
+| `setup-python` | Python FastAPI (new) | Adaptive path (Claude creates the files directly instead of running the setup scripts). Checks for Node leakage. |
 
 ## Eval Configs
 
@@ -65,12 +68,14 @@ The runner accepts `--config <file>` to select which eval suite to run. Each con
 - **files_must_exist** — Files the skill must create (hard fail if missing)
 - **files_should_exist** — Recommended files (soft check)
 - **json_valid** — Files that must parse as valid JSON
-- **hooks_executable** — Git hooks that must have execute permissions
+- **hooks_executable** — Git hooks (checks both `.git/hooks/` and `.husky/`)
 - **claude_md_sections** — Sections that must appear in generated CLAUDE.md
+- **claude_md_must_mention / claude_md_must_not_mention** — Content checks on generated CLAUDE.md
 - **settings_has_allow_deny** — Verify .claude/settings.json has allow/deny permission lists
 - **rules_have_globs_frontmatter** — Verify .claude/rules/*.md files have `globs:` in frontmatter
+- **auto_doc_pipeline** — Verify generate-docs scripts, pre-commit hook wiring, and AUTO markers
+- **docs_index_generated** — Verify docs/ directory and docs/index.md exist
 - **existing_files_preserved** — Files from the original fixture that must not be deleted
-- **conversation_must_mention** — Terms the skill output must include
 
 ## Adding a New Fixture
 
@@ -83,11 +88,12 @@ The runner accepts `--config <file>` to select which eval suite to run. Each con
 Each run creates a timestamped directory under `results/` containing:
 
 - `claude-output.json` — Raw Claude CLI JSON output
-- `conversation.txt` — Extracted conversation text
+- `conversation.txt` — Extracted conversation text (final message only)
 - `grade.json` — Grader output with pass/fail per check
 - `stderr.log` — Any stderr from the Claude run
 - `duration.txt` — How long the run took
 - `summary.json` — Aggregate pass/fail/error counts
 
 For readiness evals: `readiness-report.md` (the generated report)
-For setup evals: `CLAUDE.md`, `package.json`, `.claude/settings.json`, `.claude/rules/` (captured artifacts)
+
+For setup evals: Full project artifacts captured for debugging — `scripts/`, `docs/`, `src/`, `tests/`, `.claude/`, `.husky/`, `.git/hooks/`, `CLAUDE.md`, `package.json`, etc.
diff --git a/tests/scripts/README.md b/tests/scripts/README.md
@@ -0,0 +1,31 @@
+# Unit Tests
+
+Jest unit tests for the enforcement scripts, scaffolding tools, and documentation generators.
+
+## Running
+
+```bash
+npx jest --config '{}' tests/scripts/              # All tests
+npx jest --config '{}' tests/scripts/release.test.js  # Single file
+```
+
+## Test Files
+
+| Test | What It Covers |
+|---|---|
+| `detect-source-dirs.test.js` | `detectSourceDirs` adaptive scanning and `buildModuleIndex` across multiple source dirs |
+| `generate-docs.test.js` | `replaceMarkers`, `validateCrossLinks`, `buildDocsIndex`, `checkMarkersAreCurrent` |
+| `generate-docs-helpers.test.js` | `buildDirectoryTree`, `extractJSDocDescription`, `extractExports` |
+| `generate-claude-md.test.js` | CLAUDE.md generation from templates with framework-specific commands |
+| `init-project.test.js` | Node/TS project scaffolding (directories, package.json, tsconfig) |
+| `install-enforcement.test.js` | Enforcement script copying, hook installation, config generation |
+| `marketplace-schema.test.js` | Plugin manifest validation (plugin.json, marketplace.json) |
+| `release.test.js` | Release script: version validation, changelog check, version bump, tagging |
+| `repo-generate-docs.test.js` | Repo-level CLAUDE.md auto-generation (tree + modules for this repo) |
+
+## Conventions
+
+- Each test gets its own temp directory (under `os.tmpdir()`), which is cleaned up in `afterEach`
+- Scripts run as child processes via `execFileSync` to isolate side effects
+- Tests verify both file existence and file content
+- Git operations in tests disable GPG signing (`commit.gpgsign false`)
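
Review note: the first three conventions listed above could be sketched roughly as follows. This is a minimal illustration in plain Node (not Jest) so it runs standalone, and it is not taken from the repository's tests. The helper names (`makeTempDir`, `removeTempDir`, `runInlineScript`, `runExample`) and the inline stand-in script are hypothetical. The GPG-signing convention is not shown here.

```javascript
const { execFileSync } = require('node:child_process');
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');

// Hypothetical helper: create a fresh temp directory for a single test.
function makeTempDir() {
  return fs.mkdtempSync(path.join(os.tmpdir(), 'unit-test-'));
}

// Hypothetical helper: remove the temp directory (the job afterEach would do).
function removeTempDir(dir) {
  fs.rmSync(dir, { recursive: true, force: true });
}

// Run a script as a child process with its cwd set to the temp directory,
// so any files it writes stay isolated. An inline script passed to the
// Node binary stands in for a real script file (assumption for this sketch).
function runInlineScript(dir, source) {
  return execFileSync(process.execPath, ['-e', source], {
    cwd: dir,
    encoding: 'utf8',
  });
}

// Stand-in script: writes a small JSON file into the current directory.
const WRITE_PACKAGE_JSON = `
  require('node:fs').writeFileSync('package.json', JSON.stringify({ name: 'demo' }));
  console.log('done');
`;

// One test-shaped run: set up, execute, check existence and content, clean up.
function runExample() {
  const dir = makeTempDir();
  try {
    const stdout = runInlineScript(dir, WRITE_PACKAGE_JSON);
    const file = path.join(dir, 'package.json');
    const exists = fs.existsSync(file);
    const parsed = JSON.parse(fs.readFileSync(file, 'utf8'));
    return { stdout: stdout.trim(), exists, name: parsed.name, dir };
  } finally {
    removeTempDir(dir);
  }
}
```

In a Jest file, `makeTempDir` would typically be called from `beforeEach` and `removeTempDir` from `afterEach`, with the checks in `runExample` expressed as `expect(...)` assertions.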