fix(state): hardening — path-traversal validation + atomic save (v4.4.1) by LEEI1337 · Pull Request #3 · AI-Engineering-at/meta-skills-plugin

LEEI1337 · 2026-05-02T16:26:34Z

Summary

Hardens hooks/lib/state.py against two audit-confirmed risks (Session D Phase 0 quality audit, 2026-05-02):

TASK-2026-00679 — Path-Traversal: session_id whitelist [a-zA-Z0-9_-]{1,64} + Defense-in-Depth resolved-path check
TASK-2026-00680 — Atomic Save: tempfile.mkstemp(dir=STATE_DIR) + os.replace instead of direct path.write_text()

Threat models documented in test-file docstrings. Best-effort behavior preserved: save() still never raises, OSError logged to stderr.

ERPNext

Test Plan

17 new path-traversal tests (TestSessionIdRejection / Acceptance / PathStaysInsideStateDir) — all green
6 new concurrency tests (TestAtomicSave / TestConcurrentSave) — all green, incl. 4-thread stress
Regression: full suite 642 tests green (619 prior + 23 new)
Coverage hooks/lib/state.py: 90.7% (was 87%)
ruff format --check + ruff check clean on changed files
harden.py --scan: 0 Critical findings
Backwards-compat: all 19 existing test_session_state.py session_ids still accepted

Commits (atomic per concern)

bb3766c fix(state): validate session_id against path-traversal (TASK-2026-00679)
c4c386e fix(state): atomic save via tempfile + os.replace (TASK-2026-00680)

Plan / Audit Reference

~/.claude/plans/jaundder-plan-hashed-tiger.md Phase 1+2
Documents/audit-2026-05-01/META-ANALYSIS.md (Path-Traversal + Race-Condition findings)
Filed during Session C PR feat(hooks): 4 new hooks for Opus 4.7 friction mitigation (v4.4.0) #2 v4.4.0 retrospective

🤖 Generated with Claude Code

Reject session_ids that fail [a-zA-Z0-9_-]{1,64} whitelist before any filesystem access. Defense-in-depth: resolved path must remain inside STATE_DIR even if regex were ever loosened (catches symlink edges). Threat model: session_id is normally a UUID generated by Claude Code, but the hook contract treats it as an arbitrary string from JSON stdin. A malicious or buggy upstream that injects "../../etc/passwd" must not be able to write state files outside the configured state directory. Tests (17 new, all green): - TestSessionIdRejection: dotdot, slashes, null-byte, newline, empty, too-long (>64), shell metachars, non-string, None - TestSessionIdAcceptance: UUID, short alphanumeric, underscore, all 19 existing test_session_state.py session_ids remain valid - TestPathStaysInsideStateDir: resolved path parent is STATE_DIR, no file created on rejected id Backwards-compat verified: regression suite 619 + 17 = 636 green. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Race-condition / partial-write hardening for SessionState.save(): write to a tempfile in STATE_DIR (same filesystem, guaranteed atomic rename), then os.replace into place. Cross-platform atomic since Py3.3. Before: contextlib.suppress(OSError) + path.write_text() — last-write-wins on concurrent saves, half-written files possible on crash mid-stream. After: tempfile + os.replace — concurrent saves serialize at the rename point, partial writes leave tempfile behind (cleaned up in finally) and never corrupt the destination. Best-effort behavior preserved: OSError logged to stderr, never raised. Tempfile fd ownership transferred to fdopen's context-manager (avoids double-close); cleanup paths only act on still-owned resources. os.replace + os.unlink (not Path.replace/.unlink) chosen because tests spy on the atomicity contract via patch("lib.state.os.replace", ...). PTH105/PTH108 silenced inline with intent comment. Tests (6 new, all green): - TestAtomicSave: save uses os.replace, no .tmp left behind, roundtrip after atomic write, partial-write failure preserves original file - TestConcurrentSave: 4 threads x 25 saves produce valid JSON, no .tmp left after concurrent saves Coverage hooks/lib/state.py: 90.7% (target >85%). Regression suite 642 green (was 619 + 23 new). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…dex finding) PR #3 review by codex-rescue identified Lost-Update bug not covered by the atomic-save fix in c4c386e: when multiple hook subprocesses write the SAME session_id concurrently (real-world: Claude Code spawns 4 parallel UserPromptSubmit hooks — session-init, correction-detect, scope-tracker, false-positive-guard — each owning its own DEFAULTS namespace), atomic-write alone gives last-writer-wins. Hook B's save() overwrites Hook A's just-committed namespace because B's in-memory state was loaded before A's save landed. Three-layer fix: 1. Cross-platform file-lock (`fcntl.flock` on POSIX, `msvcrt.locking` on Windows): a sentinel `.meta-state-{sid}.lock` file gates the read-modify-write window. Auto-released on process exit (POSIX flock semantics + Windows file-handle close). 2. Read-Modify-Write inside the lock: re-read fresh disk state, merge in-memory namespaces over it, atomic-write the merged result. Helpers `_reload_from_disk` (no defaults merge) and `_write_atomic` extracted from save() for clarity and unit-mockability. 3. Dirty-namespace tracking (`_dirty_ns: set[str]`): only namespaces explicitly written via set() / property setters are merged onto disk- reload. Without this, the in-memory _data — which contains DEFAULTS loaded by __init__ — would clobber disk values that other concurrent writers committed for namespaces this writer never touched. cleanup_stale extended: removes companion `.lock` files alongside their `.json` state files, plus orphan-lock sweep for locks whose state file no longer exists. Tests (4 new, all green): - TestSharedSessionLostUpdate: 4-thread barrier-coordinated disjoint namespace writers preserve all updates; 4 × 10-iter stress on disjoint namespaces converges to last-iter values - TestLockFile: .lock file created alongside state on save(); cleanup_stale removes both companions Coverage hooks/lib/state.py: 87.6% (target >85%, missing lines are POSIX fcntl branch on Windows + defensive error-paths). Regression suite 646 green (642 prior + 4 new). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…eview #2) Codex re-review on PR #3 (commit eafebc1) APPROVED the file-lock + RMW fix but flagged one minor blind spot: get() returns the live reference, not a copy. Callers who mutate via reference (e.g. `qg = s.get("ns"); qg["k"] = v`) without a follow-up s.set(ns, qg) won't trigger dirty- namespace tracking, so the mutation is silently lost on save(). In practice this is mitigated because all existing hooks follow the established `set()-after-mutation` pattern (visible throughout test fixtures and hook code). But the contract was implicit. This commit makes it explicit in the docstring so future hook authors won't be surprised. No code change. Lint clean. 49 state tests still green. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

ai-engineering.at and others added 4 commits May 2, 2026 18:25

LEEI1337 merged commit cf6d8de into main May 2, 2026
7 checks passed

LEEI1337 deleted the fix/state-hardening branch May 2, 2026 17:39

LEEI1337 mentioned this pull request May 2, 2026

chore(lint): ruff config + format sweep + manual cleanup #4

Merged

4 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(state): hardening — path-traversal validation + atomic save (v4.4.1)#3

fix(state): hardening — path-traversal validation + atomic save (v4.4.1)#3
LEEI1337 merged 4 commits into
mainfrom
fix/state-hardening

LEEI1337 commented May 2, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

LEEI1337 commented May 2, 2026

Summary

ERPNext

Test Plan

Commits (atomic per concern)

Plan / Audit Reference

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant