Skip to content

fix(state): hardening — path-traversal validation + atomic save (v4.4.1)#3

Merged
LEEI1337 merged 4 commits into
mainfrom
fix/state-hardening
May 2, 2026
Merged

fix(state): hardening — path-traversal validation + atomic save (v4.4.1)#3
LEEI1337 merged 4 commits into
mainfrom
fix/state-hardening

Conversation

@LEEI1337

@LEEI1337 LEEI1337 commented May 2, 2026

Copy link
Copy Markdown
Member

Summary

Hardens hooks/lib/state.py against two audit-confirmed risks (Session D Phase 0 quality audit, 2026-05-02):

  1. TASK-2026-00679 — Path-Traversal: session_id whitelist [a-zA-Z0-9_-]{1,64} + Defense-in-Depth resolved-path check
  2. TASK-2026-00680 — Atomic Save: tempfile.mkstemp(dir=STATE_DIR) + os.replace instead of direct path.write_text()

Threat models documented in test-file docstrings. Best-effort behavior preserved: save() still never raises, OSError logged to stderr.

ERPNext

Test Plan

  • 17 new path-traversal tests (TestSessionIdRejection / Acceptance / PathStaysInsideStateDir) — all green
  • 6 new concurrency tests (TestAtomicSave / TestConcurrentSave) — all green, incl. 4-thread stress
  • Regression: full suite 642 tests green (619 prior + 23 new)
  • Coverage hooks/lib/state.py: 90.7% (was 87%)
  • ruff format --check + ruff check clean on changed files
  • harden.py --scan: 0 Critical findings
  • Backwards-compat: all 19 existing test_session_state.py session_ids still accepted

Commits (atomic per concern)

  • bb3766c fix(state): validate session_id against path-traversal (TASK-2026-00679)
  • c4c386e fix(state): atomic save via tempfile + os.replace (TASK-2026-00680)

Plan / Audit Reference

🤖 Generated with Claude Code

ai-engineering.at and others added 4 commits May 2, 2026 18:25
Reject session_ids that fail [a-zA-Z0-9_-]{1,64} whitelist before any
filesystem access. Defense-in-depth: resolved path must remain inside
STATE_DIR even if regex were ever loosened (catches symlink edges).

Threat model: session_id is normally a UUID generated by Claude Code,
but the hook contract treats it as an arbitrary string from JSON stdin.
A malicious or buggy upstream that injects "../../etc/passwd" must not
be able to write state files outside the configured state directory.

Tests (17 new, all green):
- TestSessionIdRejection: dotdot, slashes, null-byte, newline, empty,
  too-long (>64), shell metachars, non-string, None
- TestSessionIdAcceptance: UUID, short alphanumeric, underscore, all
  19 existing test_session_state.py session_ids remain valid
- TestPathStaysInsideStateDir: resolved path parent is STATE_DIR,
  no file created on rejected id

Backwards-compat verified: regression suite 619 + 17 = 636 green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Race-condition / partial-write hardening for SessionState.save():
write to a tempfile in STATE_DIR (same filesystem, guaranteed atomic
rename), then os.replace into place. Cross-platform atomic since Py3.3.

Before: contextlib.suppress(OSError) + path.write_text() — last-write-wins
on concurrent saves, half-written files possible on crash mid-stream.

After: tempfile + os.replace — concurrent saves serialize at the rename
point, partial writes leave tempfile behind (cleaned up in finally) and
never corrupt the destination.

Best-effort behavior preserved: OSError logged to stderr, never raised.
Tempfile fd ownership transferred to fdopen's context-manager (avoids
double-close); cleanup paths only act on still-owned resources.

os.replace + os.unlink (not Path.replace/.unlink) chosen because tests
spy on the atomicity contract via patch("lib.state.os.replace", ...).
PTH105/PTH108 silenced inline with intent comment.

Tests (6 new, all green):
- TestAtomicSave: save uses os.replace, no .tmp left behind, roundtrip
  after atomic write, partial-write failure preserves original file
- TestConcurrentSave: 4 threads x 25 saves produce valid JSON, no .tmp
  left after concurrent saves

Coverage hooks/lib/state.py: 90.7% (target >85%).
Regression suite 642 green (was 619 + 23 new).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…dex finding)

PR #3 review by codex-rescue identified Lost-Update bug not covered by
the atomic-save fix in c4c386e: when multiple hook subprocesses write
the SAME session_id concurrently (real-world: Claude Code spawns 4
parallel UserPromptSubmit hooks — session-init, correction-detect,
scope-tracker, false-positive-guard — each owning its own DEFAULTS
namespace), atomic-write alone gives last-writer-wins. Hook B's save()
overwrites Hook A's just-committed namespace because B's in-memory
state was loaded before A's save landed.

Three-layer fix:

1. Cross-platform file-lock (`fcntl.flock` on POSIX, `msvcrt.locking`
   on Windows): a sentinel `.meta-state-{sid}.lock` file gates the
   read-modify-write window. Auto-released on process exit (POSIX flock
   semantics + Windows file-handle close).

2. Read-Modify-Write inside the lock: re-read fresh disk state, merge
   in-memory namespaces over it, atomic-write the merged result. Helpers
   `_reload_from_disk` (no defaults merge) and `_write_atomic` extracted
   from save() for clarity and unit-mockability.

3. Dirty-namespace tracking (`_dirty_ns: set[str]`): only namespaces
   explicitly written via set() / property setters are merged onto disk-
   reload. Without this, the in-memory _data — which contains DEFAULTS
   loaded by __init__ — would clobber disk values that other concurrent
   writers committed for namespaces this writer never touched.

cleanup_stale extended: removes companion `.lock` files alongside their
`.json` state files, plus orphan-lock sweep for locks whose state file
no longer exists.

Tests (4 new, all green):
- TestSharedSessionLostUpdate: 4-thread barrier-coordinated disjoint
  namespace writers preserve all updates; 4 × 10-iter stress on disjoint
  namespaces converges to last-iter values
- TestLockFile: .lock file created alongside state on save(); cleanup_stale
  removes both companions

Coverage hooks/lib/state.py: 87.6% (target >85%, missing lines are POSIX
fcntl branch on Windows + defensive error-paths).
Regression suite 646 green (642 prior + 4 new).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…eview #2)

Codex re-review on PR #3 (commit eafebc1) APPROVED the file-lock + RMW
fix but flagged one minor blind spot: get() returns the live reference,
not a copy. Callers who mutate via reference (e.g. `qg = s.get("ns");
qg["k"] = v`) without a follow-up s.set(ns, qg) won't trigger dirty-
namespace tracking, so the mutation is silently lost on save().

In practice this is mitigated because all existing hooks follow the
established `set()-after-mutation` pattern (visible throughout test
fixtures and hook code). But the contract was implicit. This commit
makes it explicit in the docstring so future hook authors won't be
surprised.

No code change. Lint clean. 49 state tests still green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@LEEI1337 LEEI1337 merged commit cf6d8de into main May 2, 2026
7 checks passed
@LEEI1337 LEEI1337 deleted the fix/state-hardening branch May 2, 2026 17:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant