fix: serialize _write_progress() race condition with lock + atomic_write#6786
fix: serialize _write_progress() race condition with lock + atomic_write#6786genyarko wants to merge 2 commits intoaden-hive:mainfrom
Conversation
levxn
left a comment
There was a problem hiding this comment.
Kindly submit your branch which is upto date with the latest upstream. I see your current branch is on a very old head. can you and push that one again? Will review your PR
15bcdec to
f3916a5
Compare
Done |
levxn
left a comment
There was a problem hiding this comment.
Everything looks good to me
|
I was independently working on this fix and arrived at the same approach — threading.Lock to serialize the read-modify-write cycle + atomic_write to prevent crash corruption. Reviewed the changes and everything looks correct: Lock correctly wraps the entire read → patch → write cycle in _write_progress() |
|
Hi @genyarko, CI lint and tests are failing, can you take a look? Also a heads up: PR description says "Replace bare write_text() with atomic_write()", but main already uses atomic_write. The actual change here is adding the threading.Lock, which is the right fix. Might want to update the description. |
…_write During parallel fan-out execution, multiple threads calling _write_progress() could read the same stale state.json, patch independently, and overwrite each other's updates — causing silent data loss. Additionally, the bare write_text() call could leave a truncated file on crash. Add threading.Lock to serialize the read-modify-write cycle and replace write_text() with the project's existing atomic_write() utility (temp file + fsync + rename) to prevent crash corruption. Closes aden-hive#6696 Co-Authored-By: Claude Opus 4.6 <[email protected]>
- Remove unused `import pytest` - Fix f-strings without placeholders (`f"step"` → `"step"`) - Fix lines exceeding 100-char limit - Simplify atomic write crash test mock to raise OSError directly instead of complex crashing_atomic_write wrapper Co-Authored-By: Claude Opus 4.6 <[email protected]>
f3916a5 to
ed8d584
Compare
|
The PR description was inaccurate from the start. The only actual change our PR makes to executor.py is adding the |
Summary
threading.Lockto_write_progress()to serialize the read-modify-write cycle onstate.json, preventing concurrent fan-out branches from clobbering each other's progress updatesCloses #6696
Changes
core/framework/graph/executor.pyimport threading,self._progress_lockin__init__(), rewrite_write_progress()with lock to serialize concurrent writescore/tests/test_executor_progress_race.pyContext
During
_execute_parallel_branches(), multiple branches call_write_progress()concurrently. Without synchronization, threads read the same stalestate.json, patch independently, and silently overwrite each other's updates. Thethreading.Lockensures each writer reads the state left by the previous writer. The existingatomic_writeswap prevents readers from ever seeing a partial file.No new dependencies — uses only stdlib
threadingand the project's existingframework.utils.io.atomic_write.Test plan
pytest core/tests/test_executor_progress_race.py)pytest core/tests/test_graph_executor.py)🤖 Generated with Claude Code