fix(device): serialize ExecuteTextCommandAsync (closes #186)#196
fix(device): serialize ExecuteTextCommandAsync (closes #186)#196cptkoolbeenz wants to merge 5 commits into
Conversation
Adds a SemaphoreSlim around ExecuteTextCommandAsync so concurrent
callers (e.g. simultaneous GetSdCardFilesAsync + DrainErrorQueueAsync
+ GetSystemInfoAsync) no longer race the protobuf-consumer
pause/swap/restart sequence on the same stream.
Without the lock, two callers could:
- Spawn separate text consumers on the SAME underlying stream — each
gets a random subset of the device's reply lines.
- Interleave SCPI bytes on the wire — firmware sees intermixed bytes
and may reply with parse errors.
- Race the protobuf-consumer restart in the outer finally block.
- Stop a consumer mid-restart (one caller's StopSafely targets the
consumer the other caller just restarted).
Implementation:
- New `_textExchangeLock = new SemaphoreSlim(1, 1)` for mutual
exclusion. Disposed in Dispose().
- New `_textExchangeOwnerThreadId` tracks the holding thread. A
same-thread re-entrant call (a setupAction that itself calls
ExecuteTextCommandAsync) would corrupt the consumer swap; we
detect it BEFORE WaitAsync() and throw InvalidOperationException
("not re-entrant"). Strictly safer than the silent deadlock
WaitAsync would otherwise produce.
- ALL validation moved INSIDE the lock so a competing thread
calling DisconnectAsync() / Dispose() during our WaitAsync()
doesn't leave us with a stale _transport / _messageConsumer
reference (TOCTOU close).
- New disposed/disconnecting guard at the top of the locked body
rejects calls during teardown — even before the existing
IsConnected / transport-null checks.
- Owner cleared and semaphore Released in the outer finally so
validation failures can't leak the lock (verified in tests).
5 new tests in DaqifiDeviceTextCommandLockTests:
- Same-thread owner → InvalidOperationException("not re-entrant")
- _isDisconnecting=true → throws "disposing or disconnecting"
- _disposed=true → throws "disposing or disconnecting"
- Lock released after validation failure (second call doesn't deadlock)
- Owner thread ID cleared after method returns (even on exception)
Tests use reflection to set the relevant private fields — those guards
run before any transport/consumer interaction, so this gives faithful
coverage without standing up a real transport stack.
Mirrors the equivalent fix already merged in daqifi-python-core PR #104
(execute_text_command + _text_exchange_lock + RLock + disposed/disconnecting
guard + post-acquisition validation).
895 tests pass on net9.0 + net10.0 (was 890 baseline).
Review Summary by QodoSerialize ExecuteTextCommandAsync with SemaphoreSlim to prevent race conditions
WalkthroughsDescription• Adds SemaphoreSlim lock to serialize concurrent ExecuteTextCommandAsync calls • Detects same-thread re-entrant calls and throws InvalidOperationException • Moves all validation inside lock to close TOCTOU window • Rejects calls during disposed/disconnecting state • Adds 5 comprehensive tests covering lock behavior and edge cases Diagramflowchart LR
A["Concurrent callers"] -->|"acquire lock"| B["SemaphoreSlim"]
B -->|"check re-entrancy"| C["Same thread?"]
C -->|"yes"| D["Throw InvalidOperationException"]
C -->|"no"| E["Check disposed/disconnecting"]
E -->|"yes"| D
E -->|"no"| F["Execute text command"]
F -->|"finally"| G["Release lock & clear owner"]
File Changes1. src/Daqifi.Core.Tests/Device/DaqifiDeviceTextCommandLockTests.cs
|
Code Review by Qodo
Context used 1.
|
…c concurrency Three concurrency findings from Qodo on the lock-coordination work: 1. Re-entrancy guard could miss across async hops. The previous design captured Environment.CurrentManagedThreadId before await, but after ConfigureAwait(false) the resumed thread may differ — a setupAction callback re-entering ExecuteTextCommandAsync could then deadlock on _textExchangeLock.WaitAsync() instead of throwing. Replaced the int? owner-thread tracker with AsyncLocal<bool>, which flows through async resumptions regardless of which thread picks them up. 2. Disconnect() did not coordinate with the text-exchange lock. An in-flight ExecuteTextCommandAsync could be mid-swap (text consumer running, protobuf consumer not yet restarted) when Disconnect ripped the transport out from under it. Disconnect now waits up to 5s to acquire _textExchangeLock before tearing down. If the wait times out, teardown proceeds anyway — a stuck text exchange must not block disconnect forever. 3. Dispose() disposed _textExchangeLock while an in-flight call could still hold it; the in-flight call's finally Release() would then throw ObjectDisposedException and mask the real exception. Wrapped the Release in a try/catch so disposal-during-flight is treated as a benign teardown signal. Tests updated to flip the AsyncLocal flag instead of poking the removed int? field. 895/897 pass (2 skipped require live hardware).
|
/improve |
|
/agentic_review |
PR Code Suggestions ✨Latest suggestions up to 2f23625 Warning
Previous suggestionsSuggestions up to commit 7dc53d3
Suggestions up to commit 328e212
|
|||||||||||||||||||||||||||
…tion Qodo flagged the symmetric race to the post-acquisition disposal handling: _textExchangeLock.WaitAsync() itself can throw ObjectDisposedException if Dispose() ran ahead of an arriving caller. Without a catch, the low-level teardown exception leaks to callers instead of the InvalidOperationException they get from the post-acquisition _disposed check. Wrap the WaitAsync and translate to the same "device is disposed" failure mode for consistency.
|
/improve |
|
/agentic_review |
|
Persistent review updated to latest commit f7efc6c |
PR Code Suggestions ✨Warning
No code suggestions found for the PR. |
…, ConfigureAwait, fix comment Two findings from Qodo review on the prior pass: 1. Disconnect blocks up to 5s. Reduced the SemaphoreSlim.Wait timeout from 5 to 1 second. The 1s budget covers the longest delay any normal text exchange can hold the lock for (responseTimeoutMs default + safety margin), without making sync Disconnect calls stall callers for an unreasonable wait. Callers wanting truly non-blocking disconnect should drive it from Task.Run. Combined with this: the two Task.Delay(50) calls inside ExecuteTextCommandAsync now use ConfigureAwait(false). Without it, the awaits resume on the captured sync context — and a sync Disconnect() called on that same UI thread would deadlock waiting for a lock the exchange can't release because its continuation can't reach the finally. 2. Misleading lock release comment. The previous comment said "We do NOT release this lock", which contradicted the finally block that does release on lockAcquired. Rewrote the comment to describe the actual contract.
|
/improve |
|
/agentic_review |
|
Persistent review updated to latest commit 7dc53d3 |
|
Persistent suggestions updated to latest commit 7dc53d3 |
…ncy guard Qodo flagged a real deadlock: UpdateWifiModuleAsync holds _operationLock while synchronously firing progress.Report. If a handler calls CheckWifiFirmwareStatusAsync from inside that callback, WaitAsync would block waiting for the lock its own caller's flow already holds — SemaphoreSlim is not re-entrant. Added AsyncLocal<bool> _isInsideOperation flag, set true by both lock acquisition sites (CheckWifiFirmwareStatusAsync and RunExclusiveAsync). CheckWifiFirmwareStatusAsync now skips the lock acquisition when the flag is already true on the current async flow — safe because the outer flow already provides exclusive device-I/O access. AsyncLocal flows through await resumptions on different threads, so this catches re-entrancy even when the inner call resumes on a different thread than the outer (mirrors the AsyncLocal pattern from PR #196's ExecuteTextCommandAsync hardening). Regression test added: invokes CheckWifiFirmwareStatusAsync from inside a progress.Report callback during UpdateWifiModuleAsync. Without the guard the test hangs on the deadlocked semaphore; with the guard the inner call returns immediately and the update completes normally.
|
/improve |
|
/agentic_review |
|
Persistent review updated to latest commit 7dc53d3 |
PR Code Suggestions ✨Warning
No code suggestions found for the PR. |
The 1-second wait was shorter than the worst-case ExecuteTextCommandAsync hold time (StopSafely up to 1s + maxWait of responseTimeoutMs*5 = 5s by default, longer with custom timeouts), so Disconnect/Dispose could rip the transport out from under an in-flight text exchange — reintroducing the race the lock is meant to prevent. 10s budget covers the default-timeout case with margin and most custom callers; on timeout the in-flight exchange still bails cleanly via the post-acquisition _isDisconnecting check, so teardown stays bounded.
|
/improve |
|
/agentic_review |
|
Persistent review updated to latest commit 2f23625 |
|
Persistent suggestions updated to latest commit 2f23625 |
Summary
Adds
SemaphoreSlimaroundExecuteTextCommandAsyncso concurrent callers no longer race the protobuf-consumer pause/swap/restart sequence on the same stream. Also closes the TOCTOU window between the existing pre-lock validation and consumer manipulation, detects same-thread re-entrant calls explicitly, and rejects calls during disposed/disconnecting state.Implementation
_textExchangeLock = new SemaphoreSlim(1, 1)mutex. Disposed inDispose()._textExchangeOwnerThreadIdtracks the holding thread. Same-thread re-entry (asetupActionrecursing intoExecuteTextCommandAsync) is detected BEFOREWaitAsync()and throwsInvalidOperationException("not re-entrant")instead of deadlocking.Disconnect()/Dispose()duringWaitAsync()no longer leaves a stale_transport/_messageConsumerreference (TOCTOU close)._disposed || _isDisconnectingguard rejects calls during teardown.Mirrors daqifi/daqifi-python-core PR #104.
Test plan
DaqifiDeviceTextCommandLockTests(re-entrancy guard,_isDisconnecting,_disposed, lock released after validation failure, owner cleared after exception). Reflection-based — guards run before transport interaction so this is faithful coverage without a transport stack.Refs