feat(firmware): wait for application readiness after PIC32 reconnect (closes #145) #200
cptkoolbeenz wants to merge 13 commits
…loses #145)

Adds an opt-in readiness probe to FirmwareUpdateService so a PIC32 firmware update doesn't transition to Complete until the device is actually ready to answer normal application commands. The serial transport re-enumerates well before the application firmware is up; without this wait, downstream flows (LAN chip-info queries, WiFi prep) hit a half-started device and either fail or have to reimplement their own retry loop in the calling app, which is exactly the pattern desktop had to work around.

API additions on FirmwareUpdateServiceOptions:
- PostReconnectReadinessProbe: Func<IStreamingDevice, CancellationToken, Task<bool>>?. Returns true when the application is responsive. Null disables the wait (legacy behavior).
- PostReconnectReadinessTimeout (default 30s): wall-clock budget
- PostReconnectReadinessRetryDelay (default 500ms): delay between probe attempts

When the timeout elapses without the probe returning true, the update transitions to Failed with a clear TimeoutException wrapped in FirmwareUpdateException, NOT a silent Complete on a half-ready device.

Test plan: 3 new tests cover (a) probe succeeds on attempt N > 1, holding back Complete until ready, (b) timeout raises a properly wrapped FirmwareUpdateException with FailedState=JumpingToApp and the readiness keyword in the inner message, (c) null probe preserves legacy fast-complete path.

Full suite 893/895 (2 skipped require live hardware).
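The probe-and-retry contract described in this commit message can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in FirmwareUpdateService: the class and method names are hypothetical, the probe takes only a CancellationToken (the real delegate also receives an IStreamingDevice), and a bare TimeoutException is thrown instead of one wrapped in FirmwareUpdateException.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical sketch; names are illustrative and not taken from Daqifi.Core.
public static class ReadinessSketch
{
    // Polls `probe` until it returns true or `timeout` elapses.
    // Returns the number of attempts made. Throws TimeoutException when the
    // budget expires; caller cancellation still surfaces as
    // OperationCanceledException.
    public static async Task<int> WaitUntilReadyAsync(
        Func<CancellationToken, Task<bool>> probe,
        TimeSpan timeout,
        TimeSpan retryDelay,
        CancellationToken cancellationToken = default)
    {
        using var timeoutCts = new CancellationTokenSource(timeout);
        using var linked = CancellationTokenSource.CreateLinkedTokenSource(
            cancellationToken, timeoutCts.Token);

        var attempts = 0;
        try
        {
            while (true)
            {
                attempts++;
                if (await probe(linked.Token).ConfigureAwait(false))
                {
                    return attempts;
                }

                // Task.Delay throws once the linked token is cancelled,
                // which ends the loop when the budget runs out.
                await Task.Delay(retryDelay, linked.Token).ConfigureAwait(false);
            }
        }
        catch (OperationCanceledException) when (!cancellationToken.IsCancellationRequested)
        {
            // Only the timeout token fired: report it as a timeout.
            throw new TimeoutException(
                $"Application did not report ready within {timeout} ({attempts} attempts).");
        }
    }
}
```

A caller that wants the legacy behavior simply never invokes the wait, which mirrors the "null probe disables the wait" rule above.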
/improve
/agentic_review
Review Summary by Qodo
Add post-reconnect application readiness probe for PIC32 firmware updates

Walkthroughs
Description
• Add optional post-reconnect readiness probe to FirmwareUpdateService
• Probe waits for PIC32 application firmware readiness before completing update
• Timeout transitions update to Failed state instead of silent completion
• Preserve legacy behavior when probe is null (opt-in feature)

Diagram
flowchart LR
A["Serial Reconnect"] --> B{"Readiness Probe<br/>Configured?"}
B -->|Yes| C["Poll Probe<br/>with Retry"]
B -->|No| D["Complete<br/>Legacy Path"]
C -->|Success| E["Complete"]
C -->|Timeout| F["Failed<br/>with TimeoutException"]
File Changes
1. src/Daqifi.Core/Firmware/FirmwareUpdateServiceOptions.cs

Code Review by Qodo
1. Readiness probe is optional
PR Code Suggestions ✨
Latest suggestions up to 26b19fe
Previous suggestions:
- Suggestions up to commit ea82fd1
- ✅ Suggestions up to commit 5ac58c0
- Suggestions up to commit 5ac58c0
- ✅ Suggestions up to commit 881f65a
- ✅ Suggestions up to commit 7dc8bdf
…ll-behaved probes

Defensive cancellation re-check after the await: a probe that ignores its CancellationToken could otherwise return true after the timeout elapsed and slip past the budget. The post-await check forces the TimeoutException path even for that case.

Caller-cancellation semantics preserved: the inner try/catch only reinterprets the timeout-CT case; OperationCanceledException from the caller-CT propagates out unchanged via the outer catch.
/improve
/agentic_review
Persistent suggestions updated to latest commit 7dc8bdf
…sion in timeout messages
Replaced {totalTimeout.TotalSeconds:F0}s with {totalTimeout} across
all 4 readiness-probe TimeoutException messages. The :F0 formatter
rounded sub-second timeouts to "0s" — including the test value of
150ms — which was unhelpful for diagnostics. {TimeSpan} formats as
hh:mm:ss.fffffff and preserves precision.
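
The rounding problem is easy to reproduce in isolation. The sketch below uses hypothetical helper names and invariant-culture formatting to compare the two styles; note that the default TimeSpan format only prints the fractional part when it is non-zero, so a whole-second value such as the 30s default renders as 00:00:30.

```csharp
using System;
using System.Globalization;

// Hypothetical helpers for illustration; not part of Daqifi.Core.
public static class TimeoutMessageSketch
{
    // Old style: whole seconds only, so sub-second budgets collapse to "0s".
    public static string WithF0(TimeSpan timeout) =>
        string.Format(CultureInfo.InvariantCulture, "{0:F0}s", timeout.TotalSeconds);

    // New style: default TimeSpan formatting keeps sub-second precision.
    public static string WithDefault(TimeSpan timeout) =>
        string.Format(CultureInfo.InvariantCulture, "{0}", timeout);
}

// TimeoutMessageSketch.WithF0(TimeSpan.FromMilliseconds(150))      -> "0s"
// TimeoutMessageSketch.WithDefault(TimeSpan.FromMilliseconds(150)) -> "00:00:00.1500000"
// TimeoutMessageSketch.WithDefault(TimeSpan.FromSeconds(30))       -> "00:00:30"
```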
/improve
/agentic_review
Persistent review updated to latest commit cff6e8f
Persistent suggestions updated to latest commit cff6e8f
…t enforcement

Replaced the post-await cancellation re-check from pass 2 with Task.WaitAsync(linkedToken) on the probe call itself. WaitAsync is the stronger primitive: when the timeout fires, it throws OperationCanceledException immediately instead of waiting for the rogue probe to return. The post-await re-check was defensive but still required the probe to complete; WaitAsync short-circuits.

Skipped a redundant inline option-validation suggestion (already covered by Validate() in FirmwareUpdateServiceOptions).
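
The difference WaitAsync makes can be seen with a deliberately misbehaving probe. The sketch below is illustrative only: the rogue probe and the measuring helper are hypothetical, and it assumes .NET 6 or later, where Task.WaitAsync(CancellationToken) is available. Note that WaitAsync abandons the wait, not the probe; the rogue task keeps running in the background until it finishes on its own.

```csharp
using System;
using System.Diagnostics;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical sketch; not the code in FirmwareUpdateService.
public static class WaitAsyncSketch
{
    // A misbehaving probe: ignores its token and takes `duration` to answer.
    public static async Task<bool> RogueProbeAsync(TimeSpan duration, CancellationToken ignored)
    {
        await Task.Delay(duration).ConfigureAwait(false);
        return true;
    }

    // Returns how long the caller actually waited before the timeout token
    // cut the wait short (or before the probe completed, if that came first).
    public static async Task<TimeSpan> MeasureWaitAsync(TimeSpan probeDuration, TimeSpan timeout)
    {
        using var timeoutCts = new CancellationTokenSource(timeout);
        var stopwatch = Stopwatch.StartNew();
        try
        {
            await RogueProbeAsync(probeDuration, timeoutCts.Token)
                .WaitAsync(timeoutCts.Token)
                .ConfigureAwait(false);
        }
        catch (OperationCanceledException)
        {
            // The timeout fired; the caller resumes without waiting for the probe.
        }

        return stopwatch.Elapsed;
    }
}
```

With a 2-second rogue probe and a 100 ms timeout, the caller should get control back after roughly the timeout rather than after the full probe duration.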
if (_options.PostReconnectReadinessProbe is { } probe)
{
    await WaitForApplicationReadyAsync(device, probe, cancellationToken).ConfigureAwait(false);
}
1. Readiness probe is optional 📎 Requirement gap ☼ Reliability
FirmwareUpdateService only waits for application readiness when PostReconnectReadinessProbe is set; otherwise it proceeds immediately after serial reconnect, which can still return a half-started device. This keeps the failure mode the checklist is trying to eliminate and can still force downstream integrations to add their own readiness polling.
Agent Prompt
## Issue description
The post-reconnect readiness wait is currently opt-in (`PostReconnectReadinessProbe` is nullable and gated), so the firmware update flow can still proceed immediately after reconnect without confirming the application is ready.
## Issue Context
Checklist IDs 1 and 2 require Core’s firmware update lifecycle to handle post-reconnect application readiness so reconnect completion isn’t based solely on transport reopening, and downstream callers don’t need their own readiness loops.
## Fix Focus Areas
- src/Daqifi.Core/Firmware/FirmwareUpdateService.cs[810-820]
- src/Daqifi.Core/Firmware/FirmwareUpdateServiceOptions.cs[124-150]
/agentic_review
Persistent review updated to latest commit eb28b8d
…t cause in TimeoutException

Qodo flagged that deterministic probe failures lose their root cause: the catch swallows the probe exception, retries, and eventually throws TimeoutException without the underlying error attached. With Debug logs off the real cause is invisible. Now captures the most-recent probe exception and attaches it as InnerException to all 3 timeout-path TimeoutExceptions. Steady-state behavior unchanged; observability improves for the failure case.

Skipped 3 other findings:
- "Readiness probe is optional": defensible design per the issue's explicit "callers can provide a readiness probe" wording; making it always-on requires Core to know what "ready" means, which it doesn't.
- "Conflicting timeout budgets": readiness timeout (30s default) fits inside JumpingToApplicationTimeout (45s default) by design; documented in the existing options docstrings.
- "Readiness wait not surfaced": observability nice-to-have; out of scope for this fix.
/improve
/agentic_review
Persistent review updated to latest commit 32e6f91
Persistent suggestions updated to latest commit 32e6f91
…licationTimeout interaction (PR #200)

Address Qodo finding 'Conflicting timeout budgets': the readiness probe's budget runs inside the JumpingToApp state, so users who raise the readiness timeout near/above JumpingToApplicationTimeout will see the outer state-timeout fire first. Default values (45s state, 30s readiness) leave headroom for reconnect; documented the interaction explicitly so configuration mistakes are visible at the call site.

Skipping the remaining 2 findings as design choices:
- 'Readiness probe is optional': the issue's own wording specifies caller-provided probe; making it always-on would require Core to know what 'ready' means for an arbitrary device.
- 'Readiness wait not surfaced': observability nice-to-have; out of scope for this fix.
Convergence summary (Qodo /agentic_review pass 5, after commit 89657ba): Persistent findings + dispositions:
Test gate: 893/895 (2 hardware skips). CI green. Ready for review.
/improve
/improve
/agentic_review
Persistent review updated to latest commit 5ac58c0
Persistent suggestions updated to latest commit 5ac58c0
Persistent /agentic_review items, accepted as design decisions, not blockers:
Re-running Qodo to confirm no new findings beyond these two known-acknowledged items.
/improve
/agentic_review
Persistent review updated to latest commit 5ac58c0
Persistent suggestions updated to latest commit 5ac58c0
Wrap PostReconnectReadinessTimeout / RetryDelay positive checks AND the cross-property constraint (Timeout < JumpingToApplicationTimeout) in the 'probe is configured' guard. Callers that don't supply a probe now never have to think about these timeouts — symmetric with the no-op runtime behavior in WaitForApplicationReadyAsync.
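
One way to picture the guarded validation is the sketch below. It is an assumption-laden illustration rather than the actual FirmwareUpdateServiceOptions: the class name, the shortened property names, and the exception types are hypothetical, and only the defaults (30s, 500ms, 45s) and the Timeout < JumpingToApplicationTimeout constraint come from this thread.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical options class; the real FirmwareUpdateServiceOptions may differ.
public sealed class ReadinessOptionsSketch
{
    public Func<CancellationToken, Task<bool>>? Probe { get; set; }
    public TimeSpan ReadinessTimeout { get; set; } = TimeSpan.FromSeconds(30);
    public TimeSpan RetryDelay { get; set; } = TimeSpan.FromMilliseconds(500);
    public TimeSpan JumpingToApplicationTimeout { get; set; } = TimeSpan.FromSeconds(45);

    public void Validate()
    {
        // No probe configured: the readiness settings are never used at
        // runtime, so they are not validated either.
        if (Probe is null)
        {
            return;
        }

        if (ReadinessTimeout <= TimeSpan.Zero)
        {
            throw new ArgumentOutOfRangeException(
                nameof(ReadinessTimeout), "Readiness timeout must be positive.");
        }

        if (RetryDelay <= TimeSpan.Zero)
        {
            throw new ArgumentOutOfRangeException(
                nameof(RetryDelay), "Retry delay must be positive.");
        }

        // Cross-property constraint: the readiness budget runs inside the
        // JumpingToApp state, so it has to fit within that state's timeout.
        if (ReadinessTimeout >= JumpingToApplicationTimeout)
        {
            throw new ArgumentException(
                "Readiness timeout must be shorter than JumpingToApplicationTimeout.");
        }
    }
}
```

With this shape, an options object that has no probe passes validation even when its readiness timeout is zero or negative, which matches the no-op runtime behavior described above.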
/improve
/agentic_review
Persistent review updated to latest commit ea82fd1
Persistent suggestions updated to latest commit ea82fd1
When the probe throws on attempt N then later attempts return false until the readiness budget expires, the TimeoutException carried the stale exception from attempt N as InnerException — misleading debugging and any handler that inspects InnerException. Clear lastProbeException whenever a probe invocation completes normally (true OR false), so a subsequent timeout only attaches a probe-thrown exception when it's actually the most recent outcome. Locked in with a focused test.
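
The stale-exception rule can be sketched as follows. This is a simplified illustration, not the code in WaitForApplicationReadyAsync: the names are hypothetical, and the budget is an attempt count rather than a wall-clock timeout so that the behavior is deterministic.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical sketch; the budget here is an attempt count, not a timeout.
public static class InnerExceptionSketch
{
    public static async Task WaitUntilReadyAsync(Func<int, Task<bool>> probe, int maxAttempts)
    {
        Exception? lastProbeException = null;

        for (var attempt = 1; attempt <= maxAttempts; attempt++)
        {
            try
            {
                var ready = await probe(attempt).ConfigureAwait(false);

                // The probe completed normally (true OR false), so any
                // exception from an earlier attempt is no longer the most
                // recent outcome and must not be reported.
                lastProbeException = null;

                if (ready)
                {
                    return;
                }
            }
            catch (Exception ex)
            {
                lastProbeException = ex;
            }
        }

        // InnerException is set only when the final attempt itself threw.
        throw new TimeoutException(
            "Application did not report ready within the budget.", lastProbeException);
    }
}
```

A probe that throws once and then keeps returning false ends in a TimeoutException with no InnerException, while a probe that throws on every attempt surfaces its last exception as the InnerException.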
/improve
/agentic_review
Persistent review updated to latest commit b089763
PR Code Suggestions ✨
Warning: No code suggestions found for the PR.
/improve
/agentic_review
Persistent review updated to latest commit b089763
PR Code Suggestions ✨
Warning: No code suggestions found for the PR.
/improve
PR Code Suggestions ✨
Warning: No code suggestions found for the PR.
Add LogInformation at WaitForApplicationReadyAsync entry and on success (with elapsed time + attempt count) so observers tailing the log can distinguish a deliberate readiness poll from a stuck flow. Wait can take up to PostReconnectReadinessTimeout (default 30s); previously only Debug-level breadcrumbs existed during that window.
/improve
/agentic_review
Persistent review updated to latest commit 26b19fe
Persistent suggestions updated to latest commit 26b19fe
## Summary
Adds an opt-in readiness probe so a PIC32 firmware update doesn't transition to `Complete` until the device is actually ready to answer normal application commands. Serial transport re-enumeration succeeds well before the application firmware is up; without this wait, downstream flows (LAN chip-info queries, WiFi prep) hit a half-started device, which is exactly the pattern desktop had to work around with its own retry loop.

## API additions
`FirmwareUpdateServiceOptions` gains 3 new properties:
- `PostReconnectReadinessProbe`: `Func<IStreamingDevice, CancellationToken, Task<bool>>?`. Returns `true` when the application is responsive. Null disables the wait (legacy behavior preserved).
- `PostReconnectReadinessTimeout` (default 30s): wall-clock budget for the wait.
- `PostReconnectReadinessRetryDelay` (default 500ms): delay between probe attempts.

When the timeout elapses without the probe returning true, the update transitions to `Failed` with a `TimeoutException` wrapped in `FirmwareUpdateException`, NOT a silent `Complete` on a half-ready device, which is the entire point of #145.

## Test plan
- … `Complete` until ready (assert `Complete` state + probe call count = 3)
- … `FirmwareUpdateException` with `FailedState=JumpingToApp` and the readiness keyword in the inner message

Closes #145