feat(firmware): retry LAN chip-info probe during startup transients (closes #144) #199
…loses #144)

Right after a PIC32 firmware update the application is up while the WiFi subsystem is still finishing startup, so the first GetLanChipInfo query can transiently fail. Without retry, the WiFi version decision short-circuits to ChipInfoUnavailable and the desktop layer ends up running an unnecessary multi-minute reflash of already-current firmware. Desktop's workaround was a bounded retry loop in the app; this PR moves that policy into Core, where the WiFi-update decision itself lives.

Adds two FirmwareUpdateServiceOptions properties:
- LanChipInfoMaxAttempts (default 3): total tries before giving up
- LanChipInfoRetryDelay (default 2s): wait between attempts

Worst-case wait = (MaxAttempts - 1) * RetryDelay = 4s by default, which fits the observed startup window without slowing steady state. The retry loop observes cancellation between attempts.

Wired into CheckWifiFirmwareStatusCoreAsync so both the legacy IsWifiFirmwareUpToDateAsync hit-path AND the new (PR #198) CheckWifiFirmwareStatusAsync planning method get the retry behavior for free.

Test plan: 3 new tests cover (a) transient failure recovery within budget, (b) exhausted budget falls through to ChipInfoUnavailable, (c) steady-state success doesn't trigger retry overhead. Extended FakeLanChipInfoStreamingDevice with transientFailuresBeforeSuccess + GetLanChipInfoCallCount instrumentation. 898/900 pass (2 skipped require live hardware).

Stacks on PR #198, which needs to land first.
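The retry policy described in this commit message can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: TryWithRetryAsync and FlakyProbeAsync are hypothetical names, and the fake probe simply fails twice before succeeding to mimic a WiFi subsystem that is still starting up.

```csharp
#nullable enable
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper (not the PR's method): try the probe up to maxAttempts
// times, waiting retryDelay between attempts. Returns null once the budget is
// exhausted, which a caller could map to a ChipInfoUnavailable-style result.
static async Task<T?> TryWithRetryAsync<T>(
    Func<CancellationToken, Task<T>> probe,
    int maxAttempts, TimeSpan retryDelay, CancellationToken ct) where T : class
{
    for (int attempt = 1; attempt <= maxAttempts; attempt++)
    {
        ct.ThrowIfCancellationRequested();
        try
        {
            return await probe(ct);
        }
        catch (Exception ex) when (ex is not OperationCanceledException)
        {
            // Treated as a transient failure: fall through to the delay.
        }

        // No delay after the final attempt, so the delay-only worst case
        // is (maxAttempts - 1) * retryDelay.
        if (attempt < maxAttempts)
            await Task.Delay(retryDelay, ct);
    }
    return null;
}

// Hypothetical fake probe: fails twice, then succeeds.
int calls = 0;
Task<string> FlakyProbeAsync(CancellationToken ct)
{
    calls++;
    return calls <= 2
        ? Task.FromException<string>(new TimeoutException("WiFi subsystem not ready"))
        : Task.FromResult("chip-info");
}

string? info = await TryWithRetryAsync<string>(
    FlakyProbeAsync, maxAttempts: 3,
    retryDelay: TimeSpan.FromMilliseconds(10), ct: CancellationToken.None);

string shown = info ?? "unavailable";
Console.WriteLine($"{shown} after {calls} calls"); // prints "chip-info after 3 calls"
```

Passing the cancellation token to Task.Delay is what lets a loop like this observe cancellation between attempts; cancellation is deliberately excluded from the transient-failure catch so it propagates to the caller.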
/improve

/agentic_review
Review Summary by Qodo
Add retry policy for LAN chip-info probe during startup transients

Walkthroughs
Description
• Add bounded retry policy for LAN chip-info probe during WiFi startup
• Prevent unnecessary reflash of up-to-date WiFi firmware post-PIC32 reboot
• Introduce LanChipInfoMaxAttempts and LanChipInfoRetryDelay options
• Add comprehensive test coverage for retry behavior and edge cases

Diagram
flowchart LR
A["CheckWifiFirmwareStatusCoreAsync"] --> B["TryGetLanChipInfoWithRetryAsync"]
B --> C["Retry Loop"]
C --> D["GetLanChipInfoAsync"]
D --> E{Success?}
E -->|Yes| F["Return ChipInfo"]
E -->|No| G{Attempts Left?}
G -->|Yes| H["Delay & Retry"]
H --> D
G -->|No| I["Return null"]
I --> J["ChipInfoUnavailable"]
F --> K["Continue Status Check"]
File Changes
1. src/Daqifi.Core/Firmware/FirmwareUpdateServiceOptions.cs
…ll-clock time

Qodo correctly flagged that per-attempt query timeouts compound with retry delays, so the actual worst-case blocking can be ~10s with the default DaqifiStreamingDevice 2s response timeout (3 attempts × 2s + 2 × 2s delays), not the 4s the docs claimed. Bad enough by itself, worse because the retry loop holds _operationLock the whole time.

Added LanChipInfoTotalTimeout option (default 8s): a hard wall-clock cap enforced by a linked CancellationTokenSource around the entire retry loop. Caller's cancellation token is honored as before; the new timeout just adds a deadline. When hit, the loop short-circuits to ChipInfoUnavailable instead of letting the per-attempt timeouts keep accumulating.

Updated LanChipInfoMaxAttempts docstring to reflect the actual worst-case math (sum of attempt durations + delays) and point at LanChipInfoTotalTimeout for hard bounds.

Test: SlowFakeLanChipInfoStreamingDevice with 200ms attempt latency + 100ms TotalTimeout asserts the probe bails in <1500ms instead of the 3s a naive impl would take with 10 max attempts.
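A wall-clock cap of this kind can be sketched with a linked CancellationTokenSource plus CancelAfter. The sketch below is an assumption about the general shape, not the PR's implementation: ProbeWithDeadlineAsync and SlowFailingProbeAsync are hypothetical names, and the timings mirror the test described above (200 ms per attempt, 100 ms total timeout, 10 max attempts).

```csharp
#nullable enable
using System;
using System.Diagnostics;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper: the retry loop runs under a token that is cancelled
// either by the caller or when totalTimeout elapses, whichever comes first.
static async Task<string?> ProbeWithDeadlineAsync(
    Func<CancellationToken, Task<string>> probe,
    int maxAttempts, TimeSpan retryDelay, TimeSpan totalTimeout,
    CancellationToken callerToken)
{
    using var linked = CancellationTokenSource.CreateLinkedTokenSource(callerToken);
    linked.CancelAfter(totalTimeout);
    try
    {
        for (int attempt = 1; attempt <= maxAttempts; attempt++)
        {
            try
            {
                return await probe(linked.Token);
            }
            catch (Exception ex) when (ex is not OperationCanceledException)
            {
                // Transient failure: retry if budget remains.
            }
            if (attempt < maxAttempts)
                await Task.Delay(retryDelay, linked.Token);
        }
        return null;
    }
    catch (OperationCanceledException) when (!callerToken.IsCancellationRequested)
    {
        // The deadline fired, not the caller: report "unavailable" instead of throwing.
        return null;
    }
}

// Hypothetical slow probe: each attempt takes 200 ms and then fails.
static async Task<string> SlowFailingProbeAsync(CancellationToken ct)
{
    await Task.Delay(200, ct);
    throw new TimeoutException("no response");
}

var stopwatch = Stopwatch.StartNew();
string? chipInfo = await ProbeWithDeadlineAsync(
    SlowFailingProbeAsync, maxAttempts: 10,
    retryDelay: TimeSpan.FromMilliseconds(100),
    totalTimeout: TimeSpan.FromMilliseconds(100),
    callerToken: CancellationToken.None);
stopwatch.Stop();

string outcome = chipInfo ?? "ChipInfoUnavailable";
Console.WriteLine($"{outcome} after {stopwatch.ElapsedMilliseconds} ms");
```

The exception filter on the outer catch separates the two cancellation sources: a caller-initiated cancel still surfaces as an exception, while the deadline is folded into the "unavailable" result.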
/improve

/agentic_review

Persistent suggestions updated to latest commit 34c23d6
…ync throw

Test fake's GetLanChipInfoAsync now returns Task.FromException for the simulated transient failure case, matching how a real async method surfaces errors. Behavior of the production retry loop is identical (both forms get caught by the same try/await), but the test now exercises the more honest async exception path.
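The equivalence mentioned here can be illustrated with a small standalone sketch. The method names are hypothetical; the point is only that a non-async method which throws before returning a Task and one which returns an already-faulted Task are both caught by the same try/await, provided the call happens inside the try block.

```csharp
using System;
using System.Threading.Tasks;

// Throws before any Task is returned (synchronous throw).
static Task<string> ThrowsSynchronously() =>
    throw new InvalidOperationException("sync throw");

// Returns an already-faulted Task, the way an async method surfaces errors.
static Task<string> ReturnsFaultedTask() =>
    Task.FromException<string>(new InvalidOperationException("faulted task"));

// The call and the await both sit inside the try, so either form is caught.
static async Task<string> ObserveAsync(Func<Task<string>> call)
{
    try
    {
        return await call();
    }
    catch (InvalidOperationException ex)
    {
        return $"caught: {ex.Message}";
    }
}

string first = await ObserveAsync(ThrowsSynchronously);
string second = await ObserveAsync(ReturnsFaultedTask);
Console.WriteLine(first);  // prints "caught: sync throw"
Console.WriteLine(second); // prints "caught: faulted task"
```

The two forms only diverge when a caller obtains the Task outside a try block and awaits it later, which is one reason a fake that returns a faulted Task is the closer match to a real async method.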
/improve

/agentic_review

Persistent review updated to latest commit 01fd011

Persistent suggestions updated to latest commit 01fd011
Summary
Adds a bounded retry policy for the LAN chip-info probe so that WiFi-subsystem startup transients after a PIC32 reboot don't force an unnecessary multi-minute reflash of already-current WiFi firmware.
Right after a PIC32 update the application is up while the WiFi subsystem is still finishing startup, so the first GetLanChipInfo query can transiently fail. Without retry, the WiFi version decision short-circuits to ChipInfoUnavailable and Core flows on to flash. Desktop's existing workaround was a bounded retry loop in the app; this PR moves that policy into Core where the WiFi-update decision lives.

API additions
Two new FirmwareUpdateServiceOptions properties:
- LanChipInfoMaxAttempts (default 3): total tries before giving up
- LanChipInfoRetryDelay (default 2s): wait between attempts

Worst-case wait = (MaxAttempts - 1) × RetryDelay = 4s by default, which fits the observed startup window without slowing steady state. The retry loop observes cancellation between attempts.
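As a worked check of that arithmetic, the sketch below computes the delay-only figure from the defaults, and also the larger figure quoted in the follow-up commit in this conversation once a 2s per-attempt response timeout is counted. The variable names are illustrative only and are not the option properties themselves.

```csharp
using System;

// Defaults quoted in this PR; the per-attempt timeout is the 2s response
// timeout mentioned in the follow-up commit message.
int maxAttempts = 3;
TimeSpan retryDelay = TimeSpan.FromSeconds(2);
TimeSpan perAttemptTimeout = TimeSpan.FromSeconds(2);

// Delays only: no delay follows the final attempt.
TimeSpan delayOnly = TimeSpan.FromTicks(retryDelay.Ticks * (maxAttempts - 1));

// Delays plus every attempt running to its full timeout.
TimeSpan withAttemptTimeouts =
    delayOnly + TimeSpan.FromTicks(perAttemptTimeout.Ticks * maxAttempts);

Console.WriteLine($"delay-only worst case: {delayOnly.TotalSeconds}s");              // 4s
Console.WriteLine($"with per-attempt timeouts: {withAttemptTimeouts.TotalSeconds}s"); // 10s
```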
Wired into CheckWifiFirmwareStatusCoreAsync so both the legacy IsWifiFirmwareUpToDateAsync hit-path AND the new (PR #198) CheckWifiFirmwareStatusAsync planning method get the retry behavior automatically.

Stacked on PR #198
This branch is built on top of feat/wifi-update-check-status; it depends on the WifiFirmwareStatus shape introduced there. Needs PR #198 to merge first; this PR's diff against main shows only the incremental changes once #198 lands.

Test plan
FakeLanChipInfoStreamingDevice with transientFailuresBeforeSuccess + GetLanChipInfoCallCount instrumentation

Closes #144