test: sync injected qsigshare in llmq signing test #7304
thepastaclaw wants to merge 1 commit into dashpay:develop
Conversation
✅ No Merge Conflicts Detected: This PR currently has no conflicts with other open PRs.
✅ Review complete (commit d21e0a7)
@coderabbitai review
✅ Actions performed: Review triggered.
Walkthrough: The functional test for LLMQ signing was modified to change how a QSIGSHARE is delivered to the recovery member: the test now uses `send_and_ping()` instead of `send_message()`.
Estimated code review effort: 🎯 1 (Trivial) | ⏱️ ~2 minutes
🚥 Pre-merge checks: ✅ 4 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
thepastaclaw
left a comment
Code Review
Single-line test fix adding sync_with_ping() after a direct qsigshare injection in feature_llmq_signing.py to eliminate a documented race against wait_for_sigs. Both reviewers agreed the change is minimal and correct. Only a stylistic nitpick remains.
Reviewed commit: 7c4f221
💬 1 nitpick(s)
Thanks, I checked #7301. It adds …
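For illustration, a minimal sketch of the race and of the fix this review describes, using the framework's P2PInterface helpers and names that appear in the test (`sig_share`, `msg_qsigshare`, `wait_for_sigs`); treat it as an approximation, not the PR's verbatim diff:

```python
# Racy: send_message() only queues bytes on the socket, so wait_for_sigs
# can start polling before the node has processed the injected share.
p2p_interface.send_message(msg_qsigshare([sig_share]))
wait_for_sigs(True, False, True, 15)

# Fixed: a ping/pong round trip after the send. The node replies to the
# ping only after draining the earlier messages on this connection, so the
# share is guaranteed to be processed before the wait begins.
p2p_interface.send_message(msg_qsigshare([sig_share]))
p2p_interface.sync_with_ping()
wait_for_sigs(True, False, True, 15)
```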
Force-pushed from 7c4f221 to f670321
|
@coderabbitai review
✅ Actions performed: Review triggered.
🧹 Nitpick comments (1)
test/functional/feature_llmq_signing.py (1)
59-60: ⚡ Quick win: Consider scaling the `wait_for_sigs` timeout by `timeout_factor` to fully eliminate CI-load flakiness.
`send_and_ping` guarantees the recovery member has received the share at the P2P layer, but the actual BLS share aggregation into a recovered signature happens asynchronously in a background thread. Under heavy CI multi-process load the 15-second ceiling at line 111 can still expire before recovery completes, which is exactly the scenario described in issue #7255. The standard mitigation (also recommended in the original issue) is to multiply by `self.options.timeout_factor`:
♻️ Suggested change
```diff
 def wait_for_sigs(hasrecsigs, isconflicting1, isconflicting2, timeout):
-    self.wait_until(lambda: check_sigs(hasrecsigs, isconflicting1, isconflicting2), timeout = timeout)
+    self.wait_until(lambda: check_sigs(hasrecsigs, isconflicting1, isconflicting2), timeout = timeout * self.options.timeout_factor)
```
This covers every `wait_for_sigs` call site (lines 66, 111, 167, 171, 177, 201) in one change. The `assert_sigs_nochange` calls also pass a raw timeout to `self.wait_until`, but since those intentionally assert no change they should arguably stay fixed to avoid slow-running no-op waits.
Also applies to: 111-111
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@test/functional/feature_llmq_signing.py` around lines 59 - 60, The wait_for_sigs helper currently calls self.wait_until(..., timeout=timeout) with a raw timeout which can flake under CI; update the wait_for_sigs definition to scale the passed timeout by self.options.timeout_factor (i.e. call self.wait_until(..., timeout=timeout * self.options.timeout_factor)) so all wait_for_sigs call sites (the ones at lines referenced in the review) automatically get the scaled timeout; do not change the assert_sigs_nochange raw timeout behavior.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: c0d4e4b5-b0ad-4e2f-a290-12cb6c5b0eca
📒 Files selected for processing (1)
test/functional/feature_llmq_signing.py
Force-pushed from f670321 to 1dda821
thepastaclaw
left a comment
Code Review
Single-line functional-test synchronization fix at the exact reviewed commit f670321. Replacing send_message() with send_and_ping() for the injected qsigshare forces the recovery member to process that direct P2P message before the recovered-signature wait continues, without changing production code or the test's assertions. I found no correctness, security, or Dash-specific interaction issues in the reviewed change.
Reviewed commit: f670321
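For reference, `send_and_ping()` is essentially `send_message()` followed by `sync_with_ping()`; here is a sketch of the helpers as they exist in the upstream Bitcoin test framework that Dash's P2PInterface derives from (an approximation, not copied from Dash's tree):

```python
def send_and_ping(self, message, timeout=60):
    self.send_message(message)            # queue the qsigshare bytes
    self.sync_with_ping(timeout=timeout)  # block until they are processed

def sync_with_ping(self, timeout=60):
    # The node handles messages from a single connection in order, so once
    # the pong for this nonce arrives, every earlier message (including the
    # injected qsigshare) has already been handed to its handler.
    self.ping_counter += 1
    self.send_message(msg_ping(nonce=self.ping_counter))
    self.wait_until(
        lambda: "pong" in self.last_message
                and self.last_message["pong"].nonce == self.ping_counter,
        timeout=timeout,
    )
```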
thepastaclaw
left a comment
Code Review
The current head is a small, mechanically correct test-stability change for feature_llmq_signing.py --spork21: it converts RPC sig-share results into P2P QSIGSHARE payloads and flushes the direct send with send_and_ping(). The new full-threshold direct injection removes the recovered-signature assertion's previous dependence on the async submit=true relay path, which is a reasonable flake fix but leaves that positive relay behavior without coverage in this test.
Reviewed commit: 1dda821
🟡 1 suggestion(s)
```diff
 # 3. Sending enough sig shares received from RPC to the recovery member through P2P interface, should
 # result in a recovered sig. Build all threshold shares explicitly so this test does not depend on the
 # asynchronous submit=true shares above being relayed before the timeout expires.
 sig_shares = [
     rpc_sig_share_to_p2p(self.mninfo[i].get_node(self).quorum("sign", q_type, id, msgHash, quorumHash, False))
     for i in range(2)
 ]
 sig_shares.append(rpc_sig_share_to_p2p(sig_share_rpc_1))
 for mn in self.mninfo:  # type: MasternodeInfo
     assert mn.get_node(self).getconnectioncount() == self.llmq_size
 # Get the current recovery member of the quorum
 q = self.nodes[0].quorum('selectquorum', q_type, id)
 mn: MasternodeInfo = self.get_mninfo(q['recoveryMembers'][0])
 # Open a P2P connection to it
 p2p_interface = mn.get_node(self).add_p2p_connection(P2PInterface())
-# Send the last required QSIGSHARE message to the recovery member
-p2p_interface.send_message(msg_qsigshare([sig_share]))
+# Send the required QSIGSHARE messages to the recovery member
+p2p_interface.send_and_ping(msg_qsigshare(sig_shares))
```
🟡 Suggestion: Direct injection no longer positively covers the submit=true relay path
This block now rebuilds shares for members 0, 1, and 2 with submit=false and injects all of them directly into the selected recovery member. In this test set_dash_llmq_test_params(5, 3) overrides LLMQ_TEST to a 3-of-5 quorum, so those three injected shares are sufficient by themselves; the recovered-signature assertion at line 120 can pass even if the earlier default submit=true calls at lines 79 and 87 stop relaying shares under SPORK_21. Avoiding that async dependency is the right way to deflake this specific check, but it also means the test no longer has a positive assertion that submit=true shares are relayed and consumed. If that behavior is meant to stay covered here, add a separate deterministic assertion for the submit=true relay path rather than relying on this recovery step.
source: ['codex']
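The block above leans on an `rpc_sig_share_to_p2p` helper whose body is not shown in this thread. Purely as a hypothetical sketch of what such a conversion involves, mapping the `quorum sign ... submit=false` RPC result onto a P2P sig-share object; every name and import below is an assumption modeled on the C++ `llmq::CSigShare` struct, not the test framework's actual API:

```python
from test_framework.messages import CSigShare  # assumed location/class name

def rpc_sig_share_to_p2p(rpc_share):
    # Hypothetical mapping: attribute names are guesses modeled on the C++
    # llmq::CSigShare struct, not taken from the real helper in this PR.
    share = CSigShare()
    share.llmqType = rpc_share["llmqType"]
    share.quorumHash = int(rpc_share["quorumHash"], 16)
    share.quorumMember = rpc_share["quorumMember"]
    share.id = int(rpc_share["id"], 16)
    share.msgHash = int(rpc_share["msgHash"], 16)
    share.sigShare = bytes.fromhex(rpc_share["signature"])
    return share
```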
Checked the CodeRabbit timeout suggestion. No code change needed here: …
Force-pushed from 1dda821 to d21e0a7
thepastaclaw
left a comment
Code Review
Test-only change to feature_llmq_signing.py that hardens the spork21 LLMQ signing path. The split into a dedicated submit_true subtest plus explicit threshold-share construction with send_and_ping addresses a real flake risk. One minor coverage observation: the new submit_true subtest only asserts hasrecsig on the recovery member, not on all quorum members.
Reviewed commit: d21e0a7
💬 1 nitpick(s)
```python
# Sign a distinct request through the default submit=true path to assert sig share relay and recovery.
q_submit_true = self.nodes[0].quorum('selectquorum', q_type, submit_true_id)
submit_true_recovery_member = self.get_mninfo(q_submit_true['recoveryMembers'][0])
submit_true_signers = [mn for mn in self.mninfo if mn != submit_true_recovery_member][:3]
for mn in submit_true_signers:
    mn.get_node(self).quorum("sign", q_type, submit_true_id, submit_true_msgHash)
wait_for_recsig(submit_true_recovery_member, submit_true_id, submit_true_msgHash, 15)
```
💬 Nitpick: submit=true subtest only asserts recovery on the recovery member, not all quorum members
The new submit_true_id/submit_true_msgHash flow waits only for hasrecsig on submit_true_recovery_member. Prior to this PR, the implicit submit=true assertion was the all-nodes wait_for_sigs(True, False, True, 15) at line 136. After the split, that final wait still covers all-nodes recsig propagation, but only for the P2P-injected id/msgHash path. The recsig announcement code path is the same regardless of how recovery was triggered, so this is a minor coverage delta rather than a real regression risk — but extending the assertion to all masternodes is cheap and preserves the original guarantee. Note the suggested fix in the suggestion field; you'd want to re-derive submit_true_signers similarly when applying it.
💡 Suggested change
```diff
-# Sign a distinct request through the default submit=true path to assert sig share relay and recovery.
-q_submit_true = self.nodes[0].quorum('selectquorum', q_type, submit_true_id)
-submit_true_recovery_member = self.get_mninfo(q_submit_true['recoveryMembers'][0])
-submit_true_signers = [mn for mn in self.mninfo if mn != submit_true_recovery_member][:3]
-for mn in submit_true_signers:
-    mn.get_node(self).quorum("sign", q_type, submit_true_id, submit_true_msgHash)
-wait_for_recsig(submit_true_recovery_member, submit_true_id, submit_true_msgHash, 15)
+# Sign a distinct request through the default submit=true path to assert sig share relay and recovery
+# propagates to every quorum member.
+q_submit_true = self.nodes[0].quorum('selectquorum', q_type, submit_true_id)
+submit_true_recovery_member = self.get_mninfo(q_submit_true['recoveryMembers'][0])
+submit_true_signers = [mn for mn in self.mninfo if mn != submit_true_recovery_member][:3]
+for mn in submit_true_signers:
+    mn.get_node(self).quorum("sign", q_type, submit_true_id, submit_true_msgHash)
+self.wait_until(
+    lambda: all(
+        mn.get_node(self).quorum("hasrecsig", q_type, submit_true_id, submit_true_msgHash)
+        for mn in self.mninfo
+    ),
+    timeout=15,
+)
```
source: ['codex']
Closing this as obsolete. The intermittent feature_llmq_signing.py --spork21 issue should be handled by the QSENDRECSIGS / quorum-connection fix path already in develop (e.g. c1cdb75 from #7289, and the related #7301 test coverage), not by hardening this individual assertion with extra direct-injected shares.
Summary
Sync the `qsigshare` injection in `feature_llmq_signing.py --spork21` with `send_and_ping()`. This avoids depending on the earlier async `submit=true` shares being relayed before the recovered-signature timeout.
Fixes #7255.
Validation
Passed locally on macOS arm64:
Repeated-run stress on macOS arm64 using the local build and this worktree's test source:
Result: 8/10 passed under local `-j10` resource contention. This reduces the observed local failure rate from the previously recorded 6/10 baseline on `develop` to 2/10.