-
Notifications
You must be signed in to change notification settings - Fork 37
Refactoring the shutdown process to fix a payment count bug #235
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Refactoring the shutdown process to fix a payment count bug #235
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Taken a high level look at this and it looks reasonable! Much smaller diff than I was expecting as well, which is nice.
I do (sadly) think we should bite the bullet and add some meta-tests that test all this shutdown business, so that we can square it away once and for all.
Things like:
- Spin up simulation with producers that finish their tasks, assert that we shut down
- Shutdown with track payment currently executing, and make sure that we exit
I imagine it's going to be a hell of a lot of mocking, but I think it'll really be worthwhile in the long term. Happy to collaborate on a POC test to figure out the best way to get some utils set up to share the burden a bit!
@carlaKC Absolutely agree that some tests should be added for this. Is there a test that requires mocking that would be good to look at as an example or would this be the first one of this kind? |
The mocks in |
For reference here is what is done in this PR:
The new shutdown flow looks like this:
|
Nice! Thanks for the thorough walkthrough.
Interested that we don't hit any unexpected errors that make us close out. Is that because we'll keep running even if we fail to send a payment (eg, one of the nodes has shut down)? |
Ok yes, good point. Another way the shutdown is triggered is D.) An error is thrown while setting up consumers and producers or while the producers and consumers are running. |
@carlaKC I made a first attempt at adding some shutdown tests. Let me know what you think. There is probably some cleanup to do and a few other test cases we could add but this is a starting point. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Took a high level look at the tests and they look great, approach ACK
simln-lib/src/lib.rs
Outdated
@@ -1691,4 +1693,224 @@ mod tests { | |||
|
|||
assert!(result.is_ok()); | |||
} | |||
|
|||
#[tokio::test] | |||
async fn test_shutdown_timeout() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Once we have #81, we'll be able to replace the clock with a test clock so that this can be much more precise.
I'm tempted to put this test off till then, had a lot of bad experiences with timing based tests falling apart on the potato that github runs CI on.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I wondered about that. Most of the shutdown tests have some kind of timing aspect... checking the runtime against expected runtime or sleeping for a few seconds before manually shutting down. Should we wait to do any of this until we figure out a better way to handle timing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about tokio advance?
let mut mock_node_2 = MockLightningNode::new(); | ||
|
||
// Set up node 1 expectations | ||
mock_node_1.expect_get_info().return_const(node_1.clone()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is great! Would be nice if we could pull some of this out into more generic helpers so that we can compose tests like this more easily.
Edit: you did this in the next commit, awesome. Happy for you to squash that now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, this PR has several merge commits in it where I updated from master. Would you prefer just squashing the whole PR into 1 commit?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Happy to have multiple commits of changes, but I would like to get rid of the merge commits in the PR with a rebase.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rebased to remove the merge commits. Cleaner looking git history now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking great!
Would be nice to add some coverage for the cases where we have multiple activities, specifically:
- One with a
count
, one without -> make sure the second one keeps running - Two activities, one has a permanent error -> make sure both exit
Also think that we need some coverage for the track_payment
/timer stuff if possible, even if it's not end to end (I suspect this will be tricky to mock)
} | ||
} | ||
}; | ||
let output = output_receiver.recv().await; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Above, "In the multiple-producer case, a single producer shutting down does not drop all sending channels so the consumer will not exit and a trigger is required" won't apply anymore so can be deleted
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mean just remove that part of the comment?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, just remove the old comment that will no longer apply
simln-lib/src/lib.rs
Outdated
let (stop, listen) = triggered::trigger(); | ||
|
||
// Timer for waiting after getting the shutdown signal in order for current tracking to complete | ||
let mut timer: Option<tokio::time::Sleep> = None; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: move tokio::time::Sleep
to top level imports
simln-lib/src/lib.rs
Outdated
}, | ||
|
||
// Trigger and listener to stop the implementation specific track payment functions (node.track_payment()) | ||
let (stop, listen) = triggered::trigger(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: rename track_payment_trigger/track_payment_listener
so this is a little more readable.
Some(_) = conditional_sleeper(timer) => { | ||
log::error!("Track payment failed for {}. The shutdown timer expired.", hex::encode(hash.0)); | ||
stop.trigger(); | ||
timer = None; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we set this to None
and our select
is biased to always select the top case, don't we hit a loop here?
listener.is_triggered() && timer.is_none() = T
timer = (sleep 3 seconds)
conditional_sleeper = T
(after 3 seconds)stop.trigger()
timer=None
listener.clone().is_triggered && timer.is_none() = T
- Loop repeats
I think that we can remove biased
here (we don't need it) and not set timer=None
- even without biased
the select
will randomly choose a branch that's ready, so we could end up looping a few times.
Also think that it would be nice to make a note about this in dev docs to explain a shorter version of what we've discussed in the issue (happy for that to be a followup or do it myself once this is in). |
f55644b
to
e031ddd
Compare
Added two more tests for these cases. I will have to think through the timer test case some more, that will be tricky. |
Summary of remaining tasks for this PR:
|
Closes #222