connmgr: don't send keep-alives during handoff #48072
Merged
Whenever the proxy or a handler wants to hand off a session, it sends a `handoff-start` message to connmgr, and connmgr replies with `handoff-proceed`. After this, connmgr is supposed to stay silent until it receives another message. At that point it assumes that whoever sent the message is the new owner of the session, and communication resumes as normal. However, in rare cases connmgr might send keep-alive messages to the old owner while it is waiting to hear from the new owner, leading to broken sessions and "received message out of sequence" warnings. This PR fixes connmgr so that it does not send keep-alives during handoff.

There are two fixes:
1. In `accept_handoff()`, for both server and client mode, unset the `to_addr` (i.e., the known peer address) of the session. Keep-alives are only sent for sessions with a known peer address. This is an easy one-line fix per mode.
2. In `keep_alives_task()`, for both server and client mode, account for the possibility that `to_addr` might get unset for sessions during batch preparation and processing. This is a bit more complicated; see below.
The `keep_alives_task()` tries to address keep-alive messages to multiple sessions at once, to reduce the number of messages sent. It does this by processing sessions in batches: it first decides which sessions it will send keep-alives for, and then it generates and sends the minimal number of messages with optimal packing. Notably, it waits until a message can be sent before generating and sending each message, so we need to tolerate `to_addr` getting unset for any session during this wait. We do this by making it possible for `Batch::take_group()` to skip sessions via its `get_id` closure argument, and by updating the closure provided by `next_batch_message()` to skip sessions that don't have a `to_addr`.
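The skipping behavior of `Batch::take_group()` can be sketched roughly as follows. This is an illustration under stated assumptions, not the real implementation: the `Batch` fields, the `max_group_size` limit, grouping sessions by destination address, and the `(addr, id)` return type of `get_id` are all hypothetical. The point it shows is that `get_id` may return `None`, in which case the session is dropped from the batch instead of being addressed.

```rust
use std::collections::VecDeque;

// Hypothetical, simplified batch: the session keys selected for keep-alives
// during batch preparation, drained one message ("group") at a time.
struct Batch {
    keys: VecDeque<usize>,
    max_group_size: usize,
}

impl Batch {
    fn new(keys: Vec<usize>, max_group_size: usize) -> Self {
        Self { keys: keys.into(), max_group_size }
    }

    fn is_empty(&self) -> bool {
        self.keys.is_empty()
    }

    // Take the next group of ids that share one destination address.
    // `get_id` maps a key to (addr, id), or returns None to skip the key,
    // e.g. because the session's to_addr was unset while waiting to send.
    fn take_group<F>(&mut self, mut get_id: F) -> Option<(Vec<u8>, Vec<usize>)>
    where
        F: FnMut(usize) -> Option<(Vec<u8>, usize)>,
    {
        let mut addr: Option<Vec<u8>> = None;
        let mut ids = Vec::new();
        let mut deferred = VecDeque::new();

        while let Some(key) = self.keys.pop_front() {
            let (a, id) = match get_id(key) {
                Some(v) => v,
                // No peer address anymore: drop the session from the batch.
                None => continue,
            };

            if addr.is_none() {
                // The first usable session decides this message's destination.
                addr = Some(a);
                ids.push(id);
            } else if addr.as_ref() == Some(&a) && ids.len() < self.max_group_size {
                ids.push(id);
            } else {
                // Different destination or group full: keep for a later message.
                deferred.push_back(key);
            }
        }

        self.keys = deferred;
        addr.map(|a| (a, ids))
    }
}

fn main() {
    // Hypothetical session table: index = session key, value = to_addr.
    let mut to_addrs: Vec<Option<Vec<u8>>> = vec![
        Some(b"handler-a".to_vec()),
        Some(b"handler-a".to_vec()),
        Some(b"handler-b".to_vec()),
        Some(b"handler-a".to_vec()),
    ];

    let mut batch = Batch::new(vec![0, 1, 2, 3], 10);

    // While waiting to send, session 1 enters handoff and loses its to_addr.
    to_addrs[1] = None;

    let mut messages = Vec::new();
    while !batch.is_empty() {
        // Stand-in for the closure from next_batch_message().
        let group = batch.take_group(|key| to_addrs[key].clone().map(|addr| (addr, key)));
        if let Some(g) = group {
            messages.push(g);
        }
    }

    assert_eq!(
        messages,
        vec![
            (b"handler-a".to_vec(), vec![0, 3]),
            (b"handler-b".to_vec(), vec![2]),
        ]
    );
    for (addr, ids) in &messages {
        println!("{} -> {:?}", String::from_utf8_lossy(addr), ids);
    }
}
```

Because `get_id` is consulted each time a group is built, a session that loses its `to_addr` between messages is noticed before any message is addressed to it.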
The `Batch` type, including its tests, is implemented twice (once in `connmgr::client` and once in `connmgr::server`). This PR updates both copies of the code. In the future we should consider sharing a single implementation of `Batch` to avoid having to make redundant updates.