Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

connmgr: don't send keep-alives during handoff #48072

Merged
merged 1 commit into from
Sep 30, 2024

Conversation

jkarneges
Copy link
Member

Whenever proxy or handler want to hand off a session, they send a handoff-start message to connmgr, and connmgr replies with handoff-proceed. After this, connmgr is supposed to stay silent until it receives another message, at which point it assumes whoever sent the message is the new owner of the session and communication can resume as normal. However, in rare cases connmgr might send keep-alive messages to the old owner while it is waiting to hear from the new owner, leading to broken sessions and "received message out of sequence" warnings. This PR fixes connmgr to not send keep-alives during handoff.

There are two fixes:

  1. In accept_handoff(), for both server and client mode, unset the to_addr (i.e. the known peer address) of the session. Keep-alives are only sent for sessions with a known peer address. This is an easy one line fix per mode.
  2. In keep_alives_task(), for both server and client mode, account for the possibility that to_addr might get unset for sessions during batch preparation and processing. This is a bit more complicated, see below.

The keep_alives_task() tries to address keep-alive messages to multiple sessions at once, to reduce the number of messages sent. It does this by processing sessions in batches. It first decides which sessions it will be sending keep-alives for, and then it proceeds to generate and send the minimal number of messages with optimal packing. Notably, it waits for message sendability before generating and sending each message, and we need to tolerate to_addr getting unset for any sessions during this waiting. We do this by making it possible for Batch::take_group() to skip sessions via its get_id closure argument and updating the closure provided by next_batch_message() to skip sessions that don't have a to_addr.

The Batch type, including tests, is redundantly implemented twice (once in connmgr::client and once in connmgr::server). This PR updates both copies of the code. In the future we should consider sharing a single implementation of Batch to avoid having to make redundant updates.

@jkarneges jkarneges force-pushed the jkarneges/quiet-during-handoff branch from 089682e to d4e951f Compare September 27, 2024 19:23
@jkarneges jkarneges force-pushed the jkarneges/quiet-during-handoff branch from d4e951f to 1c97f8e Compare September 27, 2024 19:45
Copy link
Contributor

@madeline-pratt madeline-pratt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good :)

@jkarneges jkarneges merged commit fe718eb into main Sep 30, 2024
19 checks passed
@jkarneges jkarneges deleted the jkarneges/quiet-during-handoff branch September 30, 2024 17:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants