-
Notifications
You must be signed in to change notification settings - Fork 54
connection: clean up failed heartbeat sends #876
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -1816,7 +1816,19 @@ def __init__(self, connection, owner): | |
| with connection.lock: | ||
| if connection.in_flight < connection.max_request_id: | ||
| connection.in_flight += 1 | ||
| connection.send_msg(OptionsMessage(), connection.get_request_id(), self._options_callback) | ||
| request_id = connection.get_request_id() | ||
| try: | ||
| connection.send_msg(OptionsMessage(), request_id, self._options_callback) | ||
| except Exception as exc: | ||
| if connection.is_control_connection: | ||
| connection.in_flight -= 1 | ||
| # send_msg() registers the callback before writing to the socket, | ||
| # so a write failure must unwind that registration here. | ||
| connection._requests.pop(request_id, None) | ||
| if request_id not in connection.request_ids: | ||
| connection.request_ids.append(request_id) | ||
| self._exception = exc | ||
| self._event.set() | ||
| else: | ||
|
Comment on lines
+1819
to
1832
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Comment for the first commit. Assumption I have in the comment: if There is one more edge case: if we wail after
Collaborator
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. thanks, fixed |
||
| self._exception = Exception("Failed to send heartbeat because connection 'in_flight' exceeds threshold") | ||
| self._event.set() | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I just don't understand the second commit. Why do you treat CC differently here? Why do we care about socket write error - won't it result in connection being closed anyway?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CC is special because ControlConnection.return_connection() does not decrement in_flight, while HostConnection.return_connection() does. The write failure still matters because send_msg() has already registered the callback and reserved the request id before push(), so that bookkeeping has to be unwound explicitly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You mention
return_connectionbut I don't see where it is called :(Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I meant the later owner.return_connection(connection) in ConnectionHeartbeat.run:
python-driver/cassandra/connection.py
Lines 1874 to 1927 in 6ad10b4
python-driver/cassandra/connection.py
Lines 1921 to 1927 in 6ad10b4
That block only unwinds the callback/request-id registration that
send_msg()already did:python-driver/cassandra/connection.py
Lines 1205 to 1231 in 6ad10b4
For control connections,
ControlConnection.return_connection()does not decrementin_flightpython-driver/cassandra/cluster.py
Lines 4267 to 4269 in 6ad10b4
while
HostConnection.return_connection()does, so the direct decrement has to stay here. It can’t be handled in ControlConnection.return_connection() because that method only sees a defunct/closed connection at the end of the heartbeat cyclepython-driver/cassandra/pool.py
Lines 542 to 547 in 6ad10b4
It does not know which request_id was reserved, and it does not have the context that send_msg() already registered the callback in _requests. The leak happens in the send_msg() failure path, while we still have the request_id and can unwind _requests / request_ids immediately.`