
Prevent libev watchers running on closed socket #903

Open
Lorak-mmk wants to merge 3 commits into scylladb:master from Lorak-mmk:fix-libev

Conversation

@Lorak-mmk

As I understand `Connection` semantics, `close` can be called from any thread at any time.
This can happen while `handle_write` / `handle_read` are already running. If `close` closes the socket and one of the watchers then uses it, we may get EBADF.
Apart from this specific error, it also seems conceptually odd that the resources operating on the socket (the watchers) are closed later than the socket itself.
The solution I implemented here is quite simple: close the TCP socket only after the watchers are stopped.
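
The EBADF error reported in #614 can be reproduced in isolation with plain Python. No driver or libev code is involved. A `socket.socketpair()` stands in for the connection's TCP socket, and a `send` issued after `close` plays the part of a watcher callback that runs after another thread has already closed the socket. This is only an illustration of the error, not of the race's timing. The errno shown is what Linux reports; other platforms may differ.

```python
import errno
import socket


def send_after_close():
    """Return the errno raised when a closed socket is used, or None."""
    # A connected pair of sockets stands in for the connection's TCP socket.
    reader, writer = socket.socketpair()
    try:
        # Another thread calls close() first...
        writer.close()
        # ...and a watcher callback (think handle_write) still uses the socket.
        writer.send(b"ping")
    except OSError as exc:
        return exc.errno
    finally:
        reader.close()
    return None


print(errno.errorcode[send_after_close()])  # EBADF (on Linux)
```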

`close` calls `connection_destroyed`, which registers the connection for destruction. Some time later, `_loop_will_run` is called. It goes through all connections registered for destruction and stops their watchers. I moved the socket close out of `close` and into `_loop_will_run`, right after the connection's watchers are stopped.
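
The ordering described above can be sketched as follows. This is a simplified model, not the driver's code. `connection_destroyed`, `_loop_will_run` and `is_closed` are names taken from the PR and the driver. `Watcher`, `Loop`, `_closed_conns` and the log list are hypothetical, and they exist only to make the order of operations visible. The point is that `close()` only marks and registers the connection, and the socket is closed on the loop thread once both watchers have been stopped.

```python
import socket
import threading


class Watcher:
    """Hypothetical stand-in for a libev IO watcher."""

    def __init__(self, name, log):
        self.name = name
        self.log = log

    def stop(self):
        self.log.append(f"stop {self.name} watcher")


class Connection:
    """Simplified connection: close() never touches the socket itself."""

    def __init__(self, loop, sock, log):
        self.loop = loop
        self.log = log
        self._socket = sock
        self.is_closed = False
        self._read_watcher = Watcher("read", log)
        self._write_watcher = Watcher("write", log)

    def close(self):
        # May run on any thread, so only mark and register the connection.
        if self.is_closed:
            return
        self.is_closed = True
        self.loop.connection_destroyed(self)


class Loop:
    """Hypothetical event-loop wrapper; only the cleanup path is modelled."""

    def __init__(self):
        self._lock = threading.Lock()
        self._closed_conns = set()

    def connection_destroyed(self, conn):
        with self._lock:
            self._closed_conns.add(conn)

    def _loop_will_run(self):
        # Runs on the loop thread before each loop iteration.
        with self._lock:
            to_cleanup, self._closed_conns = self._closed_conns, set()
        for conn in to_cleanup:
            conn._read_watcher.stop()
            conn._write_watcher.stop()
            # No watcher can use the socket anymore, so closing it is safe.
            conn._socket.close()
            conn.log.append("close socket")


log = []
loop = Loop()
sock, peer = socket.socketpair()
conn = Connection(loop, sock, log)

conn.close()
print(sock.fileno() != -1)  # True: close() left the socket open
loop._loop_will_run()
print(log)  # ['stop read watcher', 'stop write watcher', 'close socket']
peer.close()
```

The lock-protected swap of `_closed_conns` keeps `connection_destroyed` cheap for the calling thread. All of the actual teardown stays on the loop thread.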

Additionally, I took the early returns from the PRs of @vponomaryov and @fruch.
I'm not 100% convinced they are necessary for correctness, but:

  • If the connection is closed, it makes sense to avoid additional work.
  • If both watchers are scheduled in a single loop iteration and the first one closes the connection, there is no point in executing the second one.
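
The early returns from the list above follow the pattern sketched here. The class is hypothetical. The `(watcher, revents)` callback arguments follow the usual libev callback shape and are an assumption of the sketch. A "write" that fails is simulated by calling `close()`. The socket is ready for both writing and reading in the same iteration, so both callbacks fire, but the second one sees `is_closed` and does nothing.

```python
class Connection:
    """Hypothetical connection whose watcher callbacks return early once closed."""

    def __init__(self):
        self.is_closed = False
        self.work_done = []

    def close(self):
        self.is_closed = True

    def handle_write(self, watcher, revents):
        if self.is_closed:
            return  # nothing left to do on a closed connection
        self.work_done.append("write")
        # Pretend the write failed and the callback closed the connection.
        self.close()

    def handle_read(self, watcher, revents):
        if self.is_closed:
            return  # the other watcher closed us earlier in this iteration
        self.work_done.append("read")


conn = Connection()
# The socket was ready for both writing and reading in one loop iteration,
# so both callbacks fire back to back.
for callback in (conn.handle_write, conn.handle_read):
    callback(None, 0)
print(conn.work_done)  # ['write']
```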

I decided not to take @vponomaryov's change that sets `last_error` when the connection is gracefully closed by the server. After a brief look at the other reactors, graceful close not setting `last_error` appears to be an established convention; only Twisted does not abide by it.

I did pick up another optimization, in a slightly changed form: `factory` now checks whether the connection was closed without `last_error` being set, and raises if so.
In my opinion this is not necessary for correctness at all, because the connection may get closed just after being returned from `factory`, so we are not preventing any scenario. It may, however, be an optimization in some cases, because we learn sooner that the connection is dead.
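
The `factory` check described above can be sketched like this. The sketch makes several assumptions:

- `ConnectionShutdown` and `FakeConnection` are local stand-ins.
- The timeout branch uses Python's built-in `TimeoutError` rather than whatever the driver raises.
- The driver's own `factory` creates the connection itself and differs in details. Here an existing connection object is passed in so the example stays self-contained.

The new piece is the final check. A connection that is closed but has no `last_error` was closed gracefully by the server, so it is rejected right away instead of being handed to the caller.

```python
import threading


class ConnectionShutdown(Exception):
    """Local stand-in for the driver's ConnectionShutdown exception."""


class FakeConnection:
    """Hypothetical connection exposing only what the check needs."""

    def __init__(self, endpoint):
        self.endpoint = endpoint
        self.connected_event = threading.Event()
        self.last_error = None
        self.is_closed = False


def factory(conn, timeout):
    """Simplified factory: return conn only if it looks usable right now."""
    conn.connected_event.wait(timeout)
    if conn.last_error:
        raise conn.last_error
    if not conn.connected_event.is_set():
        raise TimeoutError(f"Timed out connecting to {conn.endpoint}")
    if conn.is_closed:
        # Graceful close by the server: closed, but no last_error was set.
        raise ConnectionShutdown(
            f"Connection to {conn.endpoint} was closed by the server")
    return conn


conn = FakeConnection("127.0.0.1:9042")
conn.connected_event.set()
conn.is_closed = True  # server closed it gracefully; last_error stays None
try:
    factory(conn, timeout=1.0)
except ConnectionShutdown as exc:
    print(exc)  # Connection to 127.0.0.1:9042 was closed by the server
```

As the paragraph above notes, this cannot close the race entirely, because the server may still close the connection right after `factory` returns. It only surfaces an already-dead connection earlier.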

Fixes: #614 (hopefully)

Pre-review checklist

  • I have split my patch into logically separate commits.
  • All commit messages clearly explain what they change and why.
  • I added relevant tests for new features and bug fixes.
  • All commits compile, pass static checks and pass test.
  • PR description sums up the changes and reasons why they should be introduced.
  • I have provided docstrings for the public items that I want to introduce.
  • I have adjusted the documentation in ./docs/source/.
  • I added appropriate Fixes: annotations to PR description.

`close` can be called from anywhere, not only from reactor threads.
If such a `close` call closes the socket during `handle_write` /
`handle_read`, those functions may try to operate on a closed socket.

Solution implemented in this commit: defer socket closing until both
watchers are stopped.

The previous commit deferred the socket close until the watchers are
stopped, but there is one more case worth considering.
If during one libev loop iteration the socket becomes ready for both
reading and writing, both watchers will be called. If one of them decides
to close the connection, the other one will still be called anyway.
This shouldn't cause EBADF, because the socket won't be closed yet, but I
see no reason to perform unnecessary work.

When the connection is closed by the server without any other error, it
will be closed (`is_closed == True`) without `last_error` being set. As
far as I can tell, this is true for all reactors apart from Twisted.

If we try to use such a connection, we'll quickly discover that it's
broken, but we can slightly optimize this process by raising directly
from `factory()`.
@coderabbitai

coderabbitai Bot commented Jun 12, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 792b7aa6-3c56-4318-9fbb-4ef32eb95729

📥 Commits

Reviewing files that changed from the base of the PR and between bf7966f and 7a9211f.

📒 Files selected for processing (2)
  • cassandra/connection.py
  • cassandra/io/libevreactor.py

📝 Walkthrough

This PR addresses socket lifecycle management and server-initiated connection closure handling in the Python driver. The connection factory now detects when the server closes a connection after the connection event completes and raises ConnectionShutdown explicitly. LibevConnection handlers check for closed connections early and return immediately to prevent stale processing. Socket cleanup responsibility shifts from LibevConnection.close() to the libev reactor's cleanup and closed-connection handling paths, which now explicitly close sockets and log debug messages during shutdown.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 28.57%, which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title accurately describes the main objective: preventing libev watchers from operating on closed sockets. |
| Description check | ✅ Passed | The PR description is detailed, explains the race condition, implementation approach, and design decisions, and includes a Fixes annotation. However, no new tests were added, per the author's note. |
| Linked Issues check | ✅ Passed | The changes directly address issue #614 by eliminating the EBADF race condition through deferring socket closure until after the watchers stop. |
| Out of Scope Changes check | ✅ Passed | All changes are within scope: socket closure deferral, watcher early returns, factory optimization, and debug logging for socket cleanup. |

Development

Successfully merging this pull request may close these issues.

Driver reported "[Errno 9] Bad file descriptor"

1 participant