Control connection: discounted host-down signal can leave defunct connection unreconnected #847

@dkropachev

A defunct control connection can be left unreconnected when the corresponding host-down signal is discounted because regular session pools are still open.

Problem

ControlConnection._signal_error() relies on Cluster.signal_connection_failure() to trigger reconnect handling when the current control connection becomes defunct.

Cluster.signal_connection_failure() returns the result of host.signal_connection_failure(), which means “the conviction policy considered the host down”. That is not the same as “the cluster actually ran down handling”.

When _discount_down_events is enabled and session pools still have open connections to the host, Cluster.on_down() intentionally returns early to avoid marking the host down. In that case no control-connection reconnect is scheduled, but _signal_error() may still return without reconnecting, because the conviction policy returned True.

The result is a defunct control connection on a host that is still considered up, with no reconnect scheduled by either the down handling or _signal_error().
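The failure mode can be reproduced with a small toy model. This is a simplified sketch, not the driver's actual code: the class and method names mirror those mentioned above, but the bodies, the always-convicting policy, and the counters (`open_pool_connections`, `reconnects_scheduled`, `manual_reconnects`) are assumptions made for illustration.

```python
class Host:
    def __init__(self):
        self.is_up = True

    def signal_connection_failure(self):
        # Hypothetical conviction policy: convict on the first failure.
        return True


class Cluster:
    def __init__(self, discount_down_events, open_pool_connections):
        self._discount_down_events = discount_down_events
        # Stand-in for "session pools still have open connections to the host".
        self.open_pool_connections = open_pool_connections
        self.reconnects_scheduled = 0

    def on_down(self, host):
        if self._discount_down_events and self.open_pool_connections > 0:
            return  # down event discounted: host stays up, nothing scheduled
        host.is_up = False
        self.reconnects_scheduled += 1

    def signal_connection_failure(self, host):
        is_down = host.signal_connection_failure()
        if is_down:
            self.on_down(host)
        # Reports the conviction result, not whether on_down() did anything.
        return is_down


class ControlConnection:
    def __init__(self, cluster, host):
        self.cluster = cluster
        self.host = host
        self.manual_reconnects = 0

    def _signal_error(self):
        if self.cluster.signal_connection_failure(self.host):
            return  # assumes down handling already scheduled a reconnect
        self.manual_reconnects += 1


def simulate(discount_down_events, open_pool_connections):
    """Return (host.is_up, total reconnects triggered) after one error."""
    host = Host()
    cluster = Cluster(discount_down_events, open_pool_connections)
    control = ControlConnection(cluster, host)
    control._signal_error()
    return host.is_up, cluster.reconnects_scheduled + control.manual_reconnects
```

In this model, `simulate(True, 2)` leaves the host up with zero reconnects, which is the stuck state described above, while `simulate(False, 2)` marks the host down and schedules one reconnect.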

Expected behavior

  • signal_connection_failure() should report whether down handling actually ran.
  • _signal_error() should manually reconnect when host-down handling was deferred or discounted.
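One way the two expectations above could fit together is sketched below. It is a toy model under stated assumptions, not a patch against the driver: `on_down()` returning a bool, the always-convicting policy, and the counters are hypothetical and exist only to make the control flow visible.

```python
class Host:
    def __init__(self):
        self.is_up = True

    def signal_connection_failure(self):
        # Hypothetical conviction policy: convict on the first failure.
        return True


class Cluster:
    def __init__(self, discount_down_events, open_pool_connections):
        self._discount_down_events = discount_down_events
        self.open_pool_connections = open_pool_connections
        self.reconnects_scheduled = 0

    def on_down(self, host):
        """Return True only if down handling actually ran (assumed contract)."""
        if self._discount_down_events and self.open_pool_connections > 0:
            return False  # discounted: caller must not assume a reconnect
        host.is_up = False
        self.reconnects_scheduled += 1
        return True

    def signal_connection_failure(self, host):
        if not host.signal_connection_failure():
            return False
        # Propagate whether down handling ran, not just the conviction result.
        return self.on_down(host)


class ControlConnection:
    def __init__(self, cluster, host):
        self.cluster = cluster
        self.host = host
        self.manual_reconnects = 0

    def _signal_error(self):
        if self.cluster.signal_connection_failure(self.host):
            return  # down handling ran and owns the reconnect
        # Down handling was skipped or discounted: reconnect manually.
        self.manual_reconnects += 1


def simulate_fixed(discount_down_events, open_pool_connections):
    """Return (host.is_up, scheduled reconnects, manual reconnects)."""
    host = Host()
    cluster = Cluster(discount_down_events, open_pool_connections)
    control = ControlConnection(cluster, host)
    control._signal_error()
    return host.is_up, cluster.reconnects_scheduled, control.manual_reconnects
```

With the return value tied to whether down handling ran, the discounted case falls through to the manual reconnect while the host stays up, and the non-discounted case behaves as before.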
