A defunct control connection can be left unreconnected when the corresponding host-down signal is discounted because regular session pools are still open.
Problem
ControlConnection._signal_error() relies on Cluster.signal_connection_failure() to trigger reconnect handling when the current control connection becomes defunct.
Cluster.signal_connection_failure() returns the result of host.signal_connection_failure(), which means “the conviction policy considered the host down”. That is not the same as “the cluster actually ran down handling”.
When _discount_down_events is enabled and session pools still have open connections to the host, Cluster.on_down() intentionally returns early to avoid marking the host down. In that case no control-connection reconnect is scheduled, but _signal_error() may still return without reconnecting, because the conviction policy returned true.
The result is a defunct control connection with the host still considered up and no manual reconnect.
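The sequence above can be reproduced with a small toy model. This is only a sketch: the classes below are simplified stand-ins rather than the driver's real classes, the method signatures are reduced, and the `pools_open`, `reconnects_scheduled`, and `manual_reconnects` attributes and the `run_scenario` helper are hypothetical names introduced for illustration.

```python
class Host:
    """Toy host whose conviction policy always convicts (assumption)."""

    def __init__(self):
        self.is_up = True

    def signal_connection_failure(self, exc):
        # True means only "the conviction policy considered the host down".
        return True


class Cluster:
    """Toy cluster; pools_open stands in for 'session pools still have connections'."""

    def __init__(self, discount_down_events, pools_open):
        self._discount_down_events = discount_down_events
        self.pools_open = pools_open
        self.reconnects_scheduled = 0  # reconnects triggered by down handling

    def on_down(self, host):
        if self._discount_down_events and self.pools_open:
            return  # down event discounted: host stays up, nothing scheduled
        host.is_up = False
        self.reconnects_scheduled += 1

    def signal_connection_failure(self, host, exc):
        is_down = host.signal_connection_failure(exc)
        if is_down:
            self.on_down(host)
        return is_down  # reports the policy verdict, not whether on_down() ran


class ControlConnection:
    def __init__(self, cluster, host):
        self.cluster = cluster
        self.host = host
        self.manual_reconnects = 0

    def _signal_error(self):
        if self.cluster.signal_connection_failure(self.host, Exception("defunct")):
            return  # assumes down handling took care of reconnecting
        self.manual_reconnects += 1


def run_scenario(discount_down_events, pools_open):
    host = Host()
    cluster = Cluster(discount_down_events, pools_open)
    control = ControlConnection(cluster, host)
    control._signal_error()
    return host.is_up, cluster.reconnects_scheduled, control.manual_reconnects
```

In this model, `run_scenario(True, True)` returns `(True, 0, 0)`: the host is still up, down handling scheduled nothing, and no manual reconnect happened.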
Expected behavior
Cluster.signal_connection_failure() should report whether down handling actually ran.
_signal_error() should manually reconnect when host-down handling was deferred or discounted.
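One way to picture the expected behavior is the toy model below. It is a sketch under assumptions, not the driver's actual code: having on_down() return a boolean is one possible way to report whether down handling ran, and `pools_open`, `reconnects_scheduled`, `manual_reconnects`, and `run_fixed` are hypothetical names used only for illustration.

```python
class Host:
    """Toy host whose conviction policy always convicts (assumption)."""

    def __init__(self):
        self.is_up = True

    def signal_connection_failure(self, exc):
        return True


class Cluster:
    def __init__(self, discount_down_events, pools_open):
        self._discount_down_events = discount_down_events
        self.pools_open = pools_open
        self.reconnects_scheduled = 0  # reconnects triggered by down handling

    def on_down(self, host):
        """Return True only if down handling actually ran."""
        if self._discount_down_events and self.pools_open:
            return False  # discounted: caller must not assume a reconnect
        host.is_up = False
        self.reconnects_scheduled += 1
        return True

    def signal_connection_failure(self, host, exc):
        is_down = host.signal_connection_failure(exc)
        # Report whether down handling ran, not just the policy verdict.
        return is_down and self.on_down(host)


class ControlConnection:
    def __init__(self, cluster, host):
        self.cluster = cluster
        self.host = host
        self.manual_reconnects = 0

    def _signal_error(self):
        if self.cluster.signal_connection_failure(self.host, Exception("defunct")):
            return  # down handling ran and scheduled the reconnect
        # Down handling was skipped or discounted: reconnect manually.
        self.manual_reconnects += 1


def run_fixed(discount_down_events, pools_open):
    host = Host()
    cluster = Cluster(discount_down_events, pools_open)
    control = ControlConnection(cluster, host)
    control._signal_error()
    return host.is_up, cluster.reconnects_scheduled, control.manual_reconnects
```

Here `run_fixed(True, True)` returns `(True, 0, 1)`: the host stays up, as the discount intends, and the control connection reconnects manually. With no open pools, `run_fixed(True, False)` returns `(False, 1, 0)`, so the normal down path is unchanged.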