Problem
scylladb/scylladb has tests that start zero-token Scylla nodes (join_ring: false) and then use the Python driver to perform a direct readiness query against the newly started node.
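The relevant test is:
scylladb/scylladb/test/cluster/test_keyspace_rf.py::test_create_keyspace_with_default_replication_factor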
The test adds zero-token nodes, for example:
await manager.server_add(config=zero_token_cfg, property_file=get_pf("dc1", "rz"))
manager.server_add() waits for the default CQL_QUERIED state. That readiness path creates a driver Cluster with the zero-token node as the explicit contact point and a whitelist policy for that same address, then runs a local query:
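from cassandra.cluster import Cluster, ExecutionProfile, EXEC_PROFILE_DEFAULT
from cassandra.policies import WhiteListRoundRobinPolicy

# Pin the session to the zero-token node: it is the only contact point
# and the only host the load-balancing policy allows.
cluster = Cluster(
    contact_points=[zero_token_node_rpc_address],
    execution_profiles={
        EXEC_PROFILE_DEFAULT: ExecutionProfile(
            load_balancing_policy=WhiteListRoundRobinPolicy([zero_token_node_rpc_address])
        )
    },
)
session = cluster.connect()
session.execute("SELECT key FROM system.local WHERE key = 'local'")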
That code lives in scylladb/scylladb/test/pylib/scylla_cluster.py::ScyllaServer.get_cql_up_state().
Current behavior
With scylla-driver==3.29.9, the driver can connect to the zero-token node as a contact point/control connection target, but the node is excluded from the usable query host set because system.local.tokens is None.
As a result, the Scylla test harness cannot schedule even the explicitly targeted local readiness query to that contact point, and server_add() times out while adding the zero-token node. The Scylla node itself is already serving CQL.
This shows up in the Scylla test above before the keyspace assertions are reached.
Older behavior
The same Scylla test passed with scylla-driver==3.29.7.
That appears to be because older driver initialization pre-added explicit contact points as Host objects before metadata refresh. The zero-token contact point therefore remained available to the whitelist load-balancing policy for the direct system.local readiness query.
Requested behavior
The driver should provide a supported way to run explicitly targeted/local queries against a zero-token contact point.
Important distinction: this should not make zero-token nodes normal application-routing targets.
Expected behavior:
- zero-token nodes discovered through peers/topology metadata should remain excluded from normal routing and token-aware query plans
- zero-token nodes should not own token ranges
- an explicitly supplied zero-token contact point should still be usable for direct/local queries such as
SELECT key FROM system.local WHERE key = 'local'
This is needed by scylladb/scylladb test infrastructure to verify startup/readiness of zero-token nodes without treating them as normal replicas or token-owning query targets.
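The distinction requested above can be illustrated with a small toy model. This is only a sketch of the desired semantics, not driver code: `HostInfo`, `routable_hosts`, and `direct_query_hosts` are hypothetical names invented for this example, and the addresses and token value are made up.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set, Tuple


@dataclass(frozen=True)
class HostInfo:
    """Hypothetical stand-in for a driver host entry (not a real driver class)."""
    address: str
    tokens: Optional[Tuple[str, ...]]  # None or empty models a zero-token node

    @property
    def is_zero_token(self) -> bool:
        return not self.tokens


def routable_hosts(hosts: Iterable[HostInfo]) -> List[str]:
    """Hosts eligible for normal and token-aware query plans."""
    return [h.address for h in hosts if not h.is_zero_token]


def direct_query_hosts(hosts: Iterable[HostInfo], contact_points: Set[str]) -> List[str]:
    """Hosts that may serve explicitly targeted/local queries.

    Token-owning hosts qualify as usual; a zero-token host qualifies only
    if the user supplied it explicitly as a contact point.
    """
    return [
        h.address
        for h in hosts
        if not h.is_zero_token or h.address in contact_points
    ]


hosts = [
    HostInfo("10.0.0.1", ("-9223372036854775808",)),  # normal token owner
    HostInfo("10.0.0.2", None),  # zero-token node discovered via peers
    HostInfo("10.0.0.3", None),  # zero-token node given as a contact point
]
contact_points = {"10.0.0.3"}

print(routable_hosts(hosts))                      # ['10.0.0.1']
print(direct_query_hosts(hosts, contact_points))  # ['10.0.0.1', '10.0.0.3']
```

In this model the peer-discovered zero-token node (10.0.0.2) is never a query target, while the explicitly supplied one (10.0.0.3) is reachable only for direct queries and stays out of normal routing.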
Reproducer
From scylladb/scylladb, run the specific test with scylla-driver==3.29.9:
python ./test.py \
    --mode dev \
    --tmpdir testlog/driver-zero-token-local-query \
    --jobs 1 \
    --cluster-pool-size 1 \
    --max-failures 1 \
    --timeout 210 \
    --session-timeout 600 \
    --extra-scylla-cmdline-options "--critical-disk-utilization-level 1.0" \
    "cluster/test_keyspace_rf::test_create_keyspace_with_default_replication_factor[False-True]"
The failure occurs while adding the zero-token node and waiting for CQL_QUERIED.
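To separate the driver behavior from the test harness, a standalone probe along these lines could be pointed at an already running zero-token node. This is a sketch, not the harness code: `probe_local_query` is a hypothetical helper, port 9042 and the 10-second timeouts are assumptions, and the exact exception raised by a given driver version has not been verified here, so the function catches any exception and reports it.

```python
import sys


def probe_local_query(address: str, port: int = 9042, timeout: float = 10.0) -> bool:
    """Return True if a whitelisted local query against `address` succeeds."""
    # Imported lazily so this module loads even without the driver installed.
    from cassandra.cluster import Cluster, ExecutionProfile, EXEC_PROFILE_DEFAULT
    from cassandra.policies import WhiteListRoundRobinPolicy

    # Same shape as the readiness check: one contact point, whitelisted.
    profile = ExecutionProfile(
        load_balancing_policy=WhiteListRoundRobinPolicy([address]),
        request_timeout=timeout,
    )
    cluster = Cluster(
        contact_points=[address],
        port=port,
        execution_profiles={EXEC_PROFILE_DEFAULT: profile},
        connect_timeout=timeout,
    )
    try:
        session = cluster.connect()
        row = session.execute(
            "SELECT key FROM system.local WHERE key = 'local'"
        ).one()
        return row is not None
    except Exception as exc:  # the concrete exception type is not assumed
        print(f"local query against {address} failed: {exc!r}", file=sys.stderr)
        return False
    finally:
        cluster.shutdown()


if __name__ == "__main__":
    if len(sys.argv) != 2:
        sys.exit("usage: probe.py <zero_token_node_rpc_address>")
    sys.exit(0 if probe_local_query(sys.argv[1]) else 1)
```

Running the script once with scylla-driver==3.29.7 and once with scylla-driver==3.29.9 against the same node should show whether the difference reproduces outside `server_add()`.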