Support explicit local queries to zero-token contact points #845

@dkropachev

Problem

scylladb/scylladb has tests that start zero-token Scylla nodes (join_ring: false) and then use the Python driver to perform a direct readiness query against the newly started node.

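The relevant test is cluster/test_keyspace_rf::test_create_keyspace_with_default_replication_factor (the full invocation is under Reproducer).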

The test adds zero-token nodes, for example:

await manager.server_add(config=zero_token_cfg, property_file=get_pf("dc1", "rz"))

manager.server_add() waits for the default CQL_QUERIED state. That readiness path creates a driver Cluster with the zero-token node as the explicit contact point and a whitelist policy for that same address, then runs a local query:

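from cassandra.cluster import Cluster, ExecutionProfile, EXEC_PROFILE_DEFAULT
from cassandra.policies import WhiteListRoundRobinPolicy

# zero_token_node_rpc_address: RPC address of the newly started zero-token node
cluster = Cluster(
    contact_points=[zero_token_node_rpc_address],
    execution_profiles={
        EXEC_PROFILE_DEFAULT: ExecutionProfile(
            load_balancing_policy=WhiteListRoundRobinPolicy([zero_token_node_rpc_address])
        )
    },
)
session = cluster.connect()

# Readiness probe: a purely local read that needs no token ownership
session.execute("SELECT key FROM system.local where key = 'local'")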

That code lives in scylladb/scylladb/test/pylib/scylla_cluster.py::ScyllaServer.get_cql_up_state().
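
For readers unfamiliar with the harness, the wait for CQL_QUERIED can be pictured as a retry loop around that local query. The following is only a sketch of that pattern, not the code in scylla_cluster.py: the helper name wait_for_cql_queried, its parameters, and the run_query callable (standing in for the session.execute(...) call) are all hypothetical.

```python
import time


def wait_for_cql_queried(run_query, timeout=5.0, interval=0.1):
    """Poll run_query() until it succeeds or the deadline passes.

    Hypothetical helper: run_query is any zero-argument callable that raises
    on failure and stands in for the system.local query shown above.
    """
    deadline = time.monotonic() + timeout
    last_error = None
    while time.monotonic() < deadline:
        try:
            return run_query()
        except Exception as exc:  # sketch only: a real harness would catch driver errors
            last_error = exc
            time.sleep(interval)
    raise TimeoutError(
        f"CQL readiness query did not succeed within {timeout}s"
    ) from last_error
```

If the driver never offers a host for the query, every attempt fails and a loop of this shape runs until its deadline, which matches the timeout symptom described in this issue.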

Current behavior

With scylla-driver==3.29.9, the driver can connect to the zero-token node as a contact point/control connection target, but the node is excluded from the usable query host set because system.local.tokens is None.

As a result, the Scylla test harness cannot schedule even the explicitly targeted local readiness query to that contact point, and server_add() times out while adding the zero-token node. The Scylla node itself is already serving CQL.

This shows up in the Scylla test above before the keyspace assertions are reached.

Older behavior

The same Scylla test passed with scylla-driver==3.29.7.

That appears to be because older driver initialization pre-added explicit contact points as Host objects before metadata refresh. The zero-token contact point therefore remained available to the whitelist load-balancing policy for the direct system.local readiness query.
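
The difference between the two versions can be illustrated with a toy model. This is not driver code: NodeInfo, known_hosts, and whitelist_plan are hypothetical names, and the model assumes (as described above) that metadata refresh keeps only token-owning nodes and that a whitelist policy can only offer hosts the driver already knows about.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class NodeInfo:
    address: str
    tokens: Optional[frozenset]  # None models system.local.tokens being null


def known_hosts(contact_points, topology, pre_add_contact_points):
    """Return the addresses the toy driver keeps as usable hosts."""
    # Assumed 3.29.7-style behavior: contact points are registered up front.
    hosts = set(contact_points) if pre_add_contact_points else set()
    # Assumed metadata refresh: only token-owning nodes are added.
    hosts.update(node.address for node in topology if node.tokens)
    return hosts


def whitelist_plan(hosts, whitelist):
    """A whitelist policy can only pick from hosts that are already known."""
    return sorted(hosts & set(whitelist))


zero = "10.0.0.9"
topology = [NodeInfo("10.0.0.1", frozenset({1, 2})), NodeInfo(zero, None)]

print(whitelist_plan(known_hosts([zero], topology, True), [zero]))   # ['10.0.0.9']
print(whitelist_plan(known_hosts([zero], topology, False), [zero]))  # []
```

In this model the second plan is empty even though the node is reachable, which is the situation the readiness query runs into.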

Requested behavior

The driver should provide a supported way to run explicitly targeted/local queries against a zero-token contact point.

Important distinction: this should not make zero-token nodes normal application-routing targets.

Expected behavior:

  • zero-token nodes discovered through peers/topology metadata should remain excluded from normal routing and token-aware query plans
  • zero-token nodes should not own token ranges
  • an explicitly supplied zero-token contact point should still be usable for direct/local queries such as SELECT key FROM system.local WHERE key = 'local'
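
The three expectations above can be written down as checks on a small model. This is a sketch of the requested semantics only; ToyHost, routing_hosts, token_owners, and direct_query_hosts are hypothetical names and do not correspond to any existing driver API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyHost:
    address: str
    tokens: tuple = ()
    is_contact_point: bool = False


def routing_hosts(hosts):
    """Hosts eligible for normal and token-aware query plans: token owners only."""
    return [h for h in hosts if h.tokens]


def token_owners(hosts):
    """Map each token to its owner; zero-token hosts contribute nothing."""
    return {token: h.address for h in hosts for token in h.tokens}


def direct_query_hosts(hosts):
    """Hosts that may receive an explicitly targeted local query."""
    return [h for h in hosts if h.tokens or h.is_contact_point]


replica = ToyHost("10.0.0.1", tokens=(1, 2))
discovered_zero = ToyHost("10.0.0.8")                      # found via peers
explicit_zero = ToyHost("10.0.0.9", is_contact_point=True)  # user-supplied
hosts = [replica, discovered_zero, explicit_zero]

assert routing_hosts(hosts) == [replica]
assert token_owners(hosts) == {1: "10.0.0.1", 2: "10.0.0.1"}
assert direct_query_hosts(hosts) == [replica, explicit_zero]
```

The point of keeping two separate host sets is that the explicit contact point becomes reachable for system.local queries without ever appearing in routing or token ownership.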

This is needed by scylladb/scylladb test infrastructure to verify startup/readiness of zero-token nodes without treating them as normal replicas or token-owning query targets.

Reproducer

From scylladb/scylladb, run the specific test with scylla-driver==3.29.9:

python ./test.py \
  --mode dev \
  --tmpdir testlog/driver-zero-token-local-query \
  --jobs 1 \
  --cluster-pool-size 1 \
  --max-failures 1 \
  --timeout 210 \
  --session-timeout 600 \
  --extra-scylla-cmdline-options "--critical-disk-utilization-level 1.0" \
  "cluster/test_keyspace_rf::test_create_keyspace_with_default_replication_factor[False-True]"

The failure occurs while adding the zero-token node and waiting for CQL_QUERIED.
