LWT: run non-replica coordinator on the same shard as a replica by gusev-p · Pull Request #2 · gusev-p/scylla

gusev-p · 2025-06-25T10:46:47Z

Currently, sp::cas_shard uses shard_for_reads to decide which shard to use for LWT execution—both on replicas and the coordinator.

If the coordinator is not a replica, shard_for_reads returns a default shard (shard 0). There are at least two problems with this:

shard 0 can become overloaded, because all LWT coordinators-but-not-replacas
are served on it.
mismatch with replicas: the default shard doesn't match what shard_for_reads
returns on replicas. This means RPC level optimization (same shard for client
and server) won't work.

In this PR we change sp::cas_shard to use a primary replica shard if the current node is not a replica. This guarantees that all LWT coordinators for the same tablet will be served on the same shard. This is important for LWT coordinator locks (paxos::paxos_state::get_cas_lock). Also, if all tablet replicas on different nodes live on the same shard, RPC optimization will make sure that no additional smp::submit_to will be needed on the server side.

Fixes scylladb/scylladb#20497

backport: not needed, since this change is important for LWT over tablets which is not released yet

gleb-cloudius · 2025-06-25T10:53:03Z

If the coordinator is not a replica, shard_for_reads returns a default shard (shard 0) is this the case for tablets? For vnodes it was suppose to be working properly IIRC.

gusev-p · 2025-06-25T10:57:41Z

If the coordinator is not a replica, shard_for_reads returns a default shard (shard 0) is this the case for tablets? For vnodes it was suppose to be working properly IIRC.

Yes, for v-nodes shar_for_reads is always defined as return dht::shard_of(_shard_count, _sharding_ignore_msb_bits, t);, for tablets it defaults to zero:

// FIXME: Consider throwing when there is no owning shard on the current host rather than returning 0.
// It's a coordination mistake to route requests to non-owners. Topology coordinator should synchronize
// with request coordinators before moving the shard away.
auto shard = shard_for_reads(tid, _host).value_or(0);

Add helpers for base64url encoding. base64url is a variant of base64 that uses a URL-safe alphabet. It can be constructed from base64 by replacing the '+' and '/' characters with '-' and '_' respectively. Many implementations also strip the padding, although this is not required by the spec [1]. This will be used in upcoming patches for Azure Key Vault requests that require base64url-encoded payloads. [1] https://datatracker.ietf.org/doc/html/rfc4648#section-5 Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>

Azure authentication is token based - the client obtains an access token with their credentials, and uses it as a bearer token to authorize requests to Azure services. Define a common API for all credential types. The API will consist of a single `get_access_token()` function that will be returning a new or a cached access token for some resource URI (defines token scope). Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>

The goal is to mimic the Azure C++ SDK, which offers a variety of credentials, depending on their type and source. Declare the following credentials: * Service Principal credentials * Managed Identity credentials * Azure CLI credentials * Default credentials Also, define a common exception for SP and MI credentials which are network-based. This patch only defines the API. Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>

The rest http client, currently used by the AWS and GCP key providers, logs the HTTP requests and responses unaltered. This causes some sensitive data to be exposed (plaintext data encryption keys, credentials, access tokens). Add an interface to optionally redact any sensitive data from HTTP headers and payloads. Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>

Implement token request for Service Principals with a secret. The token request requires a TLS connection. When closing the connection, do not wait for a response to the TLS `close_notify` alert. Azure's OAuth server would ignore it and the Seastar `connected_socket` would hang for 10 seconds. Add log redaction logic to not expose sensitive data from the request and response payloads. Add a token factory to parse the HTTP response. This cannot be shared with other credential types because the JSON format is not consistent. Finally, implement a fail-fast retry policy for short-lived transient errors. Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>

Implement token request for Service Principals with a certificate. The request is the same as with a secret, except that the secret is replaced with an assertion. The assertion is a JWT that is signed with the certificate. To be consistent with the Azure C++ SDK, expect the certificate and the associated private key to be encoded in PEM format and be provided in a single file. The docs suggest using 'PS256' for the JWT's 'alg' claim. Since this is not supported by our current JWT library (jwt-cpp), use 'RS256' instead. The JWT also requires a unique identifier for the 'jti' claim. Use a random UUID for that (it should suffice for our use cases). Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>

Implement token request from IMDS. No credentials are required for that - just a plain HTTP request on the IMDS token endpoint. Since the IMDS endpoint is a raw IP, it's not possible to reliably determine whether IMDS is accessible or not (i.e., whether the node is an Azure VM). Azure provides no node-local indication either. In lack of a better choice, attempt to connect and declare failure if the connection is not established within 3 seconds. Use a raw TCP socket for this check, as the HTTP client currently lacks timeout or cancellation support. Perform the check only once, during the first token refresh. For the time being, do not support nodes with multiple user-assigned managed identities. Expect the token request to fail in this case (IMDS requires the identifier of the desired Managed Identity). Add a token factory to correctly parse the HTTP response. This addresses a discrepancy between token requests on IMDS and Azure Entra - the 'expires_in' field is a string in the former and an integer in the latter. Finally, implement a fail-fast retry policy for short-lived transient errors. Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>

Implement token request with Azure CLI. Inspired from the Azure C++ SDK's `AzureCliCredential`, this credential type attempts to run the Azure CLI in a shell and parse the token from its output. This is meant for development purposes, where a user has already installed the Azure CLI and logged in with their user account. Pass the following environment to the process: * PATH * HOME * AZURE_CONFIG_DIR Add a token factory to construct a token from the process output. Unlike in Azure Entra and IMDS, the CLI's JSON output does not contain 'expires_in', and the token key is in camel case. Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>

Attempt to detect credentials from the system. Inspired from the `DefaultAzureCredential` in the Azure C++ SDK, this credential type detects credentials from the following sources (in this order): * environment variables (SP credentials - same variables as in Azure C++ SDK) * Azure CLI * IMDS Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>

Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>

For deduplication. Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>

The Azure host manages cryptographic keys using Azure Key Vault. This patch only defines the API. Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>

Add a cache to store data encryption keys based on their attributes (cipher algorithm + key length). This will be plugged into `get_or_create_key()` in a later patch to reuse the same keys in multiple requests, thereby reducing the API calls to Key Vault. Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>

The Azure host needs credentials to communicate with Key Vault. First search for credentials in the host options, and then fall back to default credentials if the former are non-existent or incomplete. Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>

In later patches, we will prepare the reader for inexact index implementations (ones which can return a Data file range that includes some partitions before or after the queried range). For that, we will need to filter out the partitions outside of the range, and for that we need to remember the range. This is the goal of this patch. Note that we are storing a reference to an argument of `fast_forward_to`. This is okay, because the contract of `mutation_reader` specifies that the caller must keep `pr` alive until the next `fast_forward_to` or until the reader is destroyed.

The current index format is exact: it always returns the position of the first partition in the queried partition range. But we are about the add an index format where that doesn't have to be the case. In BTI indexes, the lookup can be off by one partition sometimes. This patch prepares the reader for that, by skipping the partitions which were read by the data reader but don't belong to the queried range. Note: as of this patch, only the "normal path" is ever used. We add tests exercising these code paths later. Also note that, as of this patch, actually stepping outside the queried range would cause the reader to end up in a state where the underlying parser is positioned right after partition key immediately following the queried range. If the reader was forwarded to that key in this state, it would trip an assert, because the parser can't handle backward jumps. We will add logic to handle this case in the next patch.

A bunch of code assumes that the Data.db stream can only go forward. But with BTI indexes, if we perform an advance_to, the index can point to a position which the data reader has already passed, since the index is inexact. The logic of the data reader ensures that it has stopped within the last partition range, or just immediately after it, after reading the next partition key and noticing that it doesn't belong to the range. But forward_to can only be used with increasing ranges. The start of the next range must be greater or equal to the end of the previous range. This means that the exact start of the next partition range must be no earlier than: 1. Before the partition key just read by the data reader, if the data reader is positioned immediately after a partition key. 2. The start of the first partition after the current data reader position, if the data reader isn't positioned immediately after a partition key. So, if the index returns a position smaller than the current data reader position, then: 1. If the reader is immediately after a partition key, we have to reuse this partition key (since we can't go back in the stream to read it again), and keep reading from the current position. 2. Otherwise we can safely walk the index to the first partition that lies no earlier than the current position.

…ptional BTI indexes only store encoded prefixes of partition keys, not the whole keys. They can't reliably implement `get_partition_key`. The index reader interface must be weakened and callers must be adapted.

…ion()` `advance_to_next_partition()` needs an ability to advance the index to the partition immediately following the reader's current partition. For this, it uses `abstract_index_reader::advance_to(dht::ring_position_view)` But BTI (and any index format which stores only the prefixes of keys instead of whole keys) can't implement `advance_to` with its current semantics. The Data position returned by the index for a generic `advance_to` might be off by one partition. E.g. if the index stores prefixes `a`, `b`, `c`, the index has no way to know if the first entry after `bb` is `b` (which might correspond to `ba` as well as `bc`), or `c`. However, BTI can be used exactly if the partition is known to be present in the sstable. (In the above example, if `bb` is known to be present in the sstable, then it must correspond to `b`. So the index can reliably advance to `bb` or the first partition after it). And this is enough for `advance_to_next_partition()`, because the current partition is known to be present. So we can replace the usage of `advance_to` with an equivalent API call which only works with present keys, but in exchange is implementable by BTI.

`advance_context()` needs an ability to advance the index to the partition immediately following the reader's current partition. For this, it uses `abstract_index_reader::advance_to(dht::ring_position_view)` But BTI (and any index format which stores only the prefixes of keys instead of whole keys) can't implement `advance_to` with its current semantics. The Data position returned by the index for a generic `advance_to` might be off by one partition. E.g. if the index stores prefixes `a`, `b`, `c`, the index has no way to know if the first entry after `bb` is `b` (which might correspond to `ba` as well as `bc`), or `c`. However, BTI can be used exactly if the partition is known to be present in the sstable. (In the above example, if `bb` is known to be present in the sstable, then it must correspond to `b`. So the index can reliably advance to `bb` or the first partition after it). And this is enough for `advance_context()`, because the current partition is known to be present. So we can replace the usage of `advance_to` with an equivalent API call which only works with present keys, but in exchange is implementable by BTI. This makes `advance_to` unused, so we remove it.

`advance_to` is unused now, so remove it.

…uctor For tests. Will be used for testing how the data reader reacts to various combinations of inexact index lookup results.

…ndex_reader After making the sstable reader more permissive, we can weaken the abstract_index_reader interface.

The init.hh contains some bits that only main.cc needs. Some of its forward declarations are neede by neither the headers itself, nor the main.cc that includes it. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb#25110

The latter header in included two times, one is enough Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb#25109

The view builder uses group0 operations to coordinate view building, so we should drain the view builder before stopping group0. Fixes scylladb#25096 Closes scylladb#25101

We improve logging in critical functions in hinted handoff to capture more information about the behavior of the module. That should help us in debugging sessions. The logs should only be printed during more important events and so they should not clog the log files. Backport: not necessary. Closes scylladb#25031 * github.com:scylladb/scylladb: db/hints/manager.cc: Add logs for changing host filter db/hints: Increase log level in critical functions

…ksandra Martyniuk A tablet repair started with /storage_service/repair_async/ API bypasses tablet repair scheduler and repairs only the tablets that are owned by the requested node. Due to that, to safely repair the whole keyspace, we need to first disable tablet migrations and then start repair on all nodes. With the new API - /storage_service/tablets/repair - tailored to tablet repair requirements, we do not need additional preparation before repair. We may request it on one node in a cluster only and, thanks to tablet repair scheduler, a whole keyspace will be safely repaired. Both nodetool and Scylla Manager have already started using the new API to repair tablets. Refuse repairing tablet keyspaces with /storage_service/repair_async - 403 Forbidden is returned. repair_async should still be used to repair vnode keyspaces. Fixes: scylladb#23008. Breaking change; no backport. Closes scylladb#24678 * github.com:scylladb/scylladb: repair: remove unused code api: repair_async: forbid repairing tablet keyspaces

The `token_metadata_impl` stores the sorted tokens in an `std::vector`. With a large number of nodes, the size of this vector can grow quickly, and updating it might lead to oversized allocations. This commit changes `_sorted_tokens` to a `chunked_vector` to avoid such issues. It also updates all related code to use `chunked_vector` instead of `std::vector`. Fixes scylladb#24876 Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com> Closes scylladb#25027

…levy As they are wasteful in many cases, it is better to move the tablet_map if possible, or clone it gently in an async fiber. Add clone() and clone_gently() methods to allow explicit copies. * minor optimization, no backport needed Closes scylladb#24978 * github.com:scylladb/scylladb: tablets: prevent accidental copy of tablets_map locator: tablets: get rid of synchronous mutate_tablet_map

…m Michał Chojnowski Unlike the currently-used sstable index files, BTI indexes don't store the entire partition keys. They only store prefixes of decorated keys, up to the minimum length needed to differentiate a key from its neighbours in the sstable. This saves space. However, it means that a BTI index query might be off by one partition (on each end of the queried partition range) with respect to the optimal Data position. For example, if the index stores prefixes `a`, `b`, `c`, the index has no way to know if the first index entry after key `bb` is `b` (which might correspond to `ba` as well as `bc`), or `c`. So the index reader conservatively has to pick the wider Data range, and the Data reader must ignore the superfluous partitions. (And there's no way around that.) Before this patch, the sstable reader expects the index query to return an exact (optimal) Data range. This patch adjusts the logic of the sstable reader to allow for inexact ranges. Note: the patch is more complicated that it looks. The logic of the sstable reader was already fairly hard to follow and this adds even more flags, more weird special states and more edge cases. I think I managed to write a decent test and it did find three or four edge cases I wouldn't have noticed otherwise. I think it should cover all the added logic, but I didn't verify code coverage. (Do our scripts for that even work nowadays)? Simplification ideas are welcome. Preparation for new functionality, no backporting needed. Closes scylladb#25093 * github.com:scylladb/scylladb: sstables/index_reader: weaken some exactness guarantees in abstract_index_reader test/boost: add a test for inexact index lookups sstables/mx/reader: allow passing a custom index reader to the constructor sstables/index_reader: remove advance_to sstables/mx/reader: handle inexact lookups in `advance_context()` sstables/mx/reader: handle inexact lookups in `advance_to_next_partition()` sstables/index_reader: make the return value of `get_partition_key` optional sstables/mx/reader: handle "backward jumps" in forward_to sstables/mx/reader: filter out partitions outside the queried range sstables/mx/reader: update _pr after `fast_forward_to`

Whilst the coredump script checks for prerequisites, the user experience is not ideal because you either have to go in the script and get the list of deps and install them or wait for the script to complain about lacking dependencies one by one. This commit completes the list of dependencies in the install script (some of them were already there for Fedora), so you already have them installed by the time you get to run the coredump script. Signed-off-by: Robert Bindar <robert.bindar@scylladb.com> [avi: - remove trailing whitespace - regenerate frozen toolchain Optimized clang binaries generated and stored in https://devpkg.scylladb.com/clang/clang-20.1.8-Fedora-42-aarch64.tar.gz https://devpkg.scylladb.com/clang/clang-20.1.8-Fedora-42-x86_64.tar.gz ] Closes scylladb#22369 Closes scylladb#25203

Otherwise, tablet rebuilt will be delayed for up to 60s, as the tablet scheduler needs load stats for the new node (replacing) to make decisisons. Fixes scylladb#25163 Closes scylladb#25181

…ction Renamed the file and updated all references from 'enbleautocompaction' to the correct 'enableautocompaction'. Fixes scylladb#25172 Closes scylladb#25175

This PR enables **LWT (Lightweight Transactions)** support for tablet-based tables by leveraging **colocated tables**. Currently, storing Paxos state in system tables causes two major issues: * **Loss of Paxos state during tablet migration or base table rebuilds** * When a tablet is migrated or the base table is rebuilt, system tables don't retain Paxos state. * This breaks LWT correctness in certain scenarios. * Failing test cases demonstrating this: * test_lwt_state_is_preserved_on_tablet_migration * test_lwt_state_is_preserved_on_rebuild * **Shard misalignment and performance overhead** * Tablets may be placed on arbitrary shards by the tablet balancer. * Accessing Paxos state in system tables could require a shard jump, degrading performance. We move Paxos state into a dedicated Paxos table, colocated with the base table: * Each base table gets its own Paxos state table. * This table is lazily created on the first LWT operation. * Its tablets are colocated with those of the base table, ensuring: * Co-migration during tablet movement * Co-rebuilding with the base table * Shard alignment for local access to Paxos state Some reasoning for why this is sufficient to preserve LWT correctness is discussed in [2]. This PR addresses two issues from the "Why doesn't it work for tablets" section in [1]: * Tablet migration vs LWT correctness * Paxos table sharding Other issues ("bounce to shard" and "locking for intranode_migration") have already been resolved in previous PRs. References [1] - [LWT over tablets design](https://docs.google.com/document/d/1CPm0N9XFUcZ8zILpTkfP5O4EtlwGsXg_TU4-1m7dTuM/edit?tab=t.0#heading=h.goufx7gx24yu) [2] - [LWT: Paxos state and tablet balancer](https://docs.google.com/document/d/1-xubDo612GGgguc0khCj5ukmMGgLGCLWLIeG6GtHTY4/edit?tab=t.0) [3] - [Colocated tables PR](scylladb#22906 (comment)) [4] - [Possible LWT consistency violations after a topology change](scylladb#5251) Backport: not needed because this is a new feature. Closes scylladb#24819 * github.com:scylladb/scylladb: create_keyspace: fix warning for tablets docs: fix lwt.rst docs: fix tablets.rst alternator: enable LWT random_failures: enable execute_lwt_transaction test_tablets_lwt: add test_paxos_state_table_permissions test_tablets_lwt: add test_lwt_for_tablets_is_not_supported_without_raft test_tablets_lwt: test timeout creating paxos state table test_tablets_lwt: add test_lwt_concurrent_base_table_recreation test_tablets_lwt: add test_lwt_state_is_preserved_on_rebuild test_tablets_lwt: migrate test_lwt_support_with_tablets test_tablets_lwt: add test_lwt_state_is_preserved_on_tablet_migration test_tablets_lwt: add simple test for LWT check_internal_table_permissions: handle Paxos state tables client_state: extract check_internal_table_permissions paxos_store: handle base table removal database: get_base_table_for_tablet_colocation: handle paxos state table paxos_state: use node_local_only mode to access paxos state query_options: add node_local_only mode storage_proxy: handle node_local_only in query storage_proxy: handle node_local_only in mutate storage_proxy: introduce node_local_only flag abstract_replication_strategy: remove unused using storage_proxy: add coordinator_mutate_options storage_proxy: rename create_write_response_handler -> make_write_response_handler storage_proxy: simplify mutate_prepare paxos_state: lazily create paxos state table migration_manager: add timeout to start_group0_operation and announce paxos_store: use non-internal queries qp: make make_internal_options public paxos_store: conditional cf_id filter paxos_store: coroutinize feature_service: add LWT_WITH_TABLETS feature paxos_state: inline system_keyspace functions into paxos_store paxos_state: extract state access functions into paxos_store

Currently, we use storage_proxy/get_cas_shard -> sharder.shard_for_reads to decide which shard to use for LWT code execution on both replicas and the coordinator. If the coordinator is not a replica, shard_for_reads returns 0 — the 'default' shard. This behavior has at least two problems: * Shard 0 may become overloaded, because all LWT coordinators that are not replicas will be served on it. * The zero shard does not match shard_for_reads on replicas, which hinders the "same shard for client and server" RPC-level optimization. To fix this, we need to know whether the current node hosts a replica for the tablet corresponding to the given token. Currently, there is no API we could use for this. For historical reasons, sharder::shard_for_reads returns 0 when the node does not host the shard, which leads to ambiguity. This commit introduces try_get_shard_for_reads, which returns a disengaged std::optional when the tablet is not present on the local node. We leave shard_for_reads method in the base sharder class, it calls try_get_shard_for_reads and returns zero by default. We need to rename tablet_sharder private methods shard_for_reads and shard_for_writes so that they don't conflict with the sharder::shard_for_reads.

Currently, get_cas_shard uses shard_for_reads to decide which shard to use for LWT execution—both on replicas and the coordinator. If the coordinator is not a replica, shard_for_reads returns a default shard (shard 0). There are at least two problems with this: * shard 0 can become overloaded, because all LWT coordinators-but-not-replacas are served on it. * mismatch with replicas: the default shard doesn't match what shard_for_reads returns on replicas. This hinders the "same shard for client and server" RPC level optimization. In this commit we change get_cas_shard to use a primary replica shard if the current node is not a replica. This guarantees that all LWT coordinators for the same tablet will be served on the same shard. This is important for LWT coordinator locks (paxos::paxos_state::get_cas_lock). Also, if all tablet replicas on different nodes live on the same shard, RPC optimization will make sure that no additional smp::submit_to will be needed on the server side. Fixes scylladb#20497

Check that an LWT coordinator which is not a replica runs on the same shard as a replica.

This is a refactoring commit. Remove the rf_rack_valid_keyspaces: False flag because rf_rack_validy is going to become mundatory in scylladb#23526

gleb-cloudius approved these changes Jun 25, 2025

View reviewed changes

gusev-p force-pushed the paxos-state-colocated-tables branch 7 times, most recently from 7188b92 to cd862c8 Compare July 4, 2025 16:12

gusev-p force-pushed the paxos-state-colocated-tables branch 6 times, most recently from 826d6cc to 8b54ae1 Compare July 15, 2025 11:23

ndragazis added 14 commits July 16, 2025 17:14

encryption: gcp: Fix typo in docstring

b1e719c

Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>

encryption: Templatize get_{kmip,kms,gcp}_host()

e078abb

For deduplication. Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>

encryption: azure: Add skeleton for Azure host

b39d1b1

The Azure host manages cryptographic keys using Azure Key Vault. This patch only defines the API. Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>

michoecho and others added 22 commits July 25, 2025 10:49

sstables/index_reader: make the return value of get_partition_key o…

141895f

…ptional BTI indexes only store encoded prefixes of partition keys, not the whole keys. They can't reliably implement `get_partition_key`. The index reader interface must be weakened and callers must be adapted.

sstables/index_reader: remove advance_to

fe8ee34

`advance_to` is unused now, so remove it.

sstables/mx/reader: allow passing a custom index reader to the constr…

810eb93

…uctor For tests. Will be used for testing how the data reader reacts to various combinations of inexact index lookup results.

test/boost: add a test for inexact index lookups

be1f54c

sstables/index_reader: weaken some exactness guarantees in abstract_i…

b1da5f2

…ndex_reader After making the sstable reader more permissive, we can weaken the abstract_index_reader interface.

init.hh: Remove unused forward declarations

1b9eb4c

The init.hh contains some bits that only main.cc needs. Some of its forward declarations are neede by neither the headers itself, nor the main.cc that includes it. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb#25110

sstables/types.hh: Remove duplicate version.hh inclusion

403a729

The latter header in included two times, one is enough Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb#25109

storage service: drain view builder before group0

3ff388c

The view builder uses group0 operations to coordinate view building, so we should drain the view builder before stopping group0. Fixes scylladb#25096 Closes scylladb#25101

topology_coordinator: Trigger load stats refresh after replace

55116ee

Otherwise, tablet rebuilt will be delayed for up to 60s, as the tablet scheduler needs load stats for the new node (replacing) to make decisisons. Fixes scylladb#25163 Closes scylladb#25181

docs: fix typo in command name enbleautocompaction -> enableautocompa…

6b6622e

…ction Renamed the file and updated all references from 'enbleautocompaction' to the correct 'enableautocompaction'. Fixes scylladb#25172 Closes scylladb#25175

gusev-p force-pushed the lwt-coordinator-shard-fix branch from 8e7b677 to 7e3eb09 Compare July 28, 2025 17:51

gusev-p force-pushed the lwt-coordinator-shard-fix branch from 7e3eb09 to 789c551 Compare July 29, 2025 09:41

gusev-p force-pushed the lwt-coordinator-shard-fix branch from 789c551 to 4d8a31e Compare July 29, 2025 15:11

gusev-p added 2 commits July 30, 2025 13:08

test_tablets_lwt: add test_lwt_coordinator_shard

bd82a9d

Check that an LWT coordinator which is not a replica runs on the same shard as a replica.

test_tablets_lwt.py: make tests rf_rack_valid

dea41b1

This is a refactoring commit. Remove the rf_rack_valid_keyspaces: False flag because rf_rack_validy is going to become mundatory in scylladb#23526

gusev-p force-pushed the lwt-coordinator-shard-fix branch from 4d8a31e to dea41b1 Compare July 30, 2025 11:48

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

LWT: run non-replica coordinator on the same shard as a replica#2

LWT: run non-replica coordinator on the same shard as a replica#2
gusev-p wants to merge 66 commits into
paxos-state-colocated-tablesfrom
lwt-coordinator-shard-fix

gusev-p commented Jun 25, 2025

Uh oh!

gleb-cloudius commented Jun 25, 2025

Uh oh!

gusev-p commented Jun 25, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

18 participants

Conversation

gusev-p commented Jun 25, 2025

Uh oh!

gleb-cloudius commented Jun 25, 2025

Uh oh!

gusev-p commented Jun 25, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

18 participants