
ZhouXing19 commented Oct 27, 2025

Informs: #150015

This PR introduces two key configuration settings for the Canary Statistics Rollout feature. Note that this PR only introduces the settings; the core implementation of canary stats rollout will be in #156385.

Storage parameter canary_window (duration)

CREATE TABLE t (x int) WITH (canary_window = '20s')

This duration specifies how long newly collected statistics remain eligible for selection by the optimizer, along with the most recent full statistics. It is required by the canary statistics rollout feature: only tables with a non-zero canary window have canary statistics rollout enabled.

The value is capped at 48 hours to prevent excessively long canary windows.

Release note (sql change): A new table storage parameter canary_window has been introduced to enable gradual rollout of newly collected table statistics. It takes a duration string as its value, capped at a maximum of 48 hours. When set to a non-zero duration, newly collected statistics remain in a "canary" state for the specified duration before being promoted to stable. This allows for controlled exposure and an opportunity to intervene before the statistics are fully deployed across all queries.


Cluster setting sql.stats.canary_fraction (float in [0, 1])

SET CLUSTER SETTING sql.stats.canary_fraction = 0.2

This setting controls the probabilistic sampling rate for queries participating in the canary statistics rollout feature.
It determines what fraction of queries use "canary statistics" (newly collected stats within their canary window) versus "stable statistics" (previously proven stats).

For example, a value of 0.2 means 20% of queries will test canary stats while 80% use stable stats.
The selection is atomic per query: if a query is chosen for canary evaluation, it will use canary statistics for ALL tables it references (where available). A query never uses a mix of canary and stable statistics.

Release note (sql change): Introduced the cluster setting sql.stats.canary_fraction, which takes a float in the range [0, 1]. Its value determines what fraction of queries use "canary statistics" (newly collected stats within their canary window) versus "stable statistics" (previously proven stats). For example, a value of 0.2 means 20% of queries will test canary stats while 80% use stable stats. The selection is atomic per query: if a query is chosen for canary evaluation, it will use canary statistics for ALL tables it references (where available). A query never uses a mix of canary and stable statistics.


This commit introduces a new table-level storage parameter
`canary_window` that can be specified in CREATE TABLE or ALTER TABLE
statements. The canary window specifies how long newly collected
statistics will remain in "canary" state before being promoted to
stable, enabling gradual rollout of new statistics to mitigate
performance regressions.

When set to a non-zero duration, this parameter enables canary
statistics rollout for the table. During the canary window, the cluster
setting sql.stats.canary_fraction (introduced in the next commit)
determines what percentage of queries use the new canary statistics
versus the previous stable statistics, providing a buffer period for
observation and intervention.

The window is capped at 48 hours to avoid excessively long canary
windows.

Note that this commit adds only syntax support and storage for the
parameter. The actual canary statistics selection logic will be
implemented in subsequent commits.

Release note (sql change): A new table storage parameter `canary_window`
has been introduced to enable gradual rollout of newly collected table
statistics. It takes a duration string as the value, with a maximum
allowed duration of 48 hours. When set to a non-zero duration, the
new statistics remain in a "canary" state for the specified duration
before being promoted to stable. This allows for controlled exposure
and intervention opportunities before statistics are fully deployed
across all queries.

See the release note for details. Note that the `sql.stats.canary_fraction`
cluster setting does not apply to internal queries.

Release note (sql change): Introduced the cluster setting
`sql.stats.canary_fraction`, which takes a float in the range [0, 1].
Its value determines
what fraction of queries will use "canary statistics" (newly collected
stats within their canary window) versus "stable statistics"
(previously proven stats). For example, a value of 0.2 means 20% of
queries will test canary stats while 80% use stable stats. The
selection is atomic per query: if a query is chosen for canary
evaluation, it will use canary statistics for ALL tables it references
(where available). A query never uses a mix of canary and stable
statistics.