
@hfxsd
Collaborator

@hfxsd hfxsd commented Nov 6, 2025

Introduces the 'Analyze Embedded in DDL' feature documentation and adds a new page describing its behavior for index creation and reorganization. Updates TOC to include the new doc and documents the 'tidb_stats_update_during_ddl' system variable, which controls this feature.

First-time contributors' checklist

What is changed, added or deleted? (Required)

Which TiDB version(s) do your changes apply to? (Required)

Tips for choosing the affected version(s):

By default, CHOOSE MASTER ONLY so your changes will be applied to the next TiDB major or minor releases. If your PR involves a product feature behavior change or a compatibility change, CHOOSE THE AFFECTED RELEASE BRANCH(ES) AND MASTER.

For details, see tips for choosing the affected versions.

  • master (the latest development version)
  • v9.0 (TiDB 9.0 versions)
  • v8.5 (TiDB 8.5 versions)
  • v8.1 (TiDB 8.1 versions)
  • v7.5 (TiDB 7.5 versions)
  • v7.1 (TiDB 7.1 versions)
  • v6.5 (TiDB 6.5 versions)
  • v6.1 (TiDB 6.1 versions)
  • v5.4 (TiDB 5.4 versions)

What is the related PR or file link(s)?

Do your changes match any of the following descriptions?

  • Delete files
  • Change aliases
  • Need modification after applied to another branch
  • Might cause conflicts after applied to another branch

@hfxsd hfxsd self-assigned this Nov 6, 2025
@ti-chi-bot ti-chi-bot bot added the contribution This PR is from a community contributor. label Nov 6, 2025
@hfxsd hfxsd requested a review from AilinKid November 6, 2025 02:53
@ti-chi-bot ti-chi-bot bot added missing-translation-status This PR does not have translation status info. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Nov 6, 2025
@gemini-code-assist

Summary of Changes

Hello @hfxsd, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request adds new documentation for TiDB's 'Analyze Embedded in DDL' feature, which is designed to improve query plan stability and performance. By automatically collecting statistics for new or reorganized indexes during DDL operations, the feature ensures that the optimizer has accurate and up-to-date information immediately, preventing suboptimal query plans that could arise from missing statistics. The documentation details the feature's purpose, use cases, and how it's controlled by the new tidb_stats_update_during_ddl system variable.

Highlights

  • New Feature Documentation: Introduced comprehensive documentation for the 'Analyze Embedded in DDL' feature, which automatically collects statistics during index creation and reorganization.
  • New System Variable: Added documentation for the new system variable tidb_stats_update_during_ddl, which controls the DDL-embedded Analyze behavior.
  • Table of Contents Update: The Table of Contents (TOC) has been updated to include the new documentation page for 'Analyze Embedded in DDL'.

@hfxsd hfxsd added translation/from-docs-cn This PR is translated from a PR in pingcap/docs-cn. needs-cherry-pick-release-8.5 Should cherry pick this PR to release-8.5 branch. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. missing-translation-status This PR does not have translation status info. labels Nov 6, 2025
@hfxsd hfxsd added type/compatibility-or-feature-change This PR involves compatibility changes or feature behavior changes. and removed contribution This PR is from a community contributor. labels Nov 6, 2025
@hfxsd hfxsd changed the title Add docs for DDL-embedded Analyze and new system variable Add docs for Analyze Embedded in DDL and new system variable Nov 6, 2025
@ti-chi-bot ti-chi-bot bot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Nov 6, 2025
@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces documentation for the 'Analyze Embedded in DDL' feature and its corresponding system variable. The documentation is well-structured and informative. I've provided a few suggestions to correct some typos in the examples and to align the text with the repository's style guide, primarily concerning the formatting of command names.

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
@ti-chi-bot

ti-chi-bot bot commented Nov 6, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from hfxsd. For more information see the Code Review Process.
Please ensure that each of them provides their approval before proceeding.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

AilinKid and others added 3 commits November 6, 2025 15:27
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
@ti-chi-bot

ti-chi-bot bot commented Nov 6, 2025

@AilinKid: adding LGTM is restricted to approvers and reviewers in OWNERS files.

In response to this:

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

Collaborator

@qiancai qiancai left a comment

Rest LGTM


When this feature is enabled, TiDB automatically runs an `ANALYZE` (statistics collection) operation before the new or reorganized index becomes visible to users. This prevents inaccurate optimizer estimates and potential plan changes caused by temporarily unavailable statistics after index creation or reorganization.

## Use scenarios
Collaborator

Suggested change
## Use scenarios
## Usage scenarios


When `tidb_stats_update_during_ddl` is `ON`, executing [`ADD INDEX`](/sql-statements/sql-statement-add-index.md) automatically runs an embedded `ANALYZE` operation after the Reorg phase finishes. This `ANALYZE` operation collects statistics for the newly created index before the index becomes visible to users, and then `ADD INDEX` proceeds with its remaining phases.

Considering that `ANALYZE` can take time, TiDB sets a timeout threshold based on the execution time of the first Reorg. If `ANALYZE` times out, `ADD INDEX` will stop waiting synchronously for `ANALYZE` to finish and will continue the subsequent process so that the index becomes visible earlier to users. This means the index statistics will be updated after `ANALYZE` completes asynchronously.
Collaborator

Suggested change
Considering that `ANALYZE` can take time, TiDB sets a timeout threshold based on the execution time of the first Reorg. If `ANALYZE` times out, `ADD INDEX` will stop waiting synchronously for `ANALYZE` to finish and will continue the subsequent process so that the index becomes visible earlier to users. This means the index statistics will be updated after `ANALYZE` completes asynchronously.
Considering that `ANALYZE` can take time, TiDB sets a timeout threshold based on the execution time of the first Reorg. If `ANALYZE` times out, `ADD INDEX` stops waiting synchronously for `ANALYZE` to finish and continues the subsequent process, making the index visible earlier to users. This means the index statistics will be updated after `ANALYZE` completes asynchronously.
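To make the `ADD INDEX` flow concrete, here is a minimal sketch of a session that exercises it. The table `t`, its columns, and the sample rows are hypothetical, and the sketch assumes `tidb_stats_update_during_ddl` is set with `SET GLOBAL` (its description says it persists to the cluster). The sketch does not force the timeout path, and the `EXPLAIN` plan depends on the data, so no expected output is shown.

```sql
-- Hypothetical table and rows, used only for illustration.
CREATE TABLE t (a INT, b INT);
INSERT INTO t VALUES (1, 1), (2, 2), (3, 3);

-- Enable ANALYZE embedded in DDL (the default value is OFF).
SET GLOBAL tidb_stats_update_during_ddl = ON;

-- After the Reorg phase, TiDB collects statistics for idx before the index
-- becomes visible. If ANALYZE exceeds the timeout, the DDL continues and the
-- statistics are updated asynchronously later.
ALTER TABLE t ADD INDEX idx(a);

-- The optimizer can consider the statistics of idx right away.
EXPLAIN SELECT * FROM t WHERE a > 1;
```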

When `tidb_stats_update_during_ddl` is `ON`, executing [`MODIFY COLUMN`](/sql-statements/sql-statement-modify-column.md) or [`CHANGE COLUMN`](/sql-statements/sql-statement-change-column.md) that reorganizes an index will also run an embedded `ANALYZE` operation after the Reorg phase completes. The mechanism is the same as for `ADD INDEX`:

- Start collecting statistics before the index becomes visible.
- If `ANALYZE` times out, [`MODIFY COLUMN`](/sql-statements/sql-statement-modify-column.md) and [`CHANGE COLUMN`](/sql-statements/sql-statement-change-column.md) will not synchronously wait for `ANALYZE` to finish and will continue so the index becomes visible earlier to users. This means that the index statistics will be updated when `ANALYZE` finishes asynchronously.
Collaborator

Suggested change
- If `ANALYZE` times out, [`MODIFY COLUMN`](/sql-statements/sql-statement-modify-column.md) and [`CHANGE COLUMN`](/sql-statements/sql-statement-change-column.md) will not synchronously wait for `ANALYZE` to finish and will continue so the index becomes visible earlier to users. This means that the index statistics will be updated when `ANALYZE` finishes asynchronously.
- If `ANALYZE` times out, [`MODIFY COLUMN`](/sql-statements/sql-statement-modify-column.md) and [`CHANGE COLUMN`](/sql-statements/sql-statement-change-column.md) stop waiting synchronously for `ANALYZE` to finish and continue the subsequent process, making the index visible earlier to users. This means that the index statistics will be updated when `ANALYZE` finishes asynchronously.
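A similar sketch for the `MODIFY COLUMN` case follows. The table `t2` and its index `idx` are hypothetical, and the sketch assumes that changing `b` from `INT` to `VARCHAR(20)` makes TiDB rewrite the column data and reorganize `idx`. A lossless change such as `INT` to `BIGINT` might be metadata-only, in which case no index Reorg (and no embedded `ANALYZE`) would occur.

```sql
-- Hypothetical table with an existing index on column b.
CREATE TABLE t2 (a INT, b INT, KEY idx(b));
INSERT INTO t2 VALUES (1, 1), (2, 2), (3, 3);

SET GLOBAL tidb_stats_update_during_ddl = ON;

-- Assumed to require data reorganization: idx is rebuilt for the new column
-- type, and the embedded ANALYZE runs after that Reorg phase completes.
ALTER TABLE t2 MODIFY COLUMN b VARCHAR(20);

EXPLAIN SELECT * FROM t2 WHERE b > '1';
```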

- Persists to cluster: Yes
- Applies to hint [SET_VAR](/optimizer-hints.md#set_varvar_namevar_value): No
- Default value: `OFF`
- This variable controls whether to enable DDL-embedded `ANALYZE`. When enabled, DDL statements that create new indexes ([`ADD INDEX`](/sql-statements/sql-statement-add-index.md)) and DDL statements that reorganize existing indexes ([`MODIFY COLUMN`](/sql-statements/sql-statement-modify-column.md) and [`CHANGE COLUMN`](/sql-statements/sql-statement-change-column.md)) automatically run statistics collection before the index becomes visible. For more information, see [`ANALYZE` Embedded in DDL Statements](/ddl_embedded_analyze.md).
Collaborator

Suggested change
- This variable controls whether to enable DDL-embedded `ANALYZE`. When enabled, DDL statements that create new indexes ([`ADD INDEX`](/sql-statements/sql-statement-add-index.md)) and DDL statements that reorganize existing indexes ([`MODIFY COLUMN`](/sql-statements/sql-statement-modify-column.md) and [`CHANGE COLUMN`](/sql-statements/sql-statement-change-column.md)) automatically run statistics collection before the index becomes visible. For more information, see [`ANALYZE` Embedded in DDL Statements](/ddl_embedded_analyze.md).
- This variable controls whether to enable DDL-embedded `ANALYZE`. When enabled, DDL statements that create new indexes ([`ADD INDEX`](/sql-statements/sql-statement-add-index.md)) or reorganize existing indexes ([`MODIFY COLUMN`](/sql-statements/sql-statement-modify-column.md) and [`CHANGE COLUMN`](/sql-statements/sql-statement-change-column.md)) automatically collect statistics before the index becomes visible. For more information, see [`ANALYZE` Embedded in DDL Statements](/ddl_embedded_analyze.md).



Example:
Collaborator

Suggested change
Example:
For example:



Collaborator

Suggested change
For example:

1 rows in set (0.001 sec)
```

From the `MODIFY COLUMN` example, when `tidb_stats_update_during_ddl` is `ON`, you can see that in the following `EXPLAIN` the index `idx` has its statistics automatically collected and loaded into memory (you can verify it by executing `SHOW STATS_HISTOGRAMS`), so the optimizer can immediately use those statistics for a range scan. If index creation or reorganization and `ANALYZE` take a long time, check the DDL job status by executing `ADMIN SHOW DDL JOBS`. If the `COMMENTS` column contains `analyzing`, it indicates that the DDL job is collecting statistics.
Collaborator

@qiancai qiancai Nov 7, 2025

Suggested change
From the `MODIFY COLUMN` example, when `tidb_stats_update_during_ddl` is `ON`, you can see that in the following `EXPLAIN` the index `idx` has its statistics automatically collected and loaded into memory (you can verify it by executing `SHOW STATS_HISTOGRAMS`), so the optimizer can immediately use those statistics for a range scan. If index creation or reorganization and `ANALYZE` take a long time, check the DDL job status by executing `ADMIN SHOW DDL JOBS`. If the `COMMENTS` column contains `analyzing`, it indicates that the DDL job is collecting statistics.
From the `MODIFY COLUMN` example, when `tidb_stats_update_during_ddl` is `ON`, you can see that after the execution of the `MODIFY COLUMN` DDL statement, the subsequent `EXPLAIN` output shows that statistics for the index `idx` have been automatically collected and loaded into memory (you can verify it by executing `SHOW STATS_HISTOGRAMS`). As a result, the optimizer can immediately use these statistics for range scans. If index creation or reorganization and `ANALYZE` take a long time, you can check the DDL job status by executing `ADMIN SHOW DDL JOBS`. When the `COMMENTS` column in the output contains `analyzing`, it means that the DDL job is collecting statistics.

1 rows in set (0.001 sec)
```

In the `ADD INDEX` example, when `tidb_stats_update_during_ddl` is `ON`, you can see that in the subsequent `EXPLAIN`, the index `idx` has its statistics automatically collected and loaded into memory (you can verify it by running `SHOW STATS_HISTOGRAMS`). Therefore, the optimizer can immediately use those statistics for a range scan. If index creation or reorganization and `ANALYZE` take a long time, you can check the DDL job status by executing `ADMIN SHOW DDL JOBS`. If the `COMMENTS` column contains `analyzing`, it means that the DDL job is collecting statistics.
Collaborator

@qiancai qiancai Nov 7, 2025

Suggested change
In the `ADD INDEX` example, when `tidb_stats_update_during_ddl` is `ON`, you can see that in the subsequent `EXPLAIN`, the index `idx` has its statistics automatically collected and loaded into memory (you can verify it by running `SHOW STATS_HISTOGRAMS`). Therefore, the optimizer can immediately use those statistics for a range scan. If index creation or reorganization and `ANALYZE` take a long time, you can check the DDL job status by executing `ADMIN SHOW DDL JOBS`. If the `COMMENTS` column contains `analyzing`, it means that the DDL job is collecting statistics.
From the `ADD INDEX` example, when `tidb_stats_update_during_ddl` is `ON`, you can see that after the execution of the `ADD INDEX` DDL statement, the subsequent `EXPLAIN` output shows that statistics for the index `idx` have been automatically collected and loaded into memory (you can verify it by executing `SHOW STATS_HISTOGRAMS`). As a result, the optimizer can immediately use these statistics for range scans. If index creation or reorganization and `ANALYZE` take a long time, you can check the DDL job status by executing `ADMIN SHOW DDL JOBS`. When the `COMMENTS` column in the output contains `analyzing`, it means that the DDL job is collecting statistics.
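The two checks mentioned in these paragraphs can be sketched as follows, assuming a hypothetical table named `t` that has an index `idx`. The statements themselves are standard TiDB statements; the `analyzing` marker in the `COMMENTS` column is taken from the text above.

```sql
-- Check whether statistics for idx have been collected; the Is_index column
-- marks rows that belong to indexes rather than columns.
SHOW STATS_HISTOGRAMS WHERE Table_name = 't';

-- While the DDL job runs, `analyzing` in the COMMENTS column indicates that
-- the job is collecting statistics.
ADMIN SHOW DDL JOBS;
```

Filtering `SHOW STATS_HISTOGRAMS` by `Table_name` keeps the output limited to the table being altered.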

@ti-chi-bot ti-chi-bot bot added the needs-1-more-lgtm Indicates a PR needs 1 more LGTM. label Nov 7, 2025
@ti-chi-bot

ti-chi-bot bot commented Nov 7, 2025

[LGTM Timeline notifier]

Timeline:

  • 2025-11-07 10:26:28.379496409 +0000 UTC m=+438637.822526288: ☑️ agreed by qiancai.


Labels

needs-1-more-lgtm Indicates a PR needs 1 more LGTM. needs-cherry-pick-release-8.5 Should cherry pick this PR to release-8.5 branch. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. translation/from-docs-cn This PR is translated from a PR in pingcap/docs-cn. type/compatibility-or-feature-change This PR involves compatibility changes or feature behavior changes.


3 participants