@hawkingrei hawkingrei commented Sep 8, 2025

What problem does this PR solve?

Issue Number: close #63407

Problem Summary:

What changed and how does it work?

Currently, some customers have created a large number of bindings. However, it is difficult to determine whether these bindings are still in use. The sheer volume of bindings also puts pressure on TiDB. Therefore, we need a way to identify and mark the bindings that are not in use.

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)

Start a TiDB instance built from master using a script. TiKV and PD are started with tiup playground nightly --mode tikv-slim.

+----------------------------------+----------------------------------+------------+---------+-------------------------+-------------------------+---------+-----------+---------+------------+-------------+
| original_sql                     | bind_sql                         | default_db | status  | create_time             | update_time             | charset | collation | source  | sql_digest | plan_digest |
+----------------------------------+----------------------------------+------------+---------+-------------------------+-------------------------+---------+-----------+---------+------------+-------------+
| builtin_pseudo_sql_for_bind_lock | builtin_pseudo_sql_for_bind_lock | mysql      | builtin | 0000-00-00 00:00:00.000 | 0000-00-00 00:00:00.000 |         |           | builtin | <null>     | <null>      |
+----------------------------------+----------------------------------+------------+---------+-------------------------+-------------------------+---------+-----------+---------+------------+-------------+

Kill this TiDB instance and start the TiDB version that contains the change.

+----------------------------------+----------------------------------+------------+---------+-------------------------+-------------------------+---------+-----------+---------+------------+-------------+----------------+
| original_sql                     | bind_sql                         | default_db | status  | create_time             | update_time             | charset | collation | source  | sql_digest | plan_digest | last_used_time |
+----------------------------------+----------------------------------+------------+---------+-------------------------+-------------------------+---------+-----------+---------+------------+-------------+----------------+
| builtin_pseudo_sql_for_bind_lock | builtin_pseudo_sql_for_bind_lock | mysql      | builtin | 0000-00-00 00:00:00.000 | 0000-00-00 00:00:00.000 |         |           | builtin | <null>     | <null>      | <null>         |
+----------------------------------+----------------------------------+------------+---------+-------------------------+-------------------------+---------+-----------+---------+------------+-------------+----------------+
  • No need to test
    • I checked and no code files have been changed.

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Documentation

  • Affects user behaviors
  • Contains syntax changes
  • Contains variable changes
  • Contains experimental features
  • Changes MySQL compatibility

Release note

Please refer to Release Notes Language Style Guide to write a quality release note.

None

@ti-chi-bot ti-chi-bot bot added do-not-merge/needs-tests-checked release-note-none Denotes a PR that doesn't merit a release note. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. sig/planner SIG: Planner and removed do-not-merge/needs-tests-checked labels Sep 8, 2025
codecov bot commented Sep 8, 2025

Codecov Report

❌ Patch coverage is 81.10236% with 24 lines in your changes missing coverage. Please review.
✅ Project coverage is 75.5088%. Comparing base (4f57389) to head (a45b570).
⚠️ Report is 10 commits behind head on master.

Additional details and impacted files
@@               Coverage Diff                @@
##             master     #63409        +/-   ##
================================================
+ Coverage   72.7332%   75.5088%   +2.7755%     
================================================
  Files          1848       1895        +47     
  Lines        498692     514696     +16004     
================================================
+ Hits         362715     388641     +25926     
+ Misses       113968     102798     -11170     
- Partials      22009      23257      +1248     
Flag Coverage Δ
integration 49.0451% <70.0787%> (?)
unit 73.0139% <70.8661%> (+0.7639%) ⬆️

Components Coverage Δ
dumpling 52.8700% <ø> (ø)
parser ∅ <ø> (∅)
br 63.3775% <ø> (+16.9514%) ⬆️

@hawkingrei
Member Author

/retest

2 similar comments
@hawkingrei
Member Author

/retest

@hawkingrei
Member Author

/retest
@ti-chi-bot ti-chi-bot bot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Sep 12, 2025
@ti-chi-bot ti-chi-bot bot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Sep 12, 2025
@hawkingrei
Member Author

/retest

@hawkingrei
Member Author

/retest

@AilinKid
Contributor

/retest-required

source VARCHAR(10) NOT NULL DEFAULT 'unknown',
sql_digest varchar(64) DEFAULT NULL,
plan_digest varchar(64) DEFAULT NULL,
last_used_time TIMESTAMP DEFAULT NULL,
Contributor

Please also specify the precision. @henrybw just updated the precision to 6 to fix some stability problems: https://github.com/pingcap/tidb/pull/63524/files

)

func updateBindingUsageInfoToStorage(sPool util.DestroyableSessionPool, bindings []*Binding) error {
err := callWithSCtx(sPool, true, func(sctx sessionctx.Context) error {
Contributor

One of our customers' workloads has around 100,000 bindings. Is it OK to execute 100,000 UPDATE statements in one transaction?

Contributor

I think it is safer to limit the number of UPDATE statements in this transaction to around 200 or 500. It would be good to confirm this with our Txn Team members.

Member Author

OK, I will confirm this with the transaction team.

Additionally, if a cluster really has that many binds, there must also be many TiDB instances. In fact, we need to consider the impact on the entire cluster from multiple instances. So as long as it is sufficiently scattered, with small batches and executed by multiple instances, it should be fine.

Member Author

The slow write here actually won't have much impact. DBAs won't delete a bindinfo just because it hasn't been used for a day or two; that would be quite risky.
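
The batching idea raised in this thread can be sketched roughly as follows. This is only an illustration, not the code from this PR: splitIntoBatches, flushInBatches, the batch size of 200, and the flush callback are all hypothetical names and values, and the real implementation writes through TiDB's internal session pool rather than a callback.

```go
package main

import "fmt"

// splitIntoBatches cuts items into consecutive slices of at most size elements.
// Hypothetical helper for illustration; it is not part of the TiDB code base.
func splitIntoBatches(items []string, size int) [][]string {
	if size <= 0 {
		return nil
	}
	var batches [][]string
	for start := 0; start < len(items); start += size {
		end := start + size
		if end > len(items) {
			end = len(items)
		}
		batches = append(batches, items[start:end])
	}
	return batches
}

// flushInBatches calls flush once per batch, so that each call can run in its
// own small transaction. It stops at the first error and returns it.
func flushInBatches(items []string, size int, flush func(batch []string) error) error {
	for _, batch := range splitIntoBatches(items, size) {
		if err := flush(batch); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	// 450 made-up digests stand in for bindings whose usage info changed.
	digests := make([]string, 0, 450)
	for i := 0; i < 450; i++ {
		digests = append(digests, fmt.Sprintf("digest-%d", i))
	}
	calls := 0
	_ = flushInBatches(digests, 200, func(batch []string) error {
		calls++
		fmt.Printf("transaction %d updates %d bindings\n", calls, len(batch))
		return nil
	})
}
```

With a cap like this, a failure or conflict affects only one small batch, and the remaining bindings can be retried on the next update cycle.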

@hawkingrei
Member Author

/retest

Contributor

@henrybw henrybw left a comment

This doesn't seem to load usage info from storage into the bindings cache? I think we need to also update readBindingsFromStorage() to read the last_used_time column into the Binding structures.

Comment on lines 1595 to 1621
randomDuration(
bindinfo.MinCheckIntervalForUpdateBindingUsageInfo,
bindinfo.MaxCheckIntervalForUpdateBindingUsageInfo,
),
Contributor

Picking random durations seems unnecessarily complicated. What's wrong with picking just one check interval?

Member Author

Because the customer's cluster has many TiDB nodes, especially for SaaS clusters, we need to avoid having all TiDB nodes execute batch updates at the same time, which could impact the customer's business. Additionally, updating bindinfo is not urgent, so it's fine to have larger time intervals between updates. Moreover, the code for this is not extensive.

Member Author

In the current code, there is also a random interval inside the deltaUpdateTickerWorker.

Contributor

OK.

Member

@time-and-fate time-and-fate left a comment

In the current implementation, if the TiDB server shuts down within 6 hours, the usage info will never get a chance to be recorded.
I think we need to confirm with PM that this is the expected behavior.

}
binding.ResetUsageInfo()
}
if cnt > updateBindingUsageInfoBatchSize {
Member

cnt is never updated.

Member Author

I have updated.

lastUsedTime := ts.UTC().Format(types.TimeFormat)
_, _, err := execRows(
sctx,
"UPDATE mysql.bind_info SET last_used_time = CONVERT_TZ(%?, '+00:00', @@TIME_ZONE) WHERE sql_digest = %?",
Member

I think this would be a full table scan.

Member Author

Updated. I now use plan_digest and sql_digest as the condition, so the update can use digest_index.

+---------------+---------+------+--------------------------------------------------------------+---------------+
| id            | estRows | task | access object                                                | operator info |
+---------------+---------+------+--------------------------------------------------------------+---------------+
| Update_3      | N/A     | root |                                                              | N/A           |
| └─Point_Get_1 | 1.00    | root | table:bind_info, index:digest_index(plan_digest, sql_digest) |               |
+---------------+---------+------+--------------------------------------------------------------+---------------+

@hawkingrei
Member Author

This doesn't seem to load usage info from storage into the bindings cache? I think we need to also update readBindingsFromStorage() to read the last_used_time column into the Binding structures.

Yes. The column is only for users to check whether a binding is in use.

@hawkingrei hawkingrei force-pushed the 63407 branch 2 times, most recently from 24a0f3e to d043c76 Compare September 17, 2025 06:20
@ti-chi-bot ti-chi-bot bot removed the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Sep 17, 2025
Signed-off-by: Weizhen Wang <[email protected]>
Signed-off-by: Weizhen Wang <[email protected]>
@hawkingrei hawkingrei requested a review from Copilot September 30, 2025 00:42
Copilot AI left a comment

Pull Request Overview

This PR adds a last_used_time column to the mysql.bind_info table to track binding usage frequency, helping customers identify unused bindings that may be putting pressure on TiDB.

Key changes:

  • Adds new database schema column and upgrade mechanism for version 253
  • Implements usage tracking that updates binding last-used timestamps when bindings are matched
  • Adds periodic storage updates with randomized intervals to prevent thundering herd problems

Reviewed Changes

Copilot reviewed 17 out of 17 changed files in this pull request and generated 3 comments.

File                                        Description
pkg/session/upgrade.go                      Adds version 253 upgrade function to add last_used_time column
pkg/session/bootstrap.go                    Updates bind_info table schema to include new column
pkg/bindinfo/binding.go                     Adds usage tracking fields and methods to Binding struct
pkg/bindinfo/utils.go                       Implements batch update logic for writing usage info to storage
pkg/bindinfo/binding_cache.go               Adds interface method for updating usage info to storage
pkg/domain/domain.go                        Adds periodic worker to update binding usage info with randomized intervals
pkg/bindinfo/tests/bind_usage_info_test.go  Comprehensive test coverage for new usage tracking functionality
Multiple test files                         Updates existing tests to handle new column in insert statements

Signed-off-by: Weizhen Wang <[email protected]>
Signed-off-by: Weizhen Wang <[email protected]>
Contributor

@henrybw henrybw left a comment

Almost there.

Signed-off-by: Weizhen Wang <[email protected]>
Signed-off-by: Weizhen Wang <[email protected]>
Signed-off-by: Weizhen Wang <[email protected]>
Signed-off-by: Weizhen Wang <[email protected]>
Signed-off-by: Weizhen Wang <[email protected]>
Contributor

@henrybw henrybw left a comment

Thank you for diligently working through this. LGTM

@hawkingrei
Member Author

/unhold

@ti-chi-bot ti-chi-bot bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Sep 30, 2025
ti-chi-bot bot commented Sep 30, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: D3Hunter, henrybw, Leavrth, qw4990, you06, yudongusa

@ti-chi-bot ti-chi-bot bot added the approved label Sep 30, 2025
@hawkingrei
Member Author

/retest

@ti-chi-bot ti-chi-bot bot merged commit 70c7d50 into pingcap:master Sep 30, 2025
37 of 39 checks passed
ti-chi-bot bot commented Sep 30, 2025

@hawkingrei: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
idc-jenkins-ci-tidb/mysql-test a45b570 link unknown /test mysql-test

@hawkingrei
Member Author

/cherrypick release-8.5

ti-chi-bot pushed a commit to ti-chi-bot/tidb that referenced this pull request Sep 30, 2025
@ti-chi-bot
Member

@hawkingrei: new pull request created to branch release-8.5: #63824.
But this PR has conflicts, please resolve them!


hawkingrei added a commit to ti-chi-bot/tidb that referenced this pull request Oct 16, 2025

Labels

approved lgtm release-note-none Denotes a PR that doesn't merit a release note. sig/planner SIG: Planner size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.

Development

Successfully merging this pull request may close these issues.

Add last_used_time column to mysql.bind_info system table to track usage frequency

10 participants