add saas scenario best practices #20668

Open · wants to merge 9 commits into base: master
1 change: 1 addition & 0 deletions TOC.md
@@ -401,6 +401,7 @@
- [Local Read Under Three Data Centers Deployment](/best-practices/three-dc-local-read.md)
- [Use UUIDs](/best-practices/uuid.md)
- [Read-Only Storage Nodes](/best-practices/readonly-nodes.md)
- [SaaS Multi-Tenant Scenarios](/best-practices/saas-best-practices.md)

The title of the new document is not clear enough. It is recommended to clarify what the best practices are about.

Suggested change
- [SaaS Multi-Tenant Scenarios](/best-practices/saas-best-practices.md)
- [Best Practices for SaaS Multi-Tenant Scenarios](/best-practices/saas-best-practices.md)

- [Use Placement Rules](/configure-placement-rules.md)
- [Use Load Base Split](/configure-load-base-split.md)
- [Use Store Limit](/configure-store-limit.md)
82 changes: 82 additions & 0 deletions best-practices/saas-best-practices.md
@@ -0,0 +1,82 @@
---
title: Best Practices for SaaS Multi-Tenant Scenarios
summary: Learn best practices for TiDB in SaaS multi-tenant scenarios.
---

# Best Practices for SaaS Multi-Tenant Scenarios

This document introduces best practices for TiDB in SaaS (Software as a Service) multi-tenant environments, especially in scenarios where the **number of tables in a single cluster exceeds one million**. With appropriate configuration and design choices, TiDB can run efficiently and stably in SaaS scenarios while reducing resource consumption and costs.

> **Note:**
>
> It is recommended to use TiDB v8.5.0 or later.

Copilot AI Apr 7, 2025

There is an inconsistency with the version recommendations; while the note on line 12 suggests using TiDB v8.5.0 or later, other sections mention features available from TiDB v8.4.0. Consider aligning these version recommendations for clarity.

## TiDB hardware requirements

It is recommended to use high-memory TiDB instances: for example, 32 GiB or more memory for 1 million tables, and 64 GiB or more memory for 3 million tables. High-memory TiDB instances can allocate more cache space for the Infoschema, Statistics, and execution plan caches, which improves cache hit rates and therefore business performance. In addition, larger memory effectively mitigates the fluctuation and stability issues caused by TiDB GC.

The following are recommended hardware configurations for TiKV and PD:

* TiKV: 8 vCPU 32 GiB or higher
* PD: 8 vCPU 16 GiB or higher

## Control the number of Regions

If you need to create a large number of tables (for example, more than 100,000), it is recommended to set the TiDB configuration item [`split-table`](/tidb-configuration-file.md#split-table) to `false` to reduce the number of cluster Regions, thus alleviating memory pressure on TiKV.
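
As a quick check, you can verify that the setting is in effect on each TiDB instance. The following is a minimal sketch; note that `split-table` is a configuration file item (not a system variable), so it is typically set in the configuration file before the TiDB instances start:

```sql
-- Show the effective value of `split-table` on each TiDB instance
SHOW CONFIG WHERE type = 'tidb' AND name = 'split-table';
```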

Check warning on line 25 (GitHub Actions / vale): [PingCAP.Ambiguous] Consider using a clearer word than 'a large number of' because it may cause confusion.

## Configure caches

* Starting from TiDB v8.4.0, TiDB loads table information involved in SQL statements into the Infoschema cache on demand during SQL execution. You can monitor the **Schema Load** panel in TiDB monitoring, specifically the **Infoschema v2 Cache Size** and **Infoschema v2 Cache Operation** sub-panels, to view the size and hit rate of the Infoschema cache. Use the system variable [`tidb_schema_cache_size`](/system-variables.md#tidb_schema_cache_size-new-in-v800) to adjust the memory limit of the Infoschema cache to meet business needs. The size of the Infoschema cache is linearly related to the number of distinct tables involved in SQL execution. In actual tests, fully caching metadata for 1 million tables (4 columns, 1 primary key, and 1 index) requires about 2.4 GiB of memory.
* TiDB loads table statistics involved in SQL statements into the Statistics cache on demand during SQL execution. You can monitor the **Statistics & Plan Management** panel in TiDB monitoring, specifically the **Stats Cache Cost** and **Stats Cache OPS** sub-panels, to view the size and hit rate of the Statistics cache. Use the system variable [`tidb_stats_cache_mem_quota`](/system-variables.md#tidb_stats_cache_mem_quota-new-in-v610) to adjust the memory limit of the Statistics cache to meet business needs. In actual tests, executing simple SQL statements (using the IndexRangeScan operator) on 100,000 tables results in a Stats cache cost of about 3.96 GiB of memory.
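
Both cache limits can be adjusted online through system variables. The following is a minimal sketch with illustrative values only (given in bytes); tune them to your table count and the available TiDB memory:

```sql
-- Illustrative values: 4 GiB for the Infoschema cache and 4 GiB for the Statistics cache
SET GLOBAL tidb_schema_cache_size = 4294967296;
SET GLOBAL tidb_stats_cache_mem_quota = 4294967296;
```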

## Collect statistics

* Starting from TiDB v8.4.0, TiDB introduces the system variable [`tidb_auto_analyze_concurrency`](/system-variables.md#tidb_auto_analyze_concurrency-new-in-v840) to control the concurrency of individual automatic statistics collection tasks. In multi-table scenarios, you can increase this concurrency as needed to improve the throughput of automatic analysis. As the concurrency value increases, the throughput of automatic analysis and the CPU usage of the TiDB Owner node increase linearly. In actual tests, using a concurrency value of 16 allows automatic analysis of 320 tables (each with 10,000 rows, 4 columns, and 1 index) within one minute, consuming one CPU core of the TiDB Owner node.
* [`tidb_auto_build_stats_concurrency`](/system-variables.md#tidb_auto_build_stats_concurrency-new-in-v650) and [`tidb_build_sampling_stats_concurrency`](/system-variables.md#tidb_build_sampling_stats_concurrency-new-in-v750) affect the concurrency of TiDB statistics construction and should be adjusted based on the scenario:
- For scenarios with many partitioned tables, prioritize increasing the value of `tidb_auto_build_stats_concurrency`.

Check warning on line 36 (GitHub Actions / vale): [PingCAP.Ambiguous] Consider using a clearer word than 'many' because it may cause confusion.
- For scenarios with many columns, prioritize increasing the value of `tidb_build_sampling_stats_concurrency`.

Check warning on line 37 (GitHub Actions / vale): [PingCAP.Ambiguous] Consider using a clearer word than 'many' because it may cause confusion.
* The product of `tidb_auto_analyze_concurrency`, `tidb_auto_build_stats_concurrency`, and `tidb_build_sampling_stats_concurrency` should not exceed the number of TiDB CPU cores, to avoid excessive resource usage. See the example after this list.
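
For example, the following is a minimal sketch for a TiDB node with 16 CPU cores, keeping the product of the three concurrency values at 16; the numbers are illustrative and should be adapted to your hardware and schema layout:

```sql
-- Illustrative settings for a 16-core TiDB node: 2 x 4 x 2 = 16
SET GLOBAL tidb_auto_analyze_concurrency = 2;
SET GLOBAL tidb_auto_build_stats_concurrency = 4;
SET GLOBAL tidb_build_sampling_stats_concurrency = 2;
```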

medium

The last sentence mentions actual tests. Can you provide more details on the test environment and methodology? This would add credibility to the recommendations.


## Query system tables efficiently

When querying system tables, add conditions such as `TABLE_SCHEMA`, `TABLE_NAME`, or `TIDB_TABLE_ID` to avoid scanning a large number of databases and tables in the cluster, thereby improving query speed and reducing resource consumption.

Check warning on line 42 (GitHub Actions / vale): [PingCAP.Ambiguous] Consider using a clearer word than 'a large number of' because it may cause confusion.

For example, in a scenario with 3 million tables:

- Executing the following SQL statement consumes about 8 GiB of memory.

```sql
SELECT COUNT(*) FROM information_schema.tables;
```

- Executing the following SQL statement takes about 20 minutes.

```sql
SELECT COUNT(*) FROM information_schema.views;
```

By adding the suggested query conditions to the above SQL statements, memory consumption becomes negligible, and query time is reduced to milliseconds.
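
For example, assuming a hypothetical tenant database named `tenant_0001` and a table named `orders`, the filtered queries below touch only that tenant's metadata instead of scanning the entire cluster:

```sql
-- Scope the query to a single tenant database
SELECT COUNT(*) FROM information_schema.tables WHERE TABLE_SCHEMA = 'tenant_0001';

-- Look up a single table by schema and table name
SELECT TIDB_TABLE_ID, TABLE_NAME
FROM information_schema.tables
WHERE TABLE_SCHEMA = 'tenant_0001' AND TABLE_NAME = 'orders';
```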

## Handle connection-intensive scenarios

In SaaS multi-tenant scenarios, each user typically connects to TiDB to operate on data in their own tenant (database). To save costs, users want TiDB nodes to support as many connections as possible.

Check warning on line 62 (GitHub Actions / vale): [PingCAP.Ambiguous] Consider using a clearer word than 'many' because it may cause confusion.

* Increase the TiDB configuration item [`token-limit`](/tidb-configuration-file.md#token-limit) (`1000` by default) to support more concurrent requests.
* The memory usage of TiDB is roughly linear with the number of connections. In actual tests, 200,000 idle connections increase TiDB process memory by about 30 GiB. It is recommended to increase TiDB memory specifications.
* If you use prepared statements, each connection maintains a session-level prepared plan cache. In practice, you might not execute the `DEALLOCATE PREPARE` statement for a long time, which can result in a high number of plans in the cache even if the QPS of `EXECUTE` statements is low. The memory usage of the plan cache is linearly related to the number of plans in the cache. In actual tests, 400,000 execution plans involving IndexRangeScan consume about 5 GiB of memory. It is recommended to increase TiDB memory specifications.
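
The following is a minimal sketch showing how a client session can explicitly release a prepared statement it no longer needs, so that its plan does not stay in the session-level cache indefinitely (the statement and table names are hypothetical):

```sql
-- Prepare, execute, and then explicitly release the statement
PREPARE tenant_stmt FROM 'SELECT * FROM orders WHERE id = ?';
SET @id = 1;
EXECUTE tenant_stmt USING @id;
DEALLOCATE PREPARE tenant_stmt;  -- frees the session-level cached plan
```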

## Use stale read carefully

When you use [Stale Read](/stale-read.md), if the schema version being read is too old, it might trigger a full load of historical schemas, which significantly impacts performance. You can mitigate this issue by increasing the value of [`tidb_schema_version_cache_limit`](/system-variables.md#tidb_schema_version_cache_limit-new-in-v740), for example, setting it to `255`.
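
For example, a minimal sketch (the table name is hypothetical):

```sql
-- Cache more historical schema versions to reduce full schema reloads
SET GLOBAL tidb_schema_version_cache_limit = 255;

-- A Stale Read query that reads data as of 5 seconds ago
SELECT * FROM orders AS OF TIMESTAMP NOW() - INTERVAL 5 SECOND;
```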

## Optimize BR backup and restore

* In scenarios involving full recovery of millions of tables, it is recommended to use high-memory BR instances. For example:
- For 1 million tables, use BR instances with 32 GiB or more memory.
- For 3 million tables, use BR instances with 64 GiB or more memory.
* BR log backup and snapshot recovery consume additional TiKV memory. It is recommended to use TiKV with 32 GiB or higher memory specifications.
* Adjust the BR configurations [`pitr-batch-count` and `pitr-concurrency`](/br/use-br-command-line-tool.md#common-options) as needed to improve BR log recovery speed.

medium

Consider adding a direct link to the documentation for pitr-batch-count and pitr-concurrency here.


## Import data with TiDB Lightning

When using [TiDB Lightning](/tidb-lightning/tidb-lightning-overview.md) to import data for millions of tables, it is recommended to use TiDB Lightning [physical import mode](/tidb-lightning/tidb-lightning-physical-import-mode.md) for large tables (for example, tables larger than 100 GiB, which are usually fewer in number) and TiDB Lightning [logical import mode](/tidb-lightning/tidb-lightning-logical-import-mode.md) for small tables (which are usually more numerous).