Run this command-line script to collect TiDBCloud cluster metrics. It provides three functions:
- Capacity plan: generates a capacity plan from the metrics, the estimated traffic increase, and the required resource redundancy, and saves it to a CSV file.
- Health check: generates a health report.
- List clusters: lists all clusters for a tenant, including the cluster ID, project ID, and number of nodes for each TiDB component.
$ python3 main.py --help
Usage: main.py [OPTIONS] COMMAND [ARGS]...
Options:
--help Show this message and exit.
Commands:
capacity
health-check
list-clusters
$ python3 main.py capacity --help
Welcome to TiDBCloud Capacity Planner and Health Checker!
Usage: main.py capacity [OPTIONS]
Options:
-m, --mode [all|node|cluster] capacity planner mode
--help Show this message and exit.
$ python3 main.py health-check --help
Welcome to TiDBCloud Capacity Planner and Health Checker!
Usage: main.py health-check [OPTIONS]
Options:
-t, --type [all|tidb|tikv|pd|tiflash]
health check type
-r, --report [console] report channel
--help Show this message and exit.
$ python3 main.py list-clusters --help
Welcome to TiDBCloud Capacity Planner and Health Checker!
Usage: main.py list-clusters [OPTIONS]
Options:
-w, --write [Yes|No] Write to spreadsheet
--help Show this message and exit.
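If you run these commands from cron or another script, a thin wrapper that checks option values against the choices in the help output above catches typos before main.py starts. This is a hypothetical sketch, not part of the tool: build_command and run are illustrative names, and it assumes main.py is in the current working directory.

```python
import shlex
import subprocess
from typing import Dict, List, Tuple

# Commands, long option names, and allowed values, copied from the --help output.
CHOICES: Dict[str, Dict[str, Tuple[str, ...]]] = {
    "capacity": {"mode": ("all", "node", "cluster")},
    "health-check": {
        "type": ("all", "tidb", "tikv", "pd", "tiflash"),
        "report": ("console",),
    },
    "list-clusters": {"write": ("Yes", "No")},
}


def build_command(command: str, **options: str) -> List[str]:
    """Return the argv for one main.py call, rejecting values not listed in --help."""
    if command not in CHOICES:
        raise ValueError(f"unknown command: {command}")
    argv = ["python3", "main.py", command]
    for name, value in options.items():
        allowed = CHOICES[command].get(name)
        if allowed is None:
            raise ValueError(f"{command} has no --{name} option")
        if value not in allowed:
            raise ValueError(f"--{name} must be one of {allowed}, got {value!r}")
        argv += [f"--{name}", value]
    return argv


def run(command: str, **options: str) -> int:
    """Print and run one main.py command, returning its exit code."""
    argv = build_command(command, **options)
    print("$", shlex.join(argv))
    return subprocess.run(argv).returncode
```

For example, run("health-check", type="tikv", report="console") prints and executes python3 main.py health-check --type tikv --report console. Validating up front keeps a scheduled job from failing halfway through a batch because of a typo such as --write yes.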
- Python 3.8+
- prometheus_api_client: a Python wrapper for the Prometheus HTTP API, with some tools for metrics processing.
pip install prometheus-api-client
- pyyaml
pip install pyyaml
- openpyxl
pip install openpyxl
- click
pip install click
- Configuration in the YAML file tidbcloud.yaml:
  - logging.level: case-insensitive; must be one of "debug", "info", "warning", "error", "critical".
  - logging.to_file: if true, a log file is generated under logs; otherwise only the console log is written.
  - prometheus.start_time: Prometheus query start time, format 'dd-mm-YYYY HH:MM:SS'.
  - prometheus.end_time: Prometheus query end time, format 'dd-mm-YYYY HH:MM:SS'.
  - prometheus.step_in_seconds: Prometheus query step (interval) in seconds; normally 60 for a two-day range and 30 for a one-day range.
  - prometheus.cluster_prom_id_token: token for the Prometheus API.
  - prometheus.cluster_prom_base_url: URL of the Prometheus API, which contains the TiDBCloud cluster information.
  - capacity.plan_traffic_x: estimated traffic increase, as a multiple of current traffic.
  - capacity.plan_resource_redundancy_x: required resource redundancy, as a multiple.
  - lark.user_access_token: required for list-clusters if you want to write the results to a Lark spreadsheet.
- How to get prometheus.cluster_prom_id_token and prometheus.cluster_prom_base_url:
  - Step 1: log in to tidbcloud.com in Chrome.
  - Step 2: right-click the page and select Inspect.
  - Step 3: go to the Application tab.
  - Step 4: under Local storage -> https://tidbcloud.com
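The configuration rules, together with the token and base URL, end up in a Prometheus range query. The sketch below is not the tool's own code: it validates a tidbcloud.yaml that has already been loaded into a dict (for example with yaml.safe_load) and then builds, without sending, a request to the standard Prometheus /api/v1/query_range endpoint. Assumptions: the dotted names are nested YAML keys, the configured times are UTC, the base URL is the root of a Prometheus-compatible API, and the token is sent as a Bearer Authorization header. parse_time, validate_config, and build_range_query are hypothetical names.

```python
from datetime import datetime, timezone
from urllib.parse import urlencode
from urllib.request import Request

LOG_LEVELS = {"debug", "info", "warning", "error", "critical"}
TIME_FORMAT = "%d-%m-%Y %H:%M:%S"  # 'dd-mm-YYYY HH:MM:SS'


def parse_time(value: str) -> datetime:
    # Assumption: configured times are UTC.
    return datetime.strptime(value, TIME_FORMAT).replace(tzinfo=timezone.utc)


def validate_config(cfg: dict) -> int:
    """Check a loaded tidbcloud.yaml mapping; return the points per series.

    Assumes nested keys, e.g. cfg["prometheus"]["start_time"].
    """
    if str(cfg["logging"]["level"]).lower() not in LOG_LEVELS:  # case-insensitive
        raise ValueError(f"logging.level must be one of {sorted(LOG_LEVELS)}")
    if not isinstance(cfg["logging"]["to_file"], bool):
        raise ValueError("logging.to_file must be true or false")
    prom = cfg["prometheus"]
    start, end = parse_time(prom["start_time"]), parse_time(prom["end_time"])
    step = int(prom["step_in_seconds"])
    if end <= start or step <= 0:
        raise ValueError("need end_time > start_time and step_in_seconds > 0")
    for key in ("cluster_prom_id_token", "cluster_prom_base_url"):
        if not prom.get(key):
            raise ValueError(f"prometheus.{key} is empty")
    for key in ("plan_traffic_x", "plan_resource_redundancy_x"):
        if float(cfg["capacity"][key]) <= 0:
            raise ValueError(f"capacity.{key} must be positive")
    # A range query is evaluated at start, start + step, ... up to end.
    return int((end - start).total_seconds()) // step + 1


def build_range_query(cfg: dict, promql: str) -> Request:
    """Build, but do not send, a GET request to /api/v1/query_range."""
    prom = cfg["prometheus"]
    params = urlencode({
        "query": promql,
        "start": int(parse_time(prom["start_time"]).timestamp()),
        "end": int(parse_time(prom["end_time"]).timestamp()),
        "step": int(prom["step_in_seconds"]),
    })
    url = f"{prom['cluster_prom_base_url'].rstrip('/')}/api/v1/query_range?{params}"
    # Assumption: the token is accepted as a bearer token.
    token = prom["cluster_prom_id_token"]
    return Request(url, headers={"Authorization": f"Bearer {token}"})


# Example values only; replace the token and URL with your own.
example = {
    "logging": {"level": "INFO", "to_file": True},
    "prometheus": {
        "start_time": "01-03-2023 00:00:00",
        "end_time": "03-03-2023 00:00:00",
        "step_in_seconds": 60,
        "cluster_prom_id_token": "<token>",
        "cluster_prom_base_url": "https://prometheus.example.com",
    },
    "capacity": {"plan_traffic_x": 2, "plan_resource_redundancy_x": 2},
}
print(validate_config(example))  # 2881
```

The returned count shows why the suggested steps pair up: a two-day range at 60 seconds and a one-day range at 30 seconds both yield 2,881 points per series. Passing the request to urllib.request.urlopen would send it; the prometheus_api_client package listed above wraps the same endpoint.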
component | name | max | average | percentile_50.0 | percentile_75.0 | percentile_80.0 | percentile_85.0 | percentile_90.0 | percentile_95.0 | percentile_99.0 | percentile_99.9 | capacity | instance_cnt | plan_max | plan_average | plan_percentile_50.0 | plan_percentile_75.0 | plan_percentile_80.0 | plan_percentile_85.0 | plan_percentile_90.0 | plan_percentile_95.0 | plan_percentile_99.0 | plan_percentile_99.9 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
tidb | CPU(core) | 23.784666666636863 | 6.891452040305764 | 6.4966666666169965 | 10.47533333338797 | 11.068000000038495 | 11.763333333283663 | 12.577333333343267 | 13.587733333359147 | 15.136666666684672 | 16.846442666622035 | 32 | 64 | 190.2773333330949 | 55.13161632244611 | 51.97333333293597 | 83.80266666710376 | 88.54400000030796 | 94.1066666662693 | 100.61866666674614 | 108.70186666687317 | 121.09333333347737 | 134.77154133297628 |
tidb | Memory(byte) | 26741944320.0 | 16920691356.352173 | 17130733568.0 | 17670386688.0 | 17807736012.8 | 17977865420.8 | 18201889587.2 | 18560214425.6 | 19336546385.92 | 20367362531.32808 | 66206752768 | 64 | 103.40240926645895 | 65.4268153341575 | 66.23898031633485 | 68.32564357867768 | 68.85672878794647 | 69.5145639275223 | 70.38079258548662 | 71.766318302959 | 74.76814173535634 | 78.75397282042981 |
tidb | Disk IOPS | 467.56666666666666 | 11.098697260468551 | 10.483333333333333 | 11.5 | 11.891666666666667 | 12.366666666666667 | 12.841666666666667 | 13.416666666666666 | 17.025749999999974 | 110.29567500000084 | 40000 | 64 | 2.992426666666667 | 0.07103166246699873 | 0.06709333333333332 | 0.0736 | 0.07610666666666667 | 0.07914666666666667 | 0.08218666666666667 | 0.08586666666666666 | 0.10896479999999983 | 0.7058923200000053 |
tidb | Disk Bandwidth(byte) | 79486685.86666666 | 239634.35369328497 | 109324.8 | 141436.8 | 152755.2 | 171619.62666666662 | 216119.0400000008 | 407847.4666666666 | 1418572.4159999895 | 20195406.711474445 | 1245184000 | 64 | 16.341835087719296 | 0.04926693126917865 | 0.022476315789473685 | 0.029078289473684207 | 0.03140526315789474 | 0.035283640350877186 | 0.0444323684210528 | 0.0838502192982456 | 0.29164728947368207 | 4.152016182457739 |
tidb | NetworkIn Bandwidth(byte) | 721571589.9833333 | 27812681.373172816 | 17402588.81666667 | 39020371.479166664 | 45927429.45666666 | 54322016.234166645 | 65899973.45000005 | 85512609.18749997 | 132907427.57933328 | 239396503.55442265 | 1610612736 | 64 | 114.69071546925439 | 4.420706649330905 | 2.766066998905606 | 6.2021210160520335 | 7.299968315230475 | 8.634251949654681 | 10.474518688519803 | 13.591863185167307 | 21.125066690338972 | 38.051049479551736 |
tidb | NetworkOut Bandwidth(byte) | 109156186.39999999 | 14940615.595263155 | 14426000.933333334 | 21817573.53333333 | 23120781.346666668 | 24554338.02416666 | 26289229.68166667 | 28942049.474999998 | 40539531.13066665 | 68605984.56516692 | 1610612736 | 64 | 17.349908574422198 | 2.3747468940835246 | 2.2929510964287654 | 3.467809920840793 | 3.6749492242601183 | 3.9028069216675214 | 4.1785605242517265 | 4.600214874744415 | 6.443584939744735 | 10.904627571927216 |
tikv | CPU(core) | 22.808333333333334 | 8.052314804191548 | 8.275 | 11.416666666666666 | 11.866666666666667 | 12.341666666666667 | 12.866666666666667 | 13.608333333333333 | 14.95 | 16.51653333333355 | 32 | 60 | 171.0625 | 60.39236103143661 | 62.0625 | 85.625 | 89.0 | 92.5625 | 96.5 | 102.0625 | 112.125 | 123.87400000000162 |
tikv | Memory(byte) | 64757587968.0 | 57881447840.94151 | 63778476032.0 | 64016157696.0 | 64063688704.0 | 64117889433.6 | 64183327129.6 | 64278327705.6 | 64428602900.48 | 64577265741.824005 | 132135682048 | 60 | 117.62016793218831 | 105.13093258775989 | 115.84179239427237 | 116.27349712743657 | 116.35982840255616 | 116.45827399198653 | 116.57712945022902 | 116.74968040608452 | 117.02262747240458 | 117.29264599707227 |
tikv | Storage(byte) | 535961378515.0 | 361504338289.31384 | 442307236902.0 | 470968634179.0 | 482489093109.0 | 492801961876.0 | 502209251350.0 | 512172191227.0 | 523254653903.0 | 529922350440.00024 | 2113559478272 | 60 | 60.859763903481756 | 41.04972776085263 | 50.22510033323925 | 53.47967415394095 | 54.78785127013951 | 55.95890348301766 | 57.027124887227146 | 58.15844179364083 | 59.4168833324683 | 60.174017061294514 |
tikv | Disk IOPS | 2731.8 | 1600.941560132902 | 1686.7083333333335 | 1799.4708333333333 | 1827.6350000000002 | 1862.6433333333332 | 1911.71 | 1989.7924999999998 | 2111.456999999999 | 2340.291233333337 | 40000 | 60 | 16.3908 | 9.605649360797411 | 10.120250000000002 | 10.796825 | 10.965810000000001 | 11.175859999999998 | 11.470260000000001 | 11.938754999999999 | 12.668741999999995 | 14.041747400000022 |
tikv | Disk Bandwidth(byte) | 177578423.46666664 | 52251039.23717693 | 56854656.0 | 70412072.53333333 | 73642921.81333335 | 77325027.84 | 81805142.18666667 | 87842415.36 | 100117152.5973333 | 121224877.07306707 | 1245184000 | 60 | 34.22692680921052 | 10.07100108652413 | 10.958314144736843 | 13.571405838815789 | 14.194128125000004 | 14.903826809210528 | 15.767335690789473 | 16.93097541118421 | 19.29684016447368 | 23.36519783223692 |
tikv | NetworkIn Bandwidth(byte) | 89017284.35 | 14631376.639736388 | 14266203.9 | 20065022.108333334 | 21117279.176666666 | 22303078.026666667 | 23856447.096666664 | 26557750.6 | 37704011.943333186 | 58618524.251466826 | 1610612736 | 60 | 13.264609031379223 | 2.1802450179660897 | 2.125830039381981 | 2.9899212879439196 | 3.146719809869925 | 3.323417607943217 | 3.5548876369992883 | 3.957413226366043 | 5.618335596223651 | 8.734840788165751 |
tikv | NetworkOut Bandwidth(byte) | 610273573.65 | 36800260.63627839 | 25660910.583333332 | 48946311.40833333 | 57989962.74 | 69362410.76333334 | 84993119.1433333 | 108843962.25833325 | 157566252.0966662 | 267867539.45127594 | 1610612736 | 60 | 90.9378489330411 | 5.48366615716791 | 3.823773649831613 | 7.293568761398395 | 8.641177824139595 | 10.335804635783036 | 12.664961687227086 | 16.219014265884944 | 23.47920121202859 | 39.915373839628096 |
tiflash | CPU(core) | 4.416666666666667 | 2.146923225308642 | 2.1083333333333334 | 2.4583333333333335 | 2.558333333333333 | 2.6666666666666665 | 2.825 | 3.0416666666666665 | 3.4583333333333335 | 4.08101666666668 | 32 | 3 | 1.65625 | 0.8050962094907408 | 0.790625 | 0.921875 | 0.9593749999999999 | 1.0 | 1.0593750000000002 | 1.140625 | 1.296875 | 1.530381250000005 |
tiflash | Memory(byte) | 18783178752.0 | 13776136496.82963 | 13647306752.0 | 14205965312.0 | 14349398016.000002 | 14542488780.8 | 14804665139.2 | 15265175961.599998 | 16246958653.44001 | 17866946875.39206 | 267057463296 | 3 | 0.8440061634756644 | 0.6190189778696653 | 0.6132301228461976 | 0.6383329701407874 | 0.6447779967158069 | 0.6534543660222575 | 0.6652350377247853 | 0.6859276998979257 | 0.7300432702200399 | 0.8028360632897394 |
tiflash | Storage(byte) | 577336631778.0 | 565327623219.5511 | 564687437622.5 | 569220174264.5 | 570198602871.6 | 571012189417.9 | 571916048319.0 | 572727655758.1 | 574149121927.7 | 575365797786.477 | 1585119989760 | 3 | 4.370672016056628 | 4.279758959863824 | 4.274912495738558 | 4.309227147030184 | 4.31663424766676 | 4.322793427172836 | 4.329636004948188 | 4.33578020181159 | 4.346541276143751 | 4.3557520049211575 |
tiflash | Disk IOPS | 2432.3999999999996 | 1980.8632012226217 | 2009.8708333333334 | 2105.039583333333 | 2124.6116666666667 | 2147.8479166666666 | 2179.5 | 2226.8875000000003 | 2285.4057500000004 | 2376.089483333339 | 40000 | 3 | 0.7297199999999999 | 0.5942589603667865 | 0.60296125 | 0.631511875 | 0.6373835 | 0.644354375 | 0.65385 | 0.6680662500000001 | 0.6856217250000002 | 0.7128268450000018 |
tiflash | Disk Bandwidth(byte) | 127711074.13333334 | 46139117.60294602 | 44310118.400000006 | 61630208.00000001 | 66574980.266666666 | 70807158.18666667 | 74966400.42666668 | 81943101.01333332 | 88969590.95466669 | 113195749.41013435 | 1245184000 | 3 | 1.2307682154605264 | 0.44464867138941094 | 0.4270223684210527 | 0.5939383223684211 | 0.6415917351973685 | 0.6823777837171052 | 0.7224609416118422 | 0.7896963116776314 | 0.8574115082236845 | 1.0908821450657993 |
tiflash | NetworkIn Bandwidth(byte) | 25218719.400000002 | 5479655.264724445 | 4197687.066666666 | 6883541.754166666 | 7875524.610000003 | 8889669.059999999 | 10329853.090000002 | 13345982.555 | 18301372.35050001 | 21961515.843933348 | 1610612736 | 3 | 0.18789410144090654 | 0.04082661319318745 | 0.031275205810864765 | 0.051286382631709176 | 0.05867723084986212 | 0.06623319581151008 | 0.07696340300142766 | 0.0994353186711669 | 0.1363558497317136 | 0.16362604382584503 |
tiflash | NetworkOut Bandwidth(byte) | 30871977.95 | 4998730.973810626 | 3221239.5166666666 | 6484757.5875 | 7643299.823333334 | 9051939.528333332 | 10739699.711666668 | 13705309.657499999 | 19115213.09033334 | 26575713.058033533 | 1610612736 | 3 | 0.23001415990293025 | 0.037243448002715604 | 0.02400010464092096 | 0.04831520905718208 | 0.05694702136019866 | 0.06744220501432815 | 0.0800169982885321 | 0.10211251420900225 | 0.14241943575690194 | 0.19800449205959988 |
pd | CPU(core) | 17.123333333308498 | 2.6657532407407603 | 0.15333333333255722 | 6.595666666639348 | 6.871333333353202 | 7.055533333343143 | 7.275400000053148 | 10.008633333342152 | 15.028153333337379 | 16.661665333331644 | 32 | 3 | 6.421249999990687 | 0.9996574652777851 | 0.05749999999970896 | 2.4733749999897556 | 2.5767500000074506 | 2.6458250000036787 | 2.7282750000199307 | 3.7532375000033067 | 5.635557500001517 | 6.248124499999367 |
pd | Memory(byte) | 2833547264.0 | 1643391174.162963 | 1124302848.0 | 2654711808.0 | 2679431168.0 | 2727464960.0 | 2745749504.0 | 2780340224.0 | 2796290744.32 | 2815262720.0 | 66206752768 | 3 | 0.5135815569621867 | 0.2978652972010318 | 0.2037803337565435 | 0.48116756016762935 | 0.48564795389784976 | 0.4943540976052722 | 0.497668178402572 | 0.5039377600184314 | 0.5068287981049951 | 0.5102674761648869 |
pd | Disk IOPS | 204.24166666666667 | 16.563721099047033 | 10.55 | 21.041666666666664 | 21.708333333333332 | 22.366666666666667 | 23.575833333333335 | 34.476249999999986 | 68.61750000000004 | 162.06711666666675 | 40000 | 3 | 0.0612725 | 0.00496911632971411 | 0.0031650000000000003 | 0.0063124999999999995 | 0.0065125 | 0.00671 | 0.007072750000000001 | 0.010342874999999994 | 0.02058525000000001 | 0.04862013500000003 |
pd | Disk Bandwidth(byte) | 2114188.8 | 205138.40760609633 | 72142.93333333333 | 318773.3333333334 | 339725.6533333333 | 362276.26666666666 | 396188.5866666667 | 981537.7066666667 | 1133308.2026666673 | 1841979.5200000098 | 1245184000 | 3 | 0.02037471217105263 | 0.0019769454885969913 | 0.000695250822368421 | 0.003072060032894737 | 0.0032739802631578945 | 0.0034913034539473687 | 0.0038181208881578957 | 0.009459206414473684 | 0.010921838404605269 | 0.01775139597039483 |
pd | NetworkIn Bandwidth(byte) | 39037441.016666666 | 5637797.400659997 | 600457.975 | 14511753.454166666 | 15485324.030000001 | 15874992.755833333 | 16292030.848333333 | 17209804.14583333 | 24901563.423500016 | 29617194.922616865 | 1610612736 | 3 | 0.29085160059233506 | 0.04200486392274497 | 0.004473760537803173 | 0.1081209887129565 | 0.11537465475499631 | 0.11827791300291816 | 0.1213850889230768 | 0.12822303284580505 | 0.18553110527619732 | 0.22066529782576014 |
pd | NetworkOut Bandwidth(byte) | 36515445.11666667 | 6083618.387593778 | 267739.7166666667 | 16286233.104166666 | 17456376.450000003 | 17864665.948333334 | 18336149.77 | 19826134.20083333 | 27210716.633666668 | 30449632.54978338 | 1610612736 | 3 | 0.2720612668742736 | 0.04532648911769523 | 0.00199481633802255 | 0.12134189236288269 | 0.13006013967096808 | 0.13310213348517816 | 0.1366149616986513 | 0.14771621078873673 | 0.20273563737918934 | 0.22686744145887627 |
Node-level metrics require access from the office network in global mode.
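For example, plan_max is the number of instances needed, calculated from the max value:
plan_max = max * instance_cnt * plan_traffic_x * plan_resource_redundancy_x / capacity
The other plan_* columns apply the same formula to the corresponding statistic (average, percentile_50.0, and so on).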
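The formula can be checked against the sample table. Its plan_* values are consistent with plan_traffic_x × plan_resource_redundancy_x = 4 (for example, 2 and 2). The helpers below are an illustrative sketch, not the tool's code; plan and plan_row are hypothetical names, and a row is assumed to be a dict keyed by the table's column names.

```python
from typing import Dict

# Statistic columns of the capacity table, in table order.
STATS = ["max", "average"] + [
    f"percentile_{p}" for p in ("50.0", "75.0", "80.0", "85.0", "90.0", "95.0", "99.0", "99.9")
]


def plan(value: float, instance_cnt: int, capacity: float,
         plan_traffic_x: float, plan_resource_redundancy_x: float) -> float:
    """plan_<stat> = <stat> * instance_cnt * plan_traffic_x * plan_resource_redundancy_x / capacity."""
    return value * instance_cnt * plan_traffic_x * plan_resource_redundancy_x / capacity


def plan_row(row: Dict[str, float], plan_traffic_x: float,
             plan_resource_redundancy_x: float) -> Dict[str, float]:
    """Return a copy of one table row with a plan_* entry for each statistic present."""
    out = dict(row)
    for stat in STATS:
        if stat in row:
            out[f"plan_{stat}"] = plan(row[stat], row["instance_cnt"], row["capacity"],
                                       plan_traffic_x, plan_resource_redundancy_x)
    return out
```

For the TiDB CPU row, plan(23.784666666636863, 64, 32, 2, 2) gives about 190.28, matching plan_max in the table. The result is fractional; rounding it up to a whole number of nodes is left to the reader.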
pd:
diagnostics:
'🚨 Critical: pd_tso_wait_duration_999': 0.00280602557503623
metrics:
pd_tso_wait_duration_999: 0.00280602557503623
store_disconnected_count: 0.0
store_down_count: 0.0
store_low_space_count: 0.0
store_offline_count: 0.0
store_slow_count: 0.0
store_tombstone_count: 0.0
store_unhealth_count: 0.0
tidb:
diagnostics:
'⚠️ Warning: threshold missing for failed_query_opm': 0.0
'🚨 Critical: qps': 169871.1166666667
metrics:
failed_query_opm: 0.0
qps: 169871.1166666667
query_duration_p999: 0.015002920253115134
tiflash:
diagnostics: '⚠️ Warning: tiflash threshold missing'
metrics: {}
tikv:
diagnostics:
'⚠️ Warning: metric empty for tikv_channel_full_total': null
'⚠️ Warning: metric empty for tikv_coprocessor_request_error': null
'⚠️ Warning: metric empty for tikv_scheduler_too_busy_total': null
'🚨 Critical: kv_request_duration_99_by_store': 0.13708654709296972
'🚨 Critical: tikv_grpc_msg_duration_999': 0.025472
metrics:
down_peer_region_count: 0.0
empty_region_count: 0.0
extra_peer_region_count: 0.0
kv_request_duration_99_by_store: 0.13708654709296972
miss_peer_region_count: 0.0
offline_peer_region_count: 0.0
pending_peer_region_count: 0.0
region_count: 24811.0
tikv_channel_full_total: null
tikv_coprocessor_request_error: null
tikv_engine_write_stall: 0.0
tikv_grpc_msg_duration_999: 0.025472
tikv_raftstore_store_write_msg_block_wait_duration_seconds_count: 0.0
tikv_scheduler_too_busy_total: null
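The diagnostics in the report above follow a recognizable pattern: metrics with no data get "metric empty", metrics without a configured threshold get "threshold missing", and flagged metrics are marked Critical. The sketch below reproduces that labeling under an assumed rule (Critical when the value is strictly above a per-metric threshold). The real thresholds and comparison are not documented in this README, so diagnose is a hypothetical helper and any threshold values passed to it are illustrative.

```python
from typing import Dict, Optional

WARN = "⚠️ Warning"
CRIT = "🚨 Critical"


def diagnose(metrics: Dict[str, Optional[float]],
             thresholds: Dict[str, float]) -> Dict[str, Optional[float]]:
    """Label metrics like the sample report; the threshold rule is an assumption."""
    diagnostics: Dict[str, Optional[float]] = {}
    for name in sorted(metrics):
        value = metrics[name]
        if value is None:  # the query returned no data
            diagnostics[f"{WARN}: metric empty for {name}"] = None
        elif name not in thresholds:
            diagnostics[f"{WARN}: threshold missing for {name}"] = value
        elif value > thresholds[name]:  # assumed rule: strictly above the limit
            diagnostics[f"{CRIT}: {name}"] = value
    return diagnostics
```

With hypothetical thresholds of 100000 for qps and 1.0 for query_duration_p999, the tidb metrics above yield the same two tidb diagnostics as the sample report.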
Cluster ID: 1111111111111111111 Project ID: 1333333333333333333 Tenant ID: 1388888888888888888 TiDB: 24 TiKV: 51 TiFlash: 33
Cluster ID: 1222222222222222222 Project ID: 1444444444444444444 Tenant ID: 1388888888888888888 TiDB: 38 TiKV: 36 TiFlash: 25
Cluster ID: 1333333333333333333 Project ID: 1455444444444444444 Tenant ID: 1388888888888888888 TiDB: 32 TiKV: 46 TiFlash: 15
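To post-process the list-clusters console output, for example to total node counts per tenant, the lines can be parsed with a regular expression. This sketch is based only on the sample lines above; parse_clusters and nodes_per_tenant are hypothetical helpers. IDs are kept as strings because 19-digit values exceed both the 15 significant digits a spreadsheet such as Excel keeps and the 2^53 limit of exactly representable integers in double-precision floats.

```python
import re
from collections import defaultdict
from typing import Dict, List, Union

LINE_RE = re.compile(
    r"Cluster ID: (?P<cluster_id>\d+)\s+Project ID: (?P<project_id>\d+)\s+"
    r"Tenant ID: (?P<tenant_id>\d+)\s+TiDB: (?P<tidb>\d+)\s+"
    r"TiKV: (?P<tikv>\d+)\s+TiFlash: (?P<tiflash>\d+)"
)
COMPONENTS = ("tidb", "tikv", "tiflash")


def parse_clusters(text: str) -> List[Dict[str, Union[str, int]]]:
    """Parse list-clusters lines; IDs stay strings, node counts become ints."""
    rows = []
    for match in LINE_RE.finditer(text):
        row: Dict[str, Union[str, int]] = dict(match.groupdict())
        for key in COMPONENTS:
            row[key] = int(match.group(key))
        rows.append(row)
    return rows


def nodes_per_tenant(rows: List[Dict[str, Union[str, int]]]) -> Dict[str, Dict[str, int]]:
    """Sum node counts per component for each tenant."""
    totals = defaultdict(lambda: dict.fromkeys(COMPONENTS, 0))
    for row in rows:
        for key in COMPONENTS:
            totals[row["tenant_id"]][key] += row[key]
    return dict(totals)
```

Applied to the three sample lines, nodes_per_tenant reports 94 TiDB, 133 TiKV, and 73 TiFlash nodes for tenant 1388888888888888888.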
For TiDB clusters deployed on premises, download v0.1.0.