Conversation

qiancai
Owner

@qiancai qiancai commented Nov 1, 2024

What is changed, added or deleted? (Required)

The doc changes come from a draft PR for testing purposes.

Which TiDB version(s) do your changes apply to? (Required)

Tips for choosing the affected version(s):

By default, CHOOSE MASTER ONLY so your changes will be applied to the next TiDB major or minor releases. If your PR involves a product feature behavior change or a compatibility change, CHOOSE THE AFFECTED RELEASE BRANCH(ES) AND MASTER.

For details, see tips for choosing the affected versions.

  • master (the latest development version)
  • v8.4 (TiDB 8.4 versions)
  • v8.3 (TiDB 8.3 versions)
  • v8.2 (TiDB 8.2 versions)
  • v8.1 (TiDB 8.1 versions)
  • v7.5 (TiDB 7.5 versions)
  • v7.1 (TiDB 7.1 versions)
  • v6.5 (TiDB 6.5 versions)
  • v6.1 (TiDB 6.1 versions)
  • v5.4 (TiDB 5.4 versions)
  • v5.3 (TiDB 5.3 versions)

What is the related PR or file link(s)?

  • This PR is translated from:
  • Other reference link(s):

Do your changes match any of the following descriptions?

  • Delete files
  • Change aliases
  • Need modification after applied to another branch
  • Might cause conflicts after applied to another branch

Summary by CodeRabbit

  • New Features
    • Expanded glossary with new entries related to TiDB, enhancing definitions for various terms including Backup and Restore, Disaster Recovery, and Performance Metrics.
    • Each term includes a brief definition and references to related documentation for better understanding.


coderabbitai bot commented Nov 1, 2024

Walkthrough

The changes introduce a series of new entries to the glossary.md file, expanding the definitions related to TiDB. A total of 36 new terms have been added, including definitions for concepts such as "Backup and Restore tool," "Column Family," and "Disaster Recovery." Additionally, the entry for "Restore" has been updated to change its heading from lowercase to uppercase. This update enhances the glossary's comprehensiveness in areas pertinent to database operations, performance metrics, and cloud services.

Changes

| File | Change Summary |
| --- | --- |
| glossary.md | Added new entries for terms: BR, CF, CTE, DDL, DM, DML, DMR, DR, DXF, EC2, GA, GC, GTID, HTAP, IMDS, KMS, KV, LDAP, LTS, MPP, OLAP, OLTP, OOM, PD, PiTR, QPS, RPC, RU, SST, TPS, TSO, URI, UUID. Updated entry for "Restore" (renamed from lowercase to uppercase). |

Poem

In the land of data, where terms abound,
New words have sprouted from the ground.
BR and CF, they leap with glee,
Glossary growing, as vast as the sea!
With definitions clear, like a bright, shining moon,
Our knowledge expands, oh, what a boon! 🐇✨




@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

🧹 Outside diff range and nitpick comments (2)
glossary.md (2)

218-221: Improve clarity and fix punctuation

The entry contains redundant phrasing and needs punctuation fixes.

 ### PiTR
 
-Point in Time Recovery (PiTR) is a database feature that allows the user to restore to a specific point in time (for example just before an accidental `DELETE` statement).
+Point-in-Time Recovery (PiTR) is a database feature that allows restoring a database to a specific moment (for example, just before an accidental `DELETE` statement).
🧰 Tools
🪛 LanguageTool

[style] ~219-~219: This phrase is redundant. Consider writing “Point” or “time”.
Context: ...r isolated from the network. ### PiTR Point in Time Recovery (PiTR) is a database feature t...

(MOMENT_IN_TIME)


[style] ~220-~220: This phrase is redundant. Consider writing “point” or “time”.
Context: ...llows the user to restore to a specific point in time (for example just before an accidental ...

(MOMENT_IN_TIME)


[uncategorized] ~220-~220: Possible missing comma found.
Context: ...estore to a specific point in time (for example just before an accidental DELETE stat...

(AI_HYDRA_LEO_MISSING_COMMA)


171-171: Standardize terminology formatting

For consistency, standardize the capitalization of terms across the glossary.

-OnLine Analytical Processing
+Online Analytical Processing

-OnLine Transaction Processing
+Online Transaction Processing

Also applies to: 179-179

🧰 Tools
🪛 LanguageTool

[uncategorized] ~171-~171: This verb does not appear to agree with the subject. Consider using a different form.
Context: ...AP OnLine Analytical Processing (OLAP) are describing database workloads that most...

(AI_EN_LECTOR_REPLACEMENT_VERB_AGREEMENT)

🪛 GitHub Check: vale

[warning] 171-171:
[vale] reported by reviewdog 🐶
[PingCAP.Ambiguous] Consider using a clearer word than 'many' because it may cause confusion.

Raw Output:
{"message": "[PingCAP.Ambiguous] Consider using a clearer word than 'many' because it may cause confusion.", "location": {"path": "glossary.md", "range": {"start": {"line": 171, "column": 193}}}, "severity": "INFO"}

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 93e38d1 and 956a053.

📒 Files selected for processing (1)
  • glossary.md (11 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
glossary.md (1)

Pattern **/*.md: As a technical writer in the database area, you need to first learn the changes in this PR, and then improve the content to make it clear, concise, and easy to understand in English according to your database knowledge.

🪛 LanguageTool
glossary.md

[uncategorized] ~49-~49: Possible missing comma found.
Context: ...rmance. ### CF CF is short for Column Family as used by RocksDB / TiKV. ### Coalesc...

(AI_HYDRA_LEO_MISSING_COMMA)


[uncategorized] ~61-~61: A punctuation mark might be missing here.
Context: ...E) is part of the SQL standard and uses [WITH](/sql-statements/sql-statement-wit...

(AI_EN_LECTOR_MISSING_PUNCTUATION)


[uncategorized] ~79-~79: If this is a compound adjective that modifies the following noun, use a hyphen.
Context: ...the latest features but doesn't provide long term support. See [TiDB Versioning](/release...

(EN_COMPOUND_ADJECTIVE_INTERNAL)


[uncategorized] ~86-~86: You might be missing the article “the” here.
Context: ...backups and standby clusters. ### DXF Distributed eXecution Framework (DXF) is the framew...

(AI_EN_LECTOR_MISSING_DETERMINER_THE)


[misspelling] ~107-~107: Did you mean the phrasal verb “clean up” instead of the noun ‘cleanup’?
Context: ...rbage Collection (GC) is the process to cleanup unused resources. See [GC](/garbage-col...

(CLEAN_UP)


[misspelling] ~123-~123: Use “an” instead of ‘a’ if the following word starts with a vowel sound, e.g. ‘an article’, ‘an hour’.
Context: ...DS Instance Metadata Service (IMDS) is a AWS service that can be used to manage ...

(EN_A_VS_AN)


[style] ~137-~137: Consider replacing this phrase with the adverb “securely” to avoid wordiness.
Context: ...he storage and retrieval of secret keys in a secure way. Examples of this are the AWS KMS, GCP ...

(IN_A_X_MANNER)


[uncategorized] ~141-~141: Possible missing preposition found.
Context: ...vices. ### KV Key-Value (KV) is a way storing information that allows easy store and ...

(AI_EN_LECTOR_MISSING_PREPOSITION)


[uncategorized] ~150-~150: If this is a compound adjective that modifies the following noun, use a hyphen.
Context: ...uthentication-plugin-status). ### LTS Long Term Support (LTS) are software versions tha...

(EN_COMPOUND_ADJECTIVE_INTERNAL)


[uncategorized] ~171-~171: This verb does not appear to agree with the subject. Consider using a different form.
Context: ...AP OnLine Analytical Processing (OLAP) are describing database workloads that most...

(AI_EN_LECTOR_REPLACEMENT_VERB_AGREEMENT)


[uncategorized] ~179-~179: This verb may not be in the correct tense. Consider changing the tense to fit the context better.
Context: ...P OnLine Transaction Processing (OLTP) are describing database workloads that mostly deal wit...

(AI_EN_LECTOR_REPLACEMENT_VERB_TENSE)


[duplication] ~183-~183: Possible typo: you repeated a word
Context: ...a situation where a system fails due to a a lack of available memory. See [Troubles...

(ENGLISH_WORD_REPEAT_RULE)


[uncategorized] ~216-~216: Possible missing article found.
Context: ...arners is vastly different from that of leader. Followers in pending cannot be elected...

(AI_HYDRA_LEO_MISSING_A)


[uncategorized] ~216-~216: You might be missing the article “the” here.
Context: ... state that a peer ceases to respond to leader for a long time, which usually means th...

(AI_EN_LECTOR_MISSING_DETERMINER_THE)


[style] ~219-~219: This phrase is redundant. Consider writing “Point” or “time”.
Context: ...r isolated from the network. ### PiTR Point in Time Recovery (PiTR) is a database feature t...

(MOMENT_IN_TIME)


[style] ~220-~220: This phrase is redundant. Consider writing “point” or “time”.
Context: ...llows the user to restore to a specific point in time (for example just before an accidental ...

(MOMENT_IN_TIME)


[uncategorized] ~220-~220: Possible missing comma found.
Context: ...estore to a specific point in time (for example just before an accidental DELETE stat...

(AI_HYDRA_LEO_MISSING_COMMA)


[uncategorized] ~308-~308: You might be missing the article “the” here.
Context: ...ls, see TSO. ## U ### URI Uniform Resource Identifier (URI) is a uniform ...

(AI_EN_LECTOR_MISSING_DETERMINER_THE)


[uncategorized] ~312-~312: You might be missing the article “the” here.
Context: ...ipedia for more information. ### UUID Universally Unique Identifier (UUID) is a 128-bit (...

(AI_EN_LECTOR_MISSING_DETERMINER_THE)

🪛 GitHub Check: vale
glossary.md

[failure] 137-137:
[vale] reported by reviewdog 🐶
[Vale.Avoid] Avoid using 'GCP'.

Raw Output:
{"message": "[Vale.Avoid] Avoid using 'GCP'.", "location": {"path": "glossary.md", "range": {"start": {"line": 137, "column": 129}}}, "severity": "ERROR"}


[warning] 171-171:
[vale] reported by reviewdog 🐶
[PingCAP.Ambiguous] Consider using a clearer word than 'many' because it may cause confusion.

Raw Output:
{"message": "[PingCAP.Ambiguous] Consider using a clearer word than 'many' because it may cause confusion.", "location": {"path": "glossary.md", "range": {"start": {"line": 171, "column": 193}}}, "severity": "INFO"}

🔇 Additional comments (3)
glossary.md (3)

33-35: LGTM!

The definition is clear, concise, and includes proper documentation reference.


115-118: LGTM!

The HTAP definition is comprehensive, technically accurate, and includes both internal implementation details and an external reference.


Line range hint 1-313: Verify documentation links

Let's verify that all documentation links are valid and accessible.



### OLAP

OnLine Analytical Processing (OLAP) are describing database workloads that mostly deal with analytical workloads like reporting. The characteristics of this is read heavy queries that process many rows.

⚠️ Potential issue

Fix grammar and style issues

Several grammar issues need to be addressed:

-OnLine Analytical Processing (OLAP) are describing database workloads that mostly deal with analytical workloads like reporting.
+Online Analytical Processing (OLAP) describes database workloads that mostly deal with analytical tasks like reporting.

-OnLine Transaction Processing (OLTP) are describing database workloads that mostly deal with transactional workloads
+Online Transaction Processing (OLTP) describes database workloads that mostly deal with transactional operations

-Out of Memory (OOM) is a situation where a system fails due to a a lack of available memory.
+Out of Memory (OOM) is a situation where a system fails due to a lack of available memory.

-Followers in pending cannot be elected as leader.
+Followers in pending state cannot be elected as the leader.

-Uniform Resource Identifier (URI) is a uniform way
+A Uniform Resource Identifier (URI) is a uniform way

-Universally Unique Identifier (UUID) is a 128-bit
+A Universally Unique Identifier (UUID) is a 128-bit

Also applies to: 179-179, 183-183, 216-216, 308-308, 312-312


Comment on lines +85 to +88
### DXF

Distributed eXecution Framework (DXF) is the framework used by TiDB to speedup index creation and data import by distributing tasks over all available resources. See [DXF Introduction](/tidb-distributed-execution-framework.md) for more details


⚠️ Potential issue

Fix punctuation and verify documentation link

The entry is missing proper punctuation and needs link verification.

 ### DXF
 
-Distributed eXecution Framework (DXF) is the framework used by TiDB to speedup index creation and data import by distributing tasks over all available resources. See [DXF Introduction](/tidb-distributed-execution-framework.md) for more details
+Distributed eXecution Framework (DXF) is the framework used by TiDB to speed up index creation and data import by distributing tasks over all available resources. See [DXF Introduction](/tidb-distributed-execution-framework.md) for more details.
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-### DXF
-Distributed eXecution Framework (DXF) is the framework used by TiDB to speedup index creation and data import by distributing tasks over all available resources. See [DXF Introduction](/tidb-distributed-execution-framework.md) for more details
+### DXF
+Distributed eXecution Framework (DXF) is the framework used by TiDB to speed up index creation and data import by distributing tasks over all available resources. See [DXF Introduction](/tidb-distributed-execution-framework.md) for more details.

Comment on lines +47 to +50
### CF

CF is short for Column Family as used by RocksDB / TiKV.


🛠️ Refactor suggestion

Enhance the Column Family definition

The current definition is too brief. Consider expanding it to explain what a Column Family is and its role in RocksDB/TiKV.

 ### CF
 
-CF is short for Column Family as used by RocksDB / TiKV.
+CF is short for Column Family, a feature in RocksDB and TiKV that allows storing different types of data separately. Column Families provide logical separation of data, enabling different storage configurations and independent compaction for different types of data within the same database.
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-### CF
-CF is short for Column Family as used by RocksDB / TiKV.
+### CF
+CF is short for Column Family, a feature in RocksDB and TiKV that allows storing different types of data separately. Column Families provide logical separation of data, enabling different storage configurations and independent compaction for different types of data within the same database.

@qiancai
Owner Author

qiancai commented Apr 18, 2025

/gemini review


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request adds several new terms to the glossary, enhancing the documentation. The additions seem relevant and well-defined. However, I have a few suggestions to improve clarity and consistency, referencing the provided style guide.

Merge Readiness

The changes are valuable additions to the glossary. However, addressing the suggestions regarding clarity, consistency, and style guide adherence would further improve the quality of the documentation. I recommend addressing the review comments before merging. I am unable to approve this pull request, and users should have others review and approve this code before merging.

Comment on lines +33 to +35
### BR

BR is the Backup and Restore tool for TiDB. See [BR Overview](/br/backup-and-restore-overview.md) for more information.


medium

Consider rephrasing this definition to be more descriptive of what BR does, rather than just stating what it is. For example, "BR is a tool used to back up and restore TiDB clusters, enabling disaster recovery and data migration."

Also, consider using active voice: "You can use BR to..." [1]

Style Guide References

Suggested change
-### BR
-BR is the Backup and Restore tool for TiDB. See [BR Overview](/br/backup-and-restore-overview.md) for more information.
+BR is a tool used to back up and restore TiDB clusters, enabling disaster recovery and data migration. You can use BR to...

Footnotes

  1. Use the active voice whenever possible. (link)

Comment on lines +81 to +83
### DR

Disaster Recovery (DR) describes solutions that can be used to recover from a disaster in the future. This includes things like backups and standby clusters.


medium

This definition could be more descriptive. Consider rephrasing to explain the purpose of Disaster Recovery in the context of TiDB. For example, "Disaster Recovery (DR) encompasses strategies and solutions for restoring TiDB clusters to a functional state after a disruptive event, including data backups, replication, and failover mechanisms."

Suggested change
-### DR
-Disaster Recovery (DR) describes solutions that can be used to recover from a disaster in the future. This includes things like backups and standby clusters.
+Disaster Recovery (DR) encompasses strategies and solutions for restoring TiDB clusters to a functional state after a disruptive event, including data backups, replication, and failover mechanisms.

Comment on lines +47 to +49
### CF

CF is short for Column Family as used by RocksDB / TiKV.


low

It might be helpful to expand on the use of Column Families in RocksDB/TiKV for readers unfamiliar with the concept. For example, "CF refers to Column Family, a feature used by RocksDB and TiKV to logically group key-value pairs for better performance and storage management."

Suggested change
-### CF
-CF is short for Column Family as used by RocksDB / TiKV.
+CF refers to Column Family, a feature used by RocksDB and TiKV to logically group key-value pairs for better performance and storage management.


### KMS

Key Management Service (KMS) allows the storage and retrieval of secret keys in a secure way. Examples of this are the AWS KMS, GCP KMS and HashiCorp Vault. Various TiDB components can use this to manage the keys that are used for storage encryption and related services.


low

Consider adding a sentence about how KMS is used in TiDB, e.g. "TiDB uses KMS to manage encryption keys for data at rest."

Suggested change
-Key Management Service (KMS) allows the storage and retrieval of secret keys in a secure way. Examples of this are the AWS KMS, GCP KMS and HashiCorp Vault. Various TiDB components can use this to manage the keys that are used for storage encryption and related services.
+Key Management Service (KMS) allows the storage and retrieval of secret keys in a secure way. Examples of this are the AWS KMS, GCP KMS and HashiCorp Vault. TiDB uses KMS to manage encryption keys for data at rest.


### OLAP

OnLine Analytical Processing (OLAP) are describing database workloads that mostly deal with analytical workloads like reporting. The characteristics of this is read heavy queries that process many rows.


low

Consider rephrasing this definition to be more descriptive of what OLAP workloads are in the context of TiDB. For example, "OnLine Analytical Processing (OLAP) refers to database workloads that primarily involve complex analytical queries, such as reporting and data warehousing, often characterized by read-heavy operations and large data scans."

Style Guide References

Suggested change
-OnLine Analytical Processing (OLAP) are describing database workloads that mostly deal with analytical workloads like reporting. The characteristics of this is read heavy queries that process many rows.
+OnLine Analytical Processing (OLAP) refers to database workloads that primarily involve complex analytical queries, such as reporting and data warehousing, often characterized by read-heavy operations and large data scans.


### RPC

Remote Procedure Call (RPC) is a way for software components to communicate. In a TiDB cluster gRPC standard is used for communication between different components such as TiDB, TiKV and TiFlash.


low

Consider rephrasing this sentence to use active voice: "TiDB uses gRPC for communication between its components." [1]

Style Guide References

Suggested change
-Remote Procedure Call (RPC) is a way for software components to communicate. In a TiDB cluster gRPC standard is used for communication between different components such as TiDB, TiKV and TiFlash.
+Remote Procedure Call (RPC) is a way for software components to communicate. TiDB uses gRPC for communication between its components such as TiDB, TiKV and TiFlash.

Footnotes

  1. Use the active voice whenever possible. (link)


### SST

Static Sorted Table, Sorted String Table or Sorted Sequence Table (SST) is the file storage format of RocksDB.


low

Consider adding a brief explanation of what RocksDB is, e.g. "RocksDB is an embedded key-value store used by TiKV."

Suggested change
-Static Sorted Table, Sorted String Table or Sorted Sequence Table (SST) is the file storage format of RocksDB.
+Static Sorted Table, Sorted String Table or Sorted Sequence Table (SST) is the file storage format of RocksDB, an embedded key-value store used by TiKV.


### TPS

Transactions Per Second (TPS) is a performance metric of a database.


low

Consider rephrasing this definition to be more descriptive of what TPS measures. For example, "Transactions Per Second (TPS) is a key performance indicator (KPI) that measures the number of transactions a database system can process per second."

Suggested change
-Transactions Per Second (TPS) is a performance metric of a database.
+Transactions Per Second (TPS) is a key performance indicator (KPI) that measures the number of transactions a database system can process per second.
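
To make the suggested wording concrete, here is a minimal sketch of how an average TPS figure could be derived from a transaction count and a measurement window. The function name `transactions_per_second` and the sample numbers are hypothetical and chosen only for illustration; this is not how TiDB itself reports the metric.

```python
def transactions_per_second(committed_transactions: int, elapsed_seconds: float) -> float:
    """Return the average number of transactions completed per second."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return committed_transactions / elapsed_seconds


# Hypothetical sample: 12,000 transactions committed during a 60-second window.
print(transactions_per_second(12_000, 60.0))  # 200.0
```

The sketch averages over the whole window, so short bursts inside the window are smoothed out.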


### URI

Uniform Resource Identifier (URI) is a uniform way of describing a resource. See [Uniform Resource Identifier](https://en.wikipedia.org/wiki/Uniform_Resource_Identifier) on Wikipedia for more information.


low

Consider rephrasing this definition to be more descriptive of what URI is. For example, "Uniform Resource Identifier (URI) is a string of characters that identifies a resource, enabling uniform identification of resources."

Suggested change
-Uniform Resource Identifier (URI) is a uniform way of describing a resource. See [Uniform Resource Identifier](https://en.wikipedia.org/wiki/Uniform_Resource_Identifier) on Wikipedia for more information.
+Uniform Resource Identifier (URI) is a string of characters that identifies a resource, enabling uniform identification of resources.
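
As a small illustration of "a string of characters that identifies a resource", the sketch below splits the Wikipedia URI quoted in the original entry into its components using Python's standard `urllib.parse` module. It is only an example of URI structure and is not part of the glossary text.

```python
from urllib.parse import urlsplit

# The URI cited in the original glossary entry.
parts = urlsplit("https://en.wikipedia.org/wiki/Uniform_Resource_Identifier")

print(parts.scheme)  # https
print(parts.netloc)  # en.wikipedia.org
print(parts.path)    # /wiki/Uniform_Resource_Identifier
```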

Comment on lines +312 to +313

Universally Unique Identifier (UUID) is a 128-bit (16 byte) generated ID that can be used to identify records in a database. See [UUID](/best-practices/uuid.md) for more information on how UUID's are used in TiDB.


low

Consider rephrasing this definition to be more descriptive of what UUID is. For example, "Universally Unique Identifier (UUID) is a 128-bit identifier used to uniquely identify information in computer systems, ensuring uniqueness across space and time."

Suggested change
-Universally Unique Identifier (UUID) is a 128-bit (16 byte) generated ID that can be used to identify records in a database. See [UUID](/best-practices/uuid.md) for more information on how UUID's are used in TiDB.
+Universally Unique Identifier (UUID) is a 128-bit identifier used to uniquely identify information in computer systems, ensuring uniqueness across space and time.
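
The "128-bit (16 byte)" detail from the original entry can be checked with a short sketch using Python's standard `uuid` module. A random (version 4) UUID is assumed here purely for illustration; the glossary entry does not say which UUID version is meant.

```python
import uuid

# Assumption for illustration: a random (version 4) UUID.
generated_id = uuid.uuid4()

print(len(generated_id.bytes))      # 16 bytes
print(len(generated_id.bytes) * 8)  # 128 bits
print(len(str(generated_id)))       # 36 characters in the canonical text form
```

The 36-character text form is larger than the 16-byte binary form, which is worth keeping in mind when choosing how to store UUIDs.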

Repository owner deleted a comment from github-actions bot Apr 18, 2025
@qiancai
Owner Author

qiancai commented Apr 18, 2025

/bot-review


### Bucket

A [Region](#regionpeerraft-group) is logically divided into several small ranges called bucket. TiKV collects query statistics by buckets and reports the bucket status to PD. For details, see the [Bucket design doc](https://github.com/tikv/rfcs/blob/master/text/0082-dynamic-size-region.md#bucket).


The sentence is not clear enough. It is recommended to clarify the relationship between Region and bucket, and to supplement the explanation of how buckets are used.

Suggested change
A [Region](#regionpeerraft-group) is logically divided into several small ranges called bucket. TiKV collects query statistics by buckets and reports the bucket status to PD. For details, see the [Bucket design doc](https://github.com/tikv/rfcs/blob/master/text/0082-dynamic-size-region.md#bucket).
A [Region](#regionpeerraft-group) is logically divided into several small ranges called buckets. TiKV collects query statistics by these buckets and reports the bucket status to PD. This helps in optimizing query performance and load balancing. For details, see the [Bucket design doc](https://github.com/tikv/rfcs/blob/master/text/0082-dynamic-size-region.md#bucket).
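
To make the Region/bucket relationship concrete, here is a minimal Python sketch of per-bucket read counting. It is only an illustration under stated assumptions: the `RegionBuckets` class, its method names, and the example keys are hypothetical and are not taken from TiKV or the Bucket design doc, and the Region's end key is ignored for brevity.

```python
from bisect import bisect_right
from collections import Counter


class RegionBuckets:
    """Hypothetical model of one Region split into contiguous key ranges (buckets)."""

    def __init__(self, start_key: bytes, split_keys: list[bytes]):
        # _starts[i] is the inclusive start key of bucket i.
        self._starts = [start_key] + sorted(split_keys)
        # Read counts keyed by bucket index.
        self.reads = Counter()

    def bucket_index(self, key: bytes) -> int:
        if key < self._starts[0]:
            raise KeyError(f"{key!r} is before the Region start key")
        # The bucket is the last one whose start key is <= key.
        return bisect_right(self._starts, key) - 1

    def record_read(self, key: bytes) -> None:
        self.reads[self.bucket_index(key)] += 1


region = RegionBuckets(b"a", [b"h", b"p"])  # buckets: [a, h), [h, p), [p, ...)
for key in (b"b", b"c", b"k", b"z"):
    region.record_read(key)
print(dict(region.reads))  # {0: 2, 1: 1, 2: 1}
```

In this sketch, the per-bucket counters stand in for the statistics that TiKV reports to PD; how PD consumes them is outside the scope of the example.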


### CF

CF is short for Column Family as used by RocksDB / TiKV.

The definition of CF is too brief and lacks context. It is recommended to provide more information about its role and significance in RocksDB/TiKV.

Suggested change
CF is short for Column Family as used by RocksDB / TiKV.
CF is short for Column Family as used by RocksDB / TiKV. In the context of RocksDB and TiKV, a Column Family is a logical separation within a database that allows for different settings and optimizations for different types of data. This feature enables more efficient data management and retrieval.


### DM

Data Migration is the tool that allows MySQL to TiDB migration by reading data from a source instance and applying it to a target MySQL instance. See [DM Overview](/dm/dm-overview.md) for more information.

The sentence is unclear and could be improved for better readability. It is recommended to clarify the process of data migration from MySQL to TiDB.

Suggested change
Data Migration is the tool that allows MySQL to TiDB migration by reading data from a source instance and applying it to a target MySQL instance. See [DM Overview](/dm/dm-overview.md) for more information.
Data Migration (DM) is the tool that facilitates the migration from MySQL to TiDB by reading data from a source MySQL instance and applying it to a target TiDB instance. See [DM Overview](/dm/dm-overview.md) for more information.


### DXF

Distributed eXecution Framework (DXF) is the framework used by TiDB to speedup index creation and data import by distributing tasks over all available resources. See [DXF Introduction](/tidb-distributed-execution-framework.md) for more details

The sentence is missing a period at the end, and "speedup" should be "speed up" when used as a verb.

Suggested change
Distributed eXecution Framework (DXF) is the framework used by TiDB to speedup index creation and data import by distributing tasks over all available resources. See [DXF Introduction](/tidb-distributed-execution-framework.md) for more details
Distributed eXecution Framework (DXF) is the framework used by TiDB to speed up index creation and data import by distributing tasks over all available resources. See [DXF Introduction](/tidb-distributed-execution-framework.md) for more details.


### GTID

Global Transactions ID's (GTIDs) are used by recent MySQL versions binary log to indicate what transactions have been replicated and which have not. This information can be used by DM.

The sentence contains a grammatical error. "ID's" should be "IDs".

Suggested change
Global Transactions ID's (GTIDs) are used by recent MySQL versions binary log to indicate what transactions have been replicated and which have not. This information can be used by DM.
Global Transaction IDs (GTIDs) are used in the binary log of recent MySQL versions to indicate which transactions have been replicated and which have not. This information can be used by DM.


### RU

Request Unit (RU) is used in TiDB to describe the unit for the resource usage. This is used with [Resource Control](/tidb-resource-control.md) to manage resource usage.

The phrase "the unit for the resource usage" is awkward and can be tightened.

Suggested change
Request Unit (RU) is used in TiDB to describe the unit for the resource usage. This is used with [Resource Control](/tidb-resource-control.md) to manage resource usage.
Request Unit (RU) is the unit that TiDB uses to measure resource usage. RUs are used with [Resource Control](/tidb-resource-control.md) to manage resource usage.


### SST

Static Sorted Table, Sorted String Table or Sorted Sequence Table (SST) is the file storage format of RocksDB.

The definition of SST is unclear and lacks context. It is recommended to provide a more detailed explanation of what SST is and its role in RocksDB.

Suggested change
Static Sorted Table, Sorted String Table or Sorted Sequence Table (SST) is the file storage format of RocksDB.
Static Sorted Table, Sorted String Table, or Sorted Sequence Table (SST) is a file storage format used by RocksDB. It organizes data in a sorted manner to optimize read and write operations, enhancing the performance of the database.
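
As a rough illustration of why keeping data sorted helps reads, the following Python sketch models an immutable sorted table with binary-search lookups and range scans. It is an assumption-laden toy: `SortedTable` and its methods are hypothetical names, and this is not RocksDB's actual on-disk SST layout.

```python
from bisect import bisect_left


class SortedTable:
    """Hypothetical in-memory stand-in for an immutable, key-sorted table."""

    def __init__(self, items: dict[bytes, bytes]):
        pairs = sorted(items.items())  # sort once, at build time
        self._keys = [k for k, _ in pairs]
        self._values = [v for _, v in pairs]

    def get(self, key: bytes):
        # Binary search: O(log n) point lookup thanks to the sorted keys.
        i = bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            return self._values[i]
        return None

    def scan(self, start: bytes, end: bytes):
        # Range scan over [start, end): a contiguous slice of the sorted keys.
        lo = bisect_left(self._keys, start)
        hi = bisect_left(self._keys, end)
        return list(zip(self._keys[lo:hi], self._values[lo:hi]))


table = SortedTable({b"k3": b"c", b"k1": b"a", b"k2": b"b"})
print(table.get(b"k2"))          # b'b'
print(table.scan(b"k1", b"k3"))  # [(b'k1', b'a'), (b'k2', b'b')]
```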


### TPS

Transactions Per Second (TPS) is a performance metric of a database.

The definition of TPS is too brief and lacks context. Consider providing more details about its significance in database performance.

Suggested change
Transactions Per Second (TPS) is a performance metric of a database.
Transactions Per Second (TPS) is a performance metric of a database that measures the number of transactions a system can process in one second. It is a critical indicator of the system's throughput and efficiency.
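
The metric itself is simple arithmetic: the number of completed transactions divided by the elapsed time in seconds. A minimal Python sketch, where the function name and the sample numbers are made up for illustration:

```python
def transactions_per_second(committed: int, elapsed_seconds: float) -> float:
    """Return throughput as committed transactions divided by elapsed seconds."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return committed / elapsed_seconds


# Hypothetical measurement: 12,000 transactions committed in a 60-second window.
print(transactions_per_second(12_000, 60.0))  # 200.0
```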


### URI

Uniform Resource Identifier (URI) is a uniform way of describing a resource. See [Uniform Resource Identifier](https://en.wikipedia.org/wiki/Uniform_Resource_Identifier) on Wikipedia for more information.

The explanation of URI could be expanded to provide more context about its use in TiDB.

Suggested change
Uniform Resource Identifier (URI) is a uniform way of describing a resource. See [Uniform Resource Identifier](https://en.wikipedia.org/wiki/Uniform_Resource_Identifier) on Wikipedia for more information.
Uniform Resource Identifier (URI) is a uniform way of describing a resource, which is used in TiDB to identify and access various resources within the system. See [Uniform Resource Identifier](https://en.wikipedia.org/wiki/Uniform_Resource_Identifier) on Wikipedia for more information.
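
For readers unfamiliar with URI structure, this short Python sketch splits a URI into its generic components using the standard library's `urllib.parse`. The example URI, bucket name, and `region` parameter are hypothetical and do not describe any specific TiDB option.

```python
from urllib.parse import parse_qs, urlsplit


def describe_uri(uri: str) -> dict:
    """Split a URI into its generic components (scheme, authority, path, query)."""
    parts = urlsplit(uri)
    return {
        "scheme": parts.scheme,
        "authority": parts.netloc,
        "path": parts.path,
        "query": parse_qs(parts.query),
    }


# Hypothetical storage URI, used only to show the component breakdown.
example = "s3://example-bucket/backups/2024?region=us-west-2"
print(describe_uri(example))
# {'scheme': 's3', 'authority': 'example-bucket', 'path': '/backups/2024', 'query': {'region': ['us-west-2']}}
```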


### UUID

Universally Unique Identifier (UUID) is a 128-bit (16 byte) generated ID that can be used to identify records in a database. See [UUID](/best-practices/uuid.md) for more information on how UUID's are used in TiDB.

The explanation of UUID should clarify its importance and usage in TiDB.

Suggested change
Universally Unique Identifier (UUID) is a 128-bit (16 byte) generated ID that can be used to identify records in a database. See [UUID](/best-practices/uuid.md) for more information on how UUID's are used in TiDB.
Universally Unique Identifier (UUID) is a 128-bit (16-byte) generated ID that can be used to uniquely identify records in a database, ensuring data integrity and consistency. See [UUID](/best-practices/uuid.md) for more information on how UUIDs are used in TiDB.
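
The 128-bit (16-byte) size can be checked with a small Python sketch that uses the standard library's `uuid` module. The helper name `new_record_id` is hypothetical, and the example shows generic UUID behavior rather than anything specific to TiDB.

```python
import uuid


def new_record_id() -> uuid.UUID:
    """Generate a random (version 4) UUID to use as a record identifier."""
    return uuid.uuid4()


record_id = new_record_id()
print(len(record_id.bytes))  # 16 bytes, that is, 128 bits
print(len(str(record_id)))   # 36 characters in the hyphenated text form

# The 16-byte binary form and the text form round-trip to the same value.
assert uuid.UUID(bytes=record_id.bytes) == record_id
assert uuid.UUID(str(record_id)) == record_id
```

The difference between the 16-byte binary form and the 36-character text form matters when choosing a column type for storing UUIDs.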

✅ AI review completed, 21 comments generated.
