Commit b488491

Merge pull request #3549 from Blargian/autogenerate_mergetree_settings

Autogenerate MergeTree settings

2 parents 484399b + 9e01403

9 files changed: 32 additions, 16 deletions

docs/cloud/bestpractices/avoidoptimizefinal.md

Lines changed: 2 additions & 2 deletions

@@ -20,10 +20,10 @@ It is important to note that using this optimization will force a rewrite of a p
 even if merging to a single part has already occurred.

 Additionally, use of the `OPTIMIZE TABLE ... FINAL` query may disregard
-setting [`max_bytes_to_merge_at_max_space_in_pool`](/operations/settings/merge-tree-settings#max-bytes-to-merge-at-max-space-in-pool) which controls the maximum size of parts
+setting [`max_bytes_to_merge_at_max_space_in_pool`](/operations/settings/merge-tree-settings#max_bytes_to_merge_at_max_space_in_pool) which controls the maximum size of parts
 that ClickHouse will typically merge by itself in the background.

-The [`max_bytes_to_merge_at_max_space_in_pool`](/operations/settings/merge-tree-settings#max-bytes-to-merge-at-max-space-in-pool) setting is by default set to 150 GB.
+The [`max_bytes_to_merge_at_max_space_in_pool`](/operations/settings/merge-tree-settings#max_bytes_to_merge_at_max_space_in_pool) setting is by default set to 150 GB.
 When running `OPTIMIZE TABLE ... FINAL`,
 the steps outlined above will be performed resulting in a single part after merge.
 This remaining single part could exceed the 150 GB specified by the default of this setting.
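
Since the new anchors match the setting names in `system.merge_tree_settings` verbatim, the value behind this link is easy to check on a live server; a minimal sketch:

```sql
-- Sketch: inspect the merge-size ceiling discussed in this page.
SELECT name, value, description
FROM system.merge_tree_settings
WHERE name = 'max_bytes_to_merge_at_max_space_in_pool';
```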

docs/guides/developer/deduplicating-inserts-on-retries.md

Lines changed: 4 additions & 4 deletions

@@ -15,7 +15,7 @@ When an insert is retried, ClickHouse tries to determine whether the data has al

 **Only `*MergeTree` engines support deduplication on insertion.**

-For `*ReplicatedMergeTree` engines, insert deduplication is enabled by default and is controlled by the [`replicated_deduplication_window`](/operations/settings/merge-tree-settings#replicated-deduplication-window) and [`replicated_deduplication_window_seconds`](/operations/settings/merge-tree-settings#replicated-deduplication-window-seconds) settings. For non-replicated `*MergeTree` engines, deduplication is controlled by the [`non_replicated_deduplication_window`](/operations/settings/merge-tree-settings#non-replicated-deduplication-window) setting.
+For `*ReplicatedMergeTree` engines, insert deduplication is enabled by default and is controlled by the [`replicated_deduplication_window`](/operations/settings/merge-tree-settings#replicated_deduplication_window) and [`replicated_deduplication_window_seconds`](/operations/settings/merge-tree-settings#replicated_deduplication_window_seconds) settings. For non-replicated `*MergeTree` engines, deduplication is controlled by the [`non_replicated_deduplication_window`](/operations/settings/merge-tree-settings#non_replicated_deduplication_window) setting.

 The settings above determine the parameters of the deduplication log for a table. The deduplication log stores a finite number of `block_id`s, which determine how deduplication works (see below).

@@ -41,9 +41,9 @@ When a table has one or more materialized views, the inserted data is also inser

 You can control this process using the following settings for the source table:

-- [`replicated_deduplication_window`](/operations/settings/merge-tree-settings#replicated-deduplication-window)
-- [`replicated_deduplication_window_seconds`](/operations/settings/merge-tree-settings#replicated-deduplication-window-seconds)
-- [`non_replicated_deduplication_window`](/operations/settings/merge-tree-settings#non-replicated-deduplication-window)
+- [`replicated_deduplication_window`](/operations/settings/merge-tree-settings#replicated_deduplication_window)
+- [`replicated_deduplication_window_seconds`](/operations/settings/merge-tree-settings#replicated_deduplication_window_seconds)
+- [`non_replicated_deduplication_window`](/operations/settings/merge-tree-settings#non_replicated_deduplication_window)

 You can also use the user profile setting [`deduplicate_blocks_in_dependent_materialized_views`](/operations/settings/settings#deduplicate_blocks_in_dependent_materialized_views).
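
These are table-level MergeTree settings, so they can be set per table at creation time. A minimal sketch with an illustrative table name (on non-replicated tables `non_replicated_deduplication_window` defaults to `0`, i.e. deduplication disabled):

```sql
-- Sketch: enable insert deduplication on a non-replicated table
-- by giving its deduplication log a nonzero size.
CREATE TABLE events
(
    id      UInt64,
    payload String
)
ENGINE = MergeTree
ORDER BY id
SETTINGS non_replicated_deduplication_window = 100; -- keep the last 100 block_ids
```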

docs/integrations/data-ingestion/s3/performance.md

Lines changed: 4 additions & 4 deletions

@@ -60,13 +60,13 @@ Note that the `min_insert_block_size_bytes` value denotes the uncompressed in-me

 #### Be aware of merges {#be-aware-of-merges}

-The smaller the configured insert block size is, the more initial parts get created for a large data load, and the more background part merges are executed concurrently with the data ingestion. This can cause resource contention (CPU and memory) and require additional time (for reaching a [healthy](/operations/settings/merge-tree-settings#parts-to-throw-insert) (3000) number of parts) after the ingestion is finished.
+The smaller the configured insert block size is, the more initial parts get created for a large data load, and the more background part merges are executed concurrently with the data ingestion. This can cause resource contention (CPU and memory) and require additional time (for reaching a [healthy](/operations/settings/merge-tree-settings#parts_to_throw_insert) (3000) number of parts) after the ingestion is finished.

 :::important
-ClickHouse query performance will be negatively impacted if the part count exceeds the [recommended limits](/operations/settings/merge-tree-settings#parts-to-throw-insert).
+ClickHouse query performance will be negatively impacted if the part count exceeds the [recommended limits](/operations/settings/merge-tree-settings#parts_to_throw_insert).
 :::

-ClickHouse will continuously [merge parts](https://clickhouse.com/blog/asynchronous-data-inserts-in-clickhouse#data-needs-to-be-batched-for-optimal-performance) into larger parts until they [reach](/operations/settings/merge-tree-settings#max-bytes-to-merge-at-max-space-in-pool) a compressed size of ~150 GiB. This diagram shows how a ClickHouse server merges parts:
+ClickHouse will continuously [merge parts](https://clickhouse.com/blog/asynchronous-data-inserts-in-clickhouse#data-needs-to-be-batched-for-optimal-performance) into larger parts until they [reach](/operations/settings/merge-tree-settings#max_bytes_to_merge_at_max_space_in_pool) a compressed size of ~150 GiB. This diagram shows how a ClickHouse server merges parts:

 <Image img={Merges} size="lg" border alt="Background merges in ClickHouse" />

@@ -84,7 +84,7 @@ Go to ①

 Note that [increasing](https://clickhouse.com/blog/supercharge-your-clickhouse-data-loads-part1#hardware-size) the number of CPU cores and the size of RAM increases the background merge throughput.

-Parts that were merged into larger parts are marked as [inactive](/operations/system-tables/parts) and finally deleted after a [configurable](/operations/settings/merge-tree-settings#old-parts-lifetime) number of minutes. Over time, this creates a tree of merged parts (hence the name [`MergeTree`](/engines/table-engines/mergetree-family) table).
+Parts that were merged into larger parts are marked as [inactive](/operations/system-tables/parts) and finally deleted after a [configurable](/operations/settings/merge-tree-settings#old_parts_lifetime) number of minutes. Over time, this creates a tree of merged parts (hence the name [`MergeTree`](/engines/table-engines/mergetree-family) table).

 ### Insert Parallelism {#insert-parallelism}
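
To see where a load stands relative to that parts limit, one option is to count active parts in [`system.parts`](/operations/system-tables/parts); a sketch (the `parts_to_throw_insert` check applies per partition of a table):

```sql
-- Sketch: active part counts, grouped the way the too-many-parts
-- check is applied (per partition of each table).
SELECT table, partition, count() AS active_parts
FROM system.parts
WHERE active AND database = currentDatabase()
GROUP BY table, partition
ORDER BY active_parts DESC;
```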

docs/managing-data/core-concepts/merges.md

Lines changed: 2 additions & 2 deletions

@@ -28,15 +28,15 @@ ClickHouse [is fast](/concepts/why-clickhouse-is-so-fast) not just for queries b

 This makes data writes lightweight and [highly efficient](/concepts/why-clickhouse-is-so-fast#storage-layer-concurrent-inserts-are-isolated-from-each-other).

-To control the number of parts per table and implement ② above, ClickHouse continuously merges ([per partition](/partitions#per-partition-merges)) smaller parts into larger ones in the background until they reach a compressed size of approximately [~150 GB](/operations/settings/merge-tree-settings#max-bytes-to-merge-at-max-space-in-pool).
+To control the number of parts per table and implement ② above, ClickHouse continuously merges ([per partition](/partitions#per-partition-merges)) smaller parts into larger ones in the background until they reach a compressed size of approximately [~150 GB](/operations/settings/merge-tree-settings#max_bytes_to_merge_at_max_space_in_pool).

 The following diagram sketches this background merge process:

 <Image img={merges_01} size="lg" alt='PART MERGES'/>

 <br/>

-The `merge level` of a part is incremented by one with each additional merge. A level of `0` means the part is new and has not been merged yet. Parts that were merged into larger parts are marked as [inactive](/operations/system-tables/parts) and finally deleted after a [configurable](/operations/settings/merge-tree-settings#old-parts-lifetime) time (8 minutes by default). Over time, this creates a **tree** of merged parts. Hence the name [merge tree](/engines/table-engines/mergetree-family) table.
+The `merge level` of a part is incremented by one with each additional merge. A level of `0` means the part is new and has not been merged yet. Parts that were merged into larger parts are marked as [inactive](/operations/system-tables/parts) and finally deleted after a [configurable](/operations/settings/merge-tree-settings#old_parts_lifetime) time (8 minutes by default). Over time, this creates a **tree** of merged parts. Hence the name [merge tree](/engines/table-engines/mergetree-family) table.

 ## Monitoring merges {#monitoring-merges}
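
On the monitoring theme this hunk ends on: merges currently in flight are exposed in the `system.merges` table — a minimal sketch:

```sql
-- Sketch: background merges currently running on this server.
SELECT database, table, elapsed, progress, num_parts, result_part_name
FROM system.merges;
```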

docs/managing-data/core-concepts/parts.md

Lines changed: 1 addition & 1 deletion

@@ -55,7 +55,7 @@ Data parts are self-contained, including all metadata needed to interpret their

 ## Part merges {#part-merges}

-To manage the number of parts per table, a [background merge](/merges) job periodically combines smaller parts into larger ones until they reach a [configurable](/operations/settings/merge-tree-settings#max-bytes-to-merge-at-max-space-in-pool) compressed size (typically ~150 GB). Merged parts are marked as inactive and deleted after a [configurable](/operations/settings/merge-tree-settings#old-parts-lifetime) time interval. Over time, this process creates a hierarchical structure of merged parts, which is why it’s called a MergeTree table:
+To manage the number of parts per table, a [background merge](/merges) job periodically combines smaller parts into larger ones until they reach a [configurable](/operations/settings/merge-tree-settings#max_bytes_to_merge_at_max_space_in_pool) compressed size (typically ~150 GB). Merged parts are marked as inactive and deleted after a [configurable](/operations/settings/merge-tree-settings#old_parts_lifetime) time interval. Over time, this process creates a hierarchical structure of merged parts, which is why it’s called a MergeTree table:

 <Image img={merges} size="lg"/>
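
That hierarchy of parts can be observed directly in `system.parts`; a sketch, with an illustrative table name:

```sql
-- Sketch: one table's part "tree". `level` counts merge generations;
-- inactive rows are merged-away inputs awaiting deletion (old_parts_lifetime).
SELECT name, level, active, rows
FROM system.parts
WHERE database = currentDatabase() AND table = 'events' -- illustrative name
ORDER BY level DESC, name;
```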

docs/migrations/bigquery/migrating-to-clickhouse-cloud.md

Lines changed: 1 addition & 1 deletion

@@ -242,7 +242,7 @@ Users should consider partitioning a data management technique. It is ideal when

 Important: Ensure your partitioning key expression does not result in a high cardinality set i.e. creating more than 100 partitions should be avoided. For example, do not partition your data by high cardinality columns such as client identifiers or names. Instead, make a client identifier or name the first column in the `ORDER BY` expression.

-> Internally, ClickHouse [creates parts](/guides/best-practices/sparse-primary-indexes#clickhouse-index-design) for inserted data. As more data is inserted, the number of parts increases. In order to prevent an excessively high number of parts, which will degrade query performance (because there are more files to read), parts are merged together in a background asynchronous process. If the number of parts exceeds a [pre-configured limit](/operations/settings/merge-tree-settings#parts-to-throw-insert), then ClickHouse will throw an exception on insert as a ["too many parts" error](/knowledgebase/exception-too-many-parts). This should not happen under normal operation and only occurs if ClickHouse is misconfigured or used incorrectly e.g. many small inserts. Since parts are created per partition in isolation, increasing the number of partitions causes the number of parts to increase i.e. it is a multiple of the number of partitions. High cardinality partitioning keys can, therefore, cause this error and should be avoided.
+> Internally, ClickHouse [creates parts](/guides/best-practices/sparse-primary-indexes#clickhouse-index-design) for inserted data. As more data is inserted, the number of parts increases. In order to prevent an excessively high number of parts, which will degrade query performance (because there are more files to read), parts are merged together in a background asynchronous process. If the number of parts exceeds a [pre-configured limit](/operations/settings/merge-tree-settings#parts_to_throw_insert), then ClickHouse will throw an exception on insert as a ["too many parts" error](/knowledgebase/exception-too-many-parts). This should not happen under normal operation and only occurs if ClickHouse is misconfigured or used incorrectly e.g. many small inserts. Since parts are created per partition in isolation, increasing the number of partitions causes the number of parts to increase i.e. it is a multiple of the number of partitions. High cardinality partitioning keys can, therefore, cause this error and should be avoided.

 ## Materialized views vs projections {#materialized-views-vs-projections}
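
A sketch of the schema shape this advice leads to, with illustrative names: partition on a bounded-cardinality expression and lead the `ORDER BY` with the high-cardinality client identifier:

```sql
-- Sketch: low-cardinality partitioning plus client_id first in ORDER BY.
-- All names are illustrative.
CREATE TABLE events
(
    client_id  UInt64,
    event_time DateTime,
    payload    String
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(event_time) -- bounded: one partition per month
ORDER BY (client_id, event_time); -- high-cardinality key goes here, not in PARTITION BY
```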

docusaurus.config.en.js

Lines changed: 1 addition & 1 deletion

@@ -59,7 +59,7 @@ const config = {
   onBrokenLinks: "throw",
   onBrokenMarkdownLinks: "warn",
   onDuplicateRoutes: "throw",
-  onBrokenAnchors: "throw",
+  onBrokenAnchors: "warn",
   favicon: "img/docs_favicon.ico",
   organizationName: "ClickHouse",
   trailingSlash: false,

scripts/settings/autogenerate-settings.sh

Lines changed: 2 additions & 1 deletion

@@ -54,11 +54,12 @@ done
 # move across files to where they need to be
 mv settings-formats.md "$root/docs/operations/settings" || { echo "Failed to move generated settings-format.md"; exit 1; }
 mv settings.md "$root/docs/operations/settings" || { echo "Failed to move generated settings.md"; exit 1; }
+cat generated_merge_tree_settings.md >> "$root/docs/operations/settings/merge-tree-settings.md" || { echo "Failed to append MergeTree settings.md"; exit 1; }
 mv server_settings.md "$root/docs/operations/server-configuration-parameters/settings.md" || { echo "Failed to move generated server_settings.md"; exit 1; }

 echo "[$SCRIPT_NAME] Auto-generation of settings markdown pages completed successfully"

 # perform cleanup
-rm -rf "$tmp_dir"/{settings-formats.md,settings.md,FormatFactorySettings.h,Settings.cpp,clickhouse}
+rm -rf "$tmp_dir"/{settings-formats.md,settings.md,FormatFactorySettings.h,Settings.cpp,generated_merge_tree_settings.md,clickhouse}

 echo "[$SCRIPT_NAME] Autogenerating settings completed"
New file (the ClickHouse SQL query that produces the `generated_merge_tree_settings.md` consumed by the script above)

Lines changed: 15 additions & 0 deletions

@@ -0,0 +1,15 @@
+WITH
+    merge_tree_settings AS
+    (
+        SELECT format(
+            '## {} {} \n{}\n{}{}',
+            name,
+            '{#'||name||'}',
+            multiIf(tier == 'Experimental', '\n<ExperimentalBadge/>\n', tier == 'Beta', '\n<BetaBadge/>\n', ''),
+            if(type != '' AND default != '', format('|Type|Default|\n|---|---|\n|`{}`|`{}`|\n\n',type, default), ''),
+            replaceRegexpAll(description, '(?m)(^[ \t]+|[ \t]+$)', '')
+        )
+        FROM system.merge_tree_settings ORDER BY name
+    )
+SELECT * FROM merge_tree_settings
+INTO OUTFILE 'generated_merge_tree_settings.md' TRUNCATE FORMAT LineAsString
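
The `'{#'||name||'}'` concatenation above is what makes each generated heading anchor use the setting name verbatim, underscores included — which is why the doc links in the earlier hunks switch from hyphenated to underscored anchors. A quick sketch of the heading line this template emits for one setting:

```sql
-- Sketch: preview the generated heading for a single setting.
SELECT format('## {} {}', name, '{#' || name || '}') AS heading
FROM system.merge_tree_settings
WHERE name = 'old_parts_lifetime';

-- heading: ## old_parts_lifetime {#old_parts_lifetime}
```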
