
Commit 8491969

fix(clustered): Closes influxdata/DAR#472. Catalog terminology is confusing, needs scaling recommendations:
- Distinguish between Catalog store (Postgres db) and Catalog service (API and cache for the store).
- Add scaling recommendations for each.
- Resolve conflicting scaling info, remove duplicated scaling info from storage-engine.md.
1 parent 38c9b6d commit 8491969

10 files changed

+93
-129
lines changed


content/influxdb3/clustered/admin/backup-restore.md

+20-22
Original file line number | Diff line number | Diff line change
@@ -12,29 +12,29 @@ weight: 105
1212
influxdb3/clustered/tags: [backup, restore]
1313
---
1414

15-
InfluxDB Clustered automatically stores snapshots of the InfluxDB Catalog that
15+
InfluxDB Clustered automatically stores snapshots of the InfluxDB Catalog store that
1616
you can use to restore your cluster to a previous state. The snapshotting
1717
functionality is optional and is disabled by default.
1818
Enable snapshots to ensure you can recover
1919
in case of emergency.
2020

2121
With InfluxDB Clustered snapshots enabled, each hour, InfluxDB uses the `pg_dump`
22-
utility included with the InfluxDB Garbage Collector to export an SQL blob or
23-
“snapshot” from the InfluxDB Catalog and store it in the object store.
24-
The Catalog is a PostgreSQL-compatible relational database that stores metadata
22+
utility included with the InfluxDB Garbage collector to export an SQL blob or
23+
“snapshot” from the InfluxDB Catalog store to the Object store.
24+
The Catalog store is a PostgreSQL-compatible relational database that stores metadata
2525
for your time series data, such as schema data types, Parquet file locations, and more.
2626

27-
The Catalog snapshots act as recovery points for your InfluxDB cluster that
28-
reference all Parquet files that existed in the object store at the time of the
29-
snapshot. When a snapshot is restored to the Catalog, the Compactor
27+
The Catalog store snapshots act as recovery points for your InfluxDB cluster that
28+
reference all Parquet files that existed in the Object store at the time of the
29+
snapshot. When a snapshot is restored to the Catalog store, the Compactor
3030
“[soft deletes](#soft-delete)” any Parquet files not listed in the snapshot.
3131

3232
> [!Note]
3333
> InfluxDB won't [hard delete](#hard-delete) Parquet files listed in _any_ hourly or daily snapshot.
3434
>
3535
> For example, if you have Parquet files A, B, C, and D, and you restore to a
3636
> snapshot that includes B and C, but not A and D, then A and D are soft-deleted, but remain in object
37-
> storage until they are no longer referenced in any Catalog snapshot.
37+
> storage until they are no longer referenced in any Catalog store snapshot.
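Reviewer sketch (not part of the commit): the A, B, C, D example in this note can be modeled with plain set operations. The function names and in-memory sets below are hypothetical illustrations. The actual Compactor and Garbage collector logic is not shown in this diff, and the sketch ignores any retention-period conditions that may also apply to hard deletes.

```python
def soft_deleted_after_restore(object_store_files, restored_snapshot):
    """Files present in the Object store but not listed in the restored snapshot."""
    return set(object_store_files) - set(restored_snapshot)


def hard_delete_candidates(soft_deleted, all_snapshots):
    """Soft-deleted files that no retained snapshot still references."""
    referenced = set().union(*all_snapshots) if all_snapshots else set()
    return set(soft_deleted) - referenced


# Hypothetical data mirroring the note: restore to a snapshot listing B and C.
files = {"A", "B", "C", "D"}
restored = {"B", "C"}
older_snapshot = {"A", "B"}  # assumed: another retained snapshot still lists A

soft = soft_deleted_after_restore(files, restored)
candidates = hard_delete_candidates(soft, [restored, older_snapshot])
```

Here `soft` contains A and D, while only D is a hard-delete candidate, because the assumed older snapshot still references A.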
3838
- [Soft delete](#soft-delete)
3939
- [Hard delete](#hard-delete)
4040
- [Recovery Point Objective (RPO)](#recovery-point-objective-rpo)
@@ -75,8 +75,8 @@ The InfluxDB Clustered snapshot strategy RPO allows for the following maximum da
7575
## Recovery Time Objective (RTO)
7676

7777
RTO is the maximum amount of downtime allowed for an InfluxDB cluster after a failure.
78-
RTO varies depending on the size of your Catalog database, network speeds
79-
between the client machine and the Catalog database, cluster load, the status
78+
RTO varies depending on the size of your Catalog store, network speeds
79+
between the client machine and the Catalog store, cluster load, the status
8080
of your underlying hosting provider, and other factors.
8181

8282
## Data written just before a snapshot may not be present after restoring
@@ -94,14 +94,14 @@ present after restoring to that snapshot.
9494
### Automate object synchronization to an external S3-compatible bucket
9595

9696
Syncing objects to an external S3-compatible bucket ensures an up-to-date backup
97-
in case your object store becomes unavailable. Recovery point snapshots only
98-
back up the InfluxDB Catalog. If data referenced in a Catalog snapshot does not
99-
exist in the object store, the recovery process does not restore the missing data.
97+
in case your Object store becomes unavailable. Recovery point snapshots only
98+
back up the InfluxDB Catalog store. If data referenced in a Catalog store snapshot does not
99+
exist in the Object store, the recovery process does not restore the missing data.
100100

101101
### Enable short-term object versioning
102102

103103
If your object storage provider supports it, consider enabling short-term
104-
object versioning on your object store--for example, 1-2 days to protect against errant writes or deleted objects.
104+
object versioning on your Object store--for example, 1-2 days to protect against errant writes or deleted objects.
105105
With object versioning enabled, as objects are updated, the object store
106106
retains distinct versions of each update that can be used to “rollback” newly
107107
written or updated Parquet files to previous versions.
@@ -140,7 +140,7 @@ spec:
140140
141141
#### INFLUXDB_IOX_CREATE_CATALOG_BACKUP_DATA_SNAPSHOT_FILES
142142
143-
Enable hourly Catalog snapshotting. The default is `'false'`. Set to `'true'`:
143+
Enable hourly Catalog store snapshotting. The default is `'false'`. Set to `'true'`:
144144
145145
```yaml
146146
INFLUXDB_IOX_CREATE_CATALOG_BACKUP_DATA_SNAPSHOT_FILES: 'true'
@@ -217,22 +217,20 @@ written on or around the beginning of the next hour.
217217
## Restore to a recovery point
218218

219219
Use the following process to restore your InfluxDB cluster to a recovery point
220-
using Catalog snapshots:
220+
using Catalog store snapshots:
221221

222222
1. **Install prerequisites:**
223223

224224
- `kubectl` CLI for managing your Kubernetes deployment.
225-
- `psql` CLI to interact with the PostgreSQL-compatible Catalog database with
226-
the appropriate Data Source Name (DSN) and connection credentials.
227-
- A client to interact with your InfluxDB cluster’s object store.
228-
Supported clients depend on your object storage provider.
225+
- `psql` CLI configured with your Data Source Name and credentials for interacting with the PostgreSQL-compatible Catalog store database.
226+
- A client from your object storage provider for interacting with your InfluxDB cluster's Object store.
229227

230228
2. **Retrieve the recovery point snapshot from your object store.**
231229

232230
InfluxDB Clustered stores hourly and daily snapshots in the
233231
`/catalog_backup_file_lists` path in object storage. Download the snapshot
234-
that you would like to use as the recovery point. If your primary object
235-
store is unavailable, download the snapshot from your replicated object store.
232+
that you would like to use as the recovery point. If your primary Object
233+
store is unavailable, download the snapshot from your replicated Object store.
236234

237235
> [!Important]
238236
> When creating and storing a snapshot, the last artifact created is the

content/influxdb3/clustered/admin/databases/delete.md

+3-3
@@ -41,8 +41,8 @@ Once a database is deleted, data stored in that database cannot be recovered.
4141

4242
After a database is deleted, you cannot reuse the same name for a new database.
4343

44-
#### Never directly modify the Catalog
44+
#### Never directly modify the Catalog store
4545

46-
Never directly modify the [PostgreSQL-compatible Catalog](/influxdb3/clustered/reference/internals/storage-engine/#catalog).
47-
Doing so will result in an undefined state for various components and may lead to data loss and crashes.
46+
Never directly modify the [PostgreSQL-compatible Catalog store](/influxdb3/clustered/reference/internals/storage-engine/#catalog-store).
47+
Doing so will result in an undefined state for various components and may lead to data loss and crashes.
4848
{{% /warn %}}

content/influxdb3/clustered/admin/scale-cluster.md

+35-27
@@ -22,19 +22,12 @@ resources available to each component.
2222
- [Scaling strategies](#scaling-strategies)
2323
- [Vertical scaling](#vertical-scaling)
2424
- [Horizontal scaling](#horizontal-scaling)
25+
- [Scale your cluster as a whole](#scale-your-cluster-as-a-whole)
2526
- [Scale components in your cluster](#scale-components-in-your-cluster)
2627
- [Horizontally scale a component](#horizontally-scale-a-component)
2728
- [Vertically scale a component](#vertically-scale-a-component)
2829
- [Apply your changes](#apply-your-changes)
29-
- [Scale your cluster as a whole](#scale-your-cluster-as-a-whole)
3030
- [Recommended scaling strategies per component](#recommended-scaling-strategies-per-component)
31-
- [Ingester](#ingester)
32-
- [Querier](#querier)
33-
- [Router](#router)
34-
- [Compactor](#compactor)
35-
- [Garbage collector](#garbage-collector)
36-
- [Catalog](#catalog)
37-
- [Object store](#object-store)
3831

3932
## Scaling strategies
4033

@@ -59,6 +52,14 @@ throughput a system can manage, but also provides additional redundancy and fail
5952

6053
{{< html-diagram/scaling-strategy "horizontal" >}}
6154

55+
## Scale your cluster as a whole
56+
57+
Scaling your entire InfluxDB Cluster is done by scaling your Kubernetes cluster
58+
and is managed outside of InfluxDB. The process of scaling your entire Kubernetes
59+
cluster depends on your underlying Kubernetes provider. You can also use
60+
[Kubernetes autoscaling](https://kubernetes.io/docs/concepts/cluster-administration/cluster-autoscaling/)
61+
to automatically scale your cluster as needed.
62+
6263
## Scale components in your cluster
6364

6465
The following components of your InfluxDB cluster are scaled by modifying
@@ -69,11 +70,12 @@ properties in your `AppInstance` resource:
6970
- Compactor
7071
- Router
7172
- Garbage collector
73+
- Catalog service
7274

7375
{{% note %}}
74-
#### Scale your Catalog and Object store
76+
#### Scale your Catalog store and Object store
7577

76-
Your InfluxDB [Catalog](/influxdb3/clustered/reference/internals/storage-engine/#catalog)
78+
Your InfluxDB [Catalog store](/influxdb3/clustered/reference/internals/storage-engine/#catalog-store)
7779
and [Object store](/influxdb3/clustered/reference/internals/storage-engine/#object-store)
7880
are managed outside of your `AppInstance` resource. Scaling mechanisms for these
7981
components depend on the technology and underlying provider used for each.
@@ -451,22 +453,15 @@ helm upgrade \
451453
{{% /code-tab-content %}}
452454
{{< /code-tabs-wrapper >}}
453455

454-
## Scale your cluster as a whole
455-
456-
Scaling your entire InfluxDB Cluster is done by scaling your Kubernetes cluster
457-
and is managed outside of InfluxDB. The process of scaling your entire Kubernetes
458-
cluster depends on your underlying Kubernetes provider. You can also use
459-
[Kubernetes autoscaling](https://kubernetes.io/docs/concepts/cluster-administration/cluster-autoscaling/)
460-
to automatically scale your cluster as needed.
461-
462456
## Recommended scaling strategies per component
463457

464458
- [Router](#router)
465459
- [Ingester](#ingester)
466460
- [Querier](#querier)
467461
- [Compactor](#compactor)
468462
- [Garbage collector](#garbage-collector)
469-
- [Catalog](#catalog)
463+
- [Catalog store](#catalog-store)
464+
- [Catalog service](#catalog-service)
470465
- [Object store](#object-store)
471466

472467
### Router
@@ -563,16 +558,29 @@ efficiently as vertical scaling.
563558

564559
### Garbage collector
565560

566-
The Garbage collector can be scaled [vertically](#vertical-scaling). It is a
567-
light-weight process that typically doesn't require many system resources, but
568-
if you begin to see high resource consumption on the garbage collector, you can
569-
scale it vertically to address the added workload.
561+
The Garbage collector is not designed for distributed load and should _not_ be
562+
scaled horizontally. It is a lightweight process that typically doesn't require
563+
significant system resources. [Vertical scaling](#vertical-scaling) should only
564+
be considered if you observe consistently high CPU usage or if the container
565+
regularly runs out of memory.
566+
567+
### Catalog store
568+
569+
The Catalog store is a PostgreSQL-compatible database that persistently stores metadata.
570+
Scaling strategies depend on your chosen PostgreSQL implementation.
571+
All support [vertical scaling](#vertical-scaling), and most support
572+
[horizontal scaling](#horizontal-scaling) for redundancy and failover.
573+
574+
### Catalog service
570575

571-
### Catalog
576+
The Catalog service should maintain exactly
577+
3 replicas for optimal redundancy.
578+
Additional replicas are discouraged; favor vertical scaling instead if performance improvements are needed.
572579

573-
Scaling strategies available for the Catalog depend on the PostgreSQL-compatible
574-
database used to run the catalog. All support [vertical scaling](#vertical-scaling).
575-
Most support [horizontal scaling](#horizontal-scaling) for redundancy and failover.
580+
> [!Note]
581+
> The [Catalog service](/influxdb3/clustered/reference/internals/storage-engine/#catalog-service) is managed through the
582+
> `AppInstance` resource, while the [Catalog store](/influxdb3/clustered/reference/internals/storage-engine/#catalog-store)
583+
> is managed separately according to your PostgreSQL implementation.
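Reviewer sketch (not part of the commit): the replica guidance added in this hunk could be expressed as a simple pre-apply check. The component keys and the dictionary shape are hypothetical and do not reflect the `AppInstance` schema. Only the two rules stated above are encoded (exactly 3 Catalog service replicas, no horizontal scaling of the Garbage collector).

```python
def check_replicas(replicas):
    """Return warnings for replica counts that conflict with the guidance above.

    `replicas` maps hypothetical component names to replica counts.
    """
    warnings = []
    catalog = replicas.get("catalog-service")
    if catalog is not None and catalog != 3:
        warnings.append(
            f"catalog-service: expected exactly 3 replicas, found {catalog}"
        )
    gc = replicas.get("garbage-collector")
    if gc is not None and gc > 1:
        warnings.append("garbage-collector: should not be scaled horizontally")
    return warnings


# Hypothetical input: a Catalog service scaled past the recommended count.
example = check_replicas({"catalog-service": 5, "garbage-collector": 1})
```

A check like this only flags the Catalog service and Garbage collector. The Catalog store is managed outside the `AppInstance` resource, so it is deliberately left out.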
576584

577585
### Object store
578586

content/influxdb3/clustered/install/_index.md

+3-3
@@ -61,13 +61,13 @@ Updating your InfluxDB cluster is as simple as re-applying your app-instance wit
6161
6262
The word safely here means being able to redeploy your cluster while still being able to use the tokens you’ve created, and being able to write/query to the database you’ve previously created.
6363
64-
All of the important state in InfluxDB 3 lives in the Catalog (the Postgres equivalent database) and the Object Store (the S3 compatible store). These should be treated with the utmost care.
64+
All of the important state in InfluxDB 3 lives in the Catalog store (the Postgres equivalent database) and the Object Store (the S3 compatible store). These should be treated with the utmost care.
6565
66-
If a full redeploy of your cluster needs to happen, the namespace containing the Influxdb instance can be deleted **_as long as your Catalog and Object Store are not in this namespace_**. Then, the influxdb AppInstance can be redeployed. It is possible the operator may need to be removed and reinstalled. In that case, deleting the namespace that the operator is deployed into and redeploying is acceptable.
66+
If a full redeploy of your cluster needs to happen, the namespace containing the Influxdb instance can be deleted **_as long as your Catalog store and Object Store are not in this namespace_**. Then, the influxdb AppInstance can be redeployed. It is possible the operator may need to be removed and reinstalled. In that case, deleting the namespace that the operator is deployed into and redeploying is acceptable.
6767
6868
### Backing up your data
6969
70-
The Catalog and Object store contain all of the important state for InfluxDB 3. They should be the primary focus of backups. Following the industry standard best practices for your chosen Catalog implementation and Object Store implementation should provide sufficient backups. In our Cloud products, we do daily backups of our Catalog, in addition to automatic snapshots, and we preserve our Object Store files for 100 days after they have been soft-deleted.
70+
The Catalog store and Object store contain all of the important state for InfluxDB 3. They should be the primary focus of backups. Following the industry standard best practices for your chosen Catalog store implementation and Object Store implementation should provide sufficient backups. In our Cloud products, we do daily backups of our Catalog, in addition to automatic snapshots, and we preserve our Object Store files for 100 days after they have been soft-deleted.
7171
7272
### Recovering your data
7373

content/influxdb3/clustered/install/secure-cluster/tls.md

+3-3
@@ -17,7 +17,7 @@ following:
1717

1818
- Ingress to your cluster
1919
- Connection to your Object store
20-
- Connection to your Catalog (PostgreSQL-compatible) database
20+
- Connection to your Catalog store (PostgreSQL-compatible) database
2121

2222
{{% note %}}
2323
If using self-signed certs,
@@ -177,8 +177,8 @@ objectStore:
177177
Refer to your PostgreSQL-compatible database provider's documentation for
178178
installing TLS certificates and ensuring secure connections.
179179

180-
If currently using an unsecure connection to your Catalog database, update your
181-
Catalog data source name (DSN) to **remove the `sslmode=disable` query parameter**:
180+
If currently using an unsecure connection to your Catalog store database, update your
181+
Catalog store data source name (DSN) to **remove the `sslmode=disable` query parameter**:
182182

183183
{{% code-callout "\?sslmode=disable" "magenta delete" %}}
184184
```txt

content/influxdb3/clustered/install/set-up-cluster/prerequisites.md

+4-4
@@ -100,7 +100,7 @@ following sizing for {{% product-name %}} components:
100100
{{% tab-content %}}
101101
<!--------------------------------- BEGIN AWS --------------------------------->
102102

103-
- **Catalog (PostgreSQL-compatible database) (x1):**
103+
- **Catalog store (PostgreSQL-compatible database) (x1):**
104104
- _[See below](#postgresql-compatible-database-requirements)_
105105
- **Ingesters and Routers (x3):**
106106
- EC2 m6i.2xlarge (8 CPU, 32 GB RAM)
@@ -117,7 +117,7 @@ following sizing for {{% product-name %}} components:
117117
{{% tab-content %}}
118118
<!--------------------------------- BEGIN GCP --------------------------------->
119119

120-
- **Catalog (PostgreSQL-compatible database) (x1):**
120+
- **Catalog store (PostgreSQL-compatible database) (x1):**
121121
- _[See below](#postgresql-compatible-database-requirements)_
122122
- **Ingesters and Routers (x3):**
123123
- GCE c2-standard-8 (8 CPU, 32 GB RAM)
@@ -134,7 +134,7 @@ following sizing for {{% product-name %}} components:
134134
{{% tab-content %}}
135135
<!-------------------------------- BEGIN Azure -------------------------------->
136136

137-
- **Catalog (PostgreSQL-compatible database) (x1):**
137+
- **Catalog store (PostgreSQL-compatible database) (x1):**
138138
- _[See below](#postgresql-compatible-database-requirements)_
139139
- **Ingesters and Routers (x3):**
140140
- Standard_D8s_v3 (8 CPU, 32 GB RAM)
@@ -151,7 +151,7 @@ following sizing for {{% product-name %}} components:
151151
{{% tab-content %}}
152152
<!------------------------------- BEGIN ON-PREM ------------------------------->
153153

154-
- **Catalog (PostgreSQL-compatible database) (x1):**
154+
- **Catalog store (PostgreSQL-compatible database) (x1):**
155155
- CPU: 4-8 cores
156156
- RAM: 16-32 GB
157157
- **Ingesters and Routers (x3):**

content/influxdb3/clustered/query-data/troubleshoot-and-optimize/report-query-performance-issues.md

+2-2
@@ -78,8 +78,8 @@ including the following:
7878
- CPU and memory resources set on each type of InfluxDB pod
7979
- The number of pods in each InfluxDB StatefulSet and Deployment
8080
- The type of object store used and how it is hosted
81-
- How the Catalog (PostgreSQL-compatible database) is hosted
82-
- Indicate if either the Object store or the Catalog is shared by more than one InfluxDB
81+
- How the Catalog store (PostgreSQL-compatible database) is hosted
82+
- Indicate if either the Object store or the Catalog store is shared by more than one InfluxDB
8383
Clustered product
8484
- If so, describe the network-level topology of your setup
8585
