fix(clustered): Closes https://github.com/influxdata/DAR/issues/472. … #5885

Open
wants to merge 1 commit into
base: jts/dar-484-add-steps-to-install-kubit-in-an-air-gapped-environment
42 changes: 20 additions & 22 deletions content/influxdb3/clustered/admin/backup-restore.md
Original file line number Diff line number Diff line change
Expand Up @@ -12,29 +12,29 @@ weight: 105
influxdb3/clustered/tags: [backup, restore]
---

InfluxDB Clustered automatically stores snapshots of the InfluxDB Catalog that
InfluxDB Clustered automatically stores snapshots of the InfluxDB Catalog store that
you can use to restore your cluster to a previous state. The snapshotting
functionality is optional and is disabled by default.
Enable snapshots to ensure you can recover
in case of emergency.

With InfluxDB Clustered snapshots enabled, each hour, InfluxDB uses the `pg_dump`
utility included with the InfluxDB Garbage Collector to export an SQL blob or
“snapshot” from the InfluxDB Catalog and store it in the object store.
The Catalog is a PostgreSQL-compatible relational database that stores metadata
utility included with the InfluxDB Garbage collector to export an SQL blob or
“snapshot” from the InfluxDB Catalog store to the Object store.
The Catalog store is a PostgreSQL-compatible relational database that stores metadata
for your time series data, such as schema data types, Parquet file locations, and more.

The Catalog snapshots act as recovery points for your InfluxDB cluster that
reference all Parquet files that existed in the object store at the time of the
snapshot. When a snapshot is restored to the Catalog, the Compactor
The Catalog store snapshots act as recovery points for your InfluxDB cluster that
reference all Parquet files that existed in the Object store at the time of the
snapshot. When a snapshot is restored to the Catalog store, the Compactor
“[soft deletes](#soft-delete)” any Parquet files not listed in the snapshot.

> [!Note]
> InfluxDB won't [hard delete](#hard-delete) Parquet files listed in _any_ hourly or daily snapshot.
>
> For example, if you have Parquet files A, B, C, and D, and you restore to a
> snapshot that includes B and C, but not A and D, then A and D are soft-deleted, but remain in object
> storage until they are no longer referenced in any Catalog snapshot.
> storage until they are no longer referenced in any Catalog store snapshot.
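
The A, B, C, D example in the note above can be sketched with plain set operations. This is only an illustration of the rule as the note states it, not InfluxDB's actual implementation. The file names, the snapshot names, and both helper functions are hypothetical, and real hard deletion may depend on other conditions not modeled here.

```python
# Hypothetical Parquet files currently in the Object store.
object_store = {"A", "B", "C", "D"}

# Hypothetical retained snapshots and the Parquet files each one references.
snapshots = {
    "restore_target": {"B", "C"},
    "older_daily": {"A", "B"},
}


def soft_deleted(files, restored_snapshot):
    """Files not listed in the restored snapshot are soft-deleted."""
    return files - restored_snapshot


def hard_delete_eligible(files, restored_snapshot, all_snapshots):
    """Soft-deleted files that no retained snapshot references.

    Assumption: being unreferenced is treated here as the only condition.
    """
    referenced = set().union(*all_snapshots.values())
    return soft_deleted(files, restored_snapshot) - referenced


soft = soft_deleted(object_store, snapshots["restore_target"])
eligible = hard_delete_eligible(object_store, snapshots["restore_target"], snapshots)
print(sorted(soft))      # A and D are soft-deleted
print(sorted(eligible))  # only D is unreferenced by every snapshot
```

In this sketch, A stays in object storage because the hypothetical `older_daily` snapshot still references it, while D is referenced by no snapshot.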
- [Soft delete](#soft-delete)
- [Hard delete](#hard-delete)
- [Recovery Point Objective (RPO)](#recovery-point-objective-rpo)
Expand Down Expand Up @@ -75,8 +75,8 @@ The InfluxDB Clustered snapshot strategy RPO allows for the following maximum da
## Recovery Time Objective (RTO)

RTO is the maximum amount of downtime allowed for an InfluxDB cluster after a failure.
RTO varies depending on the size of your Catalog database, network speeds
between the client machine and the Catalog database, cluster load, the status
RTO varies depending on the size of your Catalog store, network speeds
between the client machine and the Catalog store, cluster load, the status
of your underlying hosting provider, and other factors.

## Data written just before a snapshot may not be present after restoring
Expand All @@ -94,14 +94,14 @@ present after restoring to that snapshot.
### Automate object synchronization to an external S3-compatible bucket

Syncing objects to an external S3-compatible bucket ensures an up-to-date backup
in case your object store becomes unavailable. Recovery point snapshots only
back up the InfluxDB Catalog. If data referenced in a Catalog snapshot does not
exist in the object store, the recovery process does not restore the missing data.
in case your Object store becomes unavailable. Recovery point snapshots only
back up the InfluxDB Catalog store. If data referenced in a Catalog store snapshot does not
exist in the Object store, the recovery process does not restore the missing data.

### Enable short-term object versioning

If your object storage provider supports it, consider enabling short-term
object versioning on your object store--for example, 1-2 days to protect against errant writes or deleted objects.
object versioning on your Object store--for example, 1-2 days to protect against errant writes or deleted objects.
With object versioning enabled, as objects are updated, the object store
retains distinct versions of each update that can be used to “rollback” newly
written or updated Parquet files to previous versions.
Expand Down Expand Up @@ -140,7 +140,7 @@ spec:

#### INFLUXDB_IOX_CREATE_CATALOG_BACKUP_DATA_SNAPSHOT_FILES

Enable hourly Catalog snapshotting. The default is `'false'`. Set to `'true'`:
Enable hourly Catalog store snapshotting. The default is `'false'`. Set to `'true'`:

```yaml
INFLUXDB_IOX_CREATE_CATALOG_BACKUP_DATA_SNAPSHOT_FILES: 'true'
Expand Down Expand Up @@ -217,22 +217,20 @@ written on or around the beginning of the next hour.
## Restore to a recovery point

Use the following process to restore your InfluxDB cluster to a recovery point
using Catalog snapshots:
using Catalog store snapshots:

1. **Install prerequisites:**

- `kubectl` CLI for managing your Kubernetes deployment.
- `psql` CLI to interact with the PostgreSQL-compatible Catalog database with
the appropriate Data Source Name (DSN) and connection credentials.
- A client to interact with your InfluxDB cluster’s object store.
Supported clients depend on your object storage provider.
- `psql` CLI configured with your Data Source Name and credentials for interacting with the PostgreSQL-compatible Catalog store database.
- A client from your object storage provider for interacting with your InfluxDB cluster's Object store.

2. **Retrieve the recovery point snapshot from your object store.**

InfluxDB Clustered stores hourly and daily snapshots in the
`/catalog_backup_file_lists` path in object storage. Download the snapshot
that you would like to use as the recovery point. If your primary object
store is unavailable, download the snapshot from your replicated object store.
that you would like to use as the recovery point. If your primary Object
store is unavailable, download the snapshot from your replicated Object store.

> [!Important]
> When creating and storing a snapshot, the last artifact created is the
Expand Down
6 changes: 3 additions & 3 deletions content/influxdb3/clustered/admin/databases/delete.md
Expand Up @@ -41,8 +41,8 @@ Once a database is deleted, data stored in that database cannot be recovered.

After a database is deleted, you cannot reuse the same name for a new database.

#### Never directly modify the Catalog
#### Never directly modify the Catalog store

Never directly modify the [PostgreSQL-compatible Catalog](/influxdb3/clustered/reference/internals/storage-engine/#catalog).
Doing so will result in an undefined state for various components and may lead to data loss and crashes.
Never directly modify the [PostgreSQL-compatible Catalog store](/influxdb3/clustered/reference/internals/storage-engine/#catalog-store).
Doing so will result in an undefined state for various components and may lead to data loss and crashes.
{{% /warn %}}
62 changes: 35 additions & 27 deletions content/influxdb3/clustered/admin/scale-cluster.md
Expand Up @@ -22,19 +22,12 @@ resources available to each component.
- [Scaling strategies](#scaling-strategies)
- [Vertical scaling](#vertical-scaling)
- [Horizontal scaling](#horizontal-scaling)
- [Scale your cluster as a whole](#scale-your-cluster-as-a-whole)
- [Scale components in your cluster](#scale-components-in-your-cluster)
- [Horizontally scale a component](#horizontally-scale-a-component)
- [Vertically scale a component](#vertically-scale-a-component)
- [Apply your changes](#apply-your-changes)
- [Scale your cluster as a whole](#scale-your-cluster-as-a-whole)
- [Recommended scaling strategies per component](#recommended-scaling-strategies-per-component)
- [Ingester](#ingester)
- [Querier](#querier)
- [Router](#router)
- [Compactor](#compactor)
- [Garbage collector](#garbage-collector)
- [Catalog](#catalog)
- [Object store](#object-store)

## Scaling strategies

Expand All @@ -59,6 +52,14 @@ throughput a system can manage, but also provides additional redundancy and fail

{{< html-diagram/scaling-strategy "horizontal" >}}

## Scale your cluster as a whole

Scaling your entire InfluxDB Cluster is done by scaling your Kubernetes cluster
and is managed outside of InfluxDB. The process of scaling your entire Kubernetes
cluster depends on your underlying Kubernetes provider. You can also use
[Kubernetes autoscaling](https://kubernetes.io/docs/concepts/cluster-administration/cluster-autoscaling/)
to automatically scale your cluster as needed.

## Scale components in your cluster

The following components of your InfluxDB cluster are scaled by modifying
Expand All @@ -69,11 +70,12 @@ properties in your `AppInstance` resource:
- Compactor
- Router
- Garbage collector
- Catalog service

{{% note %}}
#### Scale your Catalog and Object store
#### Scale your Catalog store and Object store

Your InfluxDB [Catalog](/influxdb3/clustered/reference/internals/storage-engine/#catalog)
Your InfluxDB [Catalog store](/influxdb3/clustered/reference/internals/storage-engine/#catalog-store)
and [Object store](/influxdb3/clustered/reference/internals/storage-engine/#object-store)
are managed outside of your `AppInstance` resource. Scaling mechanisms for these
components depend on the technology and underlying provider used for each.
Expand Down Expand Up @@ -451,22 +453,15 @@ helm upgrade \
{{% /code-tab-content %}}
{{< /code-tabs-wrapper >}}

## Scale your cluster as a whole

Scaling your entire InfluxDB Cluster is done by scaling your Kubernetes cluster
and is managed outside of InfluxDB. The process of scaling your entire Kubernetes
cluster depends on your underlying Kubernetes provider. You can also use
[Kubernetes autoscaling](https://kubernetes.io/docs/concepts/cluster-administration/cluster-autoscaling/)
to automatically scale your cluster as needed.

## Recommended scaling strategies per component

- [Router](#router)
- [Ingester](#ingester)
- [Querier](#querier)
- [Compactor](#compactor)
- [Garbage collector](#garbage-collector)
- [Catalog](#catalog)
- [Catalog store](#catalog-store)
- [Catalog service](#catalog-service)
- [Object store](#object-store)

### Router
Expand Down Expand Up @@ -563,16 +558,29 @@ efficiently as vertical scaling.

### Garbage collector

The Garbage collector can be scaled [vertically](#vertical-scaling). It is a
light-weight process that typically doesn't require many system resources, but
if you begin to see high resource consumption on the garbage collector, you can
scale it vertically to address the added workload.
The Garbage collector is not designed for distributed load and should _not_ be
scaled horizontally. It is a lightweight process that typically doesn't require
significant system resources. [Vertical scaling](#vertical-scaling) should only
be considered if you observe consistently high CPU usage or if the container
regularly runs out of memory.

### Catalog store

The Catalog store is a PostgreSQL-compatible database that persistently stores metadata.
Scaling strategies depend on your chosen PostgreSQL implementation.
All support [vertical scaling](#vertical-scaling), and most support
[horizontal scaling](#horizontal-scaling) for redundancy and failover.
Contributor

Might be worth noting that "an underprovisioned catalog store can result in write outages." @marcdavoli I think it would be good to give a concrete example of what we do in Cloud Dedicated. Can you suggest something here?


### Catalog service

### Catalog
The Catalog service should maintain exactly
3 replicas for optimal redundancy.
Additional replicas are discouraged; favor vertical scaling instead if performance improvements are needed.

Scaling strategies available for the Catalog depend on the PostgreSQL-compatible
database used to run the catalog. All support [vertical scaling](#vertical-scaling).
Most support [horizontal scaling](#horizontal-scaling) for redundancy and failover.
> [!Note]
> The [Catalog service](/influxdb3/clustered/reference/internals/storage-engine/#catalog-service) is managed through the
> `AppInstance` resource, while the [Catalog store](/influxdb3/clustered/reference/internals/storage-engine/#catalog-store)
> is managed separately according to your PostgreSQL implementation.

### Object store

Expand Down
6 changes: 3 additions & 3 deletions content/influxdb3/clustered/install/_index.md
Expand Up @@ -61,13 +61,13 @@ Updating your InfluxDB cluster is as simple as re-applying your app-instance wit

The word safely here means being able to redeploy your cluster while still being able to use the tokens you’ve created, and being able to write/query to the database you’ve previously created.

All of the important state in InfluxDB 3 lives in the Catalog (the Postgres equivalent database) and the Object Store (the S3 compatible store). These should be treated with the utmost care.
All of the important state in InfluxDB 3 lives in the Catalog store (the Postgres equivalent database) and the Object Store (the S3 compatible store). These should be treated with the utmost care.

If a full redeploy of your cluster needs to happen, the namespace containing the Influxdb instance can be deleted **_as long as your Catalog and Object Store are not in this namespace_**. Then, the influxdb AppInstance can be redeployed. It is possible the operator may need to be removed and reinstalled. In that case, deleting the namespace that the operator is deployed into and redeploying is acceptable.
If a full redeploy of your cluster needs to happen, the namespace containing the Influxdb instance can be deleted **_as long as your Catalog store and Object Store are not in this namespace_**. Then, the influxdb AppInstance can be redeployed. It is possible the operator may need to be removed and reinstalled. In that case, deleting the namespace that the operator is deployed into and redeploying is acceptable.

### Backing up your data

The Catalog and Object store contain all of the important state for InfluxDB 3. They should be the primary focus of backups. Following the industry standard best practices for your chosen Catalog implementation and Object Store implementation should provide sufficient backups. In our Cloud products, we do daily backups of our Catalog, in addition to automatic snapshots, and we preserve our Object Store files for 100 days after they have been soft-deleted.
The Catalog store and Object store contain all of the important state for InfluxDB 3. They should be the primary focus of backups. Following the industry standard best practices for your chosen Catalog store implementation and Object Store implementation should provide sufficient backups. In our Cloud products, we do daily backups of our Catalog, in addition to automatic snapshots, and we preserve our Object Store files for 100 days after they have been soft-deleted.

### Recovering your data

Expand Down
6 changes: 3 additions & 3 deletions content/influxdb3/clustered/install/secure-cluster/tls.md
Expand Up @@ -17,7 +17,7 @@ following:

- Ingress to your cluster
- Connection to your Object store
- Connection to your Catalog (PostgreSQL-compatible) database
- Connection to your Catalog store (PostgreSQL-compatible) database

{{% note %}}
If using self-signed certs,
Expand Down Expand Up @@ -177,8 +177,8 @@ objectStore:
Refer to your PostgreSQL-compatible database provider's documentation for
installing TLS certificates and ensuring secure connections.

If currently using an unsecure connection to your Catalog database, update your
Catalog data source name (DSN) to **remove the `sslmode=disable` query parameter**:
If currently using an unsecure connection to your Catalog store database, update your
Catalog store data source name (DSN) to **remove the `sslmode=disable` query parameter**:

{{% code-callout "\?sslmode=disable" "magenta delete" %}}
```txt
Expand Down
Expand Up @@ -100,7 +100,7 @@ following sizing for {{% product-name %}} components:
{{% tab-content %}}
<!--------------------------------- BEGIN AWS --------------------------------->

- **Catalog (PostgreSQL-compatible database) (x1):**
- **Catalog store (PostgreSQL-compatible database) (x1):**
- _[See below](#postgresql-compatible-database-requirements)_
- **Ingesters and Routers (x3):**
- EC2 m6i.2xlarge (8 CPU, 32 GB RAM)
Expand All @@ -117,7 +117,7 @@ following sizing for {{% product-name %}} components:
{{% tab-content %}}
<!--------------------------------- BEGIN GCP --------------------------------->

- **Catalog (PostgreSQL-compatible database) (x1):**
- **Catalog store (PostgreSQL-compatible database) (x1):**
- _[See below](#postgresql-compatible-database-requirements)_
- **Ingesters and Routers (x3):**
- GCE c2-standard-8 (8 CPU, 32 GB RAM)
Expand All @@ -134,7 +134,7 @@ following sizing for {{% product-name %}} components:
{{% tab-content %}}
<!-------------------------------- BEGIN Azure -------------------------------->

- **Catalog (PostgreSQL-compatible database) (x1):**
- **Catalog store (PostgreSQL-compatible database) (x1):**
- _[See below](#postgresql-compatible-database-requirements)_
- **Ingesters and Routers (x3):**
- Standard_D8s_v3 (8 CPU, 32 GB RAM)
Expand All @@ -151,7 +151,7 @@ following sizing for {{% product-name %}} components:
{{% tab-content %}}
<!------------------------------- BEGIN ON-PREM ------------------------------->

- **Catalog (PostgreSQL-compatible database) (x1):**
- **Catalog store (PostgreSQL-compatible database) (x1):**
- CPU: 4-8 cores
- RAM: 16-32 GB
- **Ingesters and Routers (x3):**
Expand Down
Expand Up @@ -78,8 +78,8 @@ including the following:
- CPU and memory resources set on each type of InfluxDB pod
- The number of pods in each InfluxDB StatefulSet and Deployment
- The type of object store used and how it is hosted
- How the Catalog (PostgreSQL-compatible database) is hosted
- Indicate if either the Object store or the Catalog is shared by more than one InfluxDB
- How the Catalog store (PostgreSQL-compatible database) is hosted
- Indicate if either the Object store or the Catalog store is shared by more than one InfluxDB
Clustered product
- If so, describe the network-level topology of your setup

Expand Down