Merged
54 commits
7a87769
section structure
wojcik-dorota Oct 29, 2024
ffa155d
section structure
wojcik-dorota Oct 29, 2024
d68f637
concept part1
wojcik-dorota Nov 12, 2024
3e3f1ab
crdr setup diagram
wojcik-dorota Nov 19, 2024
71954f7
failover diagram
wojcik-dorota Nov 20, 2024
0fdad39
revert opertation
wojcik-dorota Nov 20, 2024
e4edc4e
revert diagram
wojcik-dorota Nov 20, 2024
0140a7b
diagrams highlighting
wojcik-dorota Nov 21, 2024
8ec8b56
diagrams look and feel
wojcik-dorota Nov 22, 2024
605f095
enable crdr
wojcik-dorota Nov 25, 2024
7584859
crdr failover
wojcik-dorota Nov 25, 2024
54e8344
revert
wojcik-dorota Nov 25, 2024
e323b57
cli instructions for crdr ops
wojcik-dorota Dec 2, 2024
f502399
api calls for CRDR management
wojcik-dorota Dec 11, 2024
25f2463
update: separate swizzled components (#604)
ArthurFlag Dec 17, 2024
2ed0a0a
enable crdr
wojcik-dorota Nov 25, 2024
d693f3d
fixing terminology re failover vs switchover
wojcik-dorota Jan 13, 2025
b4b4e14
fix
wojcik-dorota Jan 13, 2025
2e4b6d5
fix
wojcik-dorota Jan 13, 2025
c5a11f0
set up cdrd via terraform
wojcik-dorota Jan 29, 2025
7b6501b
how to detect region outage
wojcik-dorota Jan 31, 2025
9d9d850
switchback
wojcik-dorota Feb 6, 2025
a97a49a
switchover diagram
wojcik-dorota Feb 7, 2025
a2a4637
switchover &switchback
wojcik-dorota Feb 10, 2025
115610f
toc
wojcik-dorota Feb 17, 2025
f5fba14
switchover and switchback via console
wojcik-dorota Feb 17, 2025
021e4ce
related pages
wojcik-dorota Feb 17, 2025
f186d84
fix
wojcik-dorota Oct 1, 2025
300bfb7
fix
wojcik-dorota Oct 1, 2025
5fc7971
fix
wojcik-dorota Oct 1, 2025
88ac121
switchover api and cli
wojcik-dorota Oct 2, 2025
815f447
fix
wojcik-dorota Oct 2, 2025
f1f4914
TF flows
wojcik-dorota Oct 2, 2025
bc66b8c
fix
wojcik-dorota Oct 2, 2025
b142cf7
fix
wojcik-dorota Oct 2, 2025
8258af8
fix
wojcik-dorota Oct 2, 2025
7355714
Apply suggestions from code review
wojcik-dorota Oct 13, 2025
1b2c4cb
Update crdr-switchover.md
wojcik-dorota Oct 13, 2025
7437436
Update crdr-revert-to-primary.md
wojcik-dorota Oct 13, 2025
a489312
fix typos
wojcik-dorota Oct 13, 2025
4ec69ca
fix typos
wojcik-dorota Oct 13, 2025
324dc58
fix typos
wojcik-dorota Oct 13, 2025
9f55c37
fix typos
wojcik-dorota Oct 13, 2025
f88bc77
fix typos
wojcik-dorota Oct 13, 2025
d5d53c8
fix typos
wojcik-dorota Oct 13, 2025
204c01e
removing automatic failover for LA
wojcik-dorota Oct 16, 2025
5ccf969
fix
wojcik-dorota Oct 16, 2025
871eb73
fix
wojcik-dorota Oct 16, 2025
1e6bee4
fix
wojcik-dorota Oct 16, 2025
ee7ef5e
gui fixes to failover and failback
wojcik-dorota Oct 16, 2025
78c58d3
removed console flows for switchover and switchback
wojcik-dorota Oct 16, 2025
9ab0657
feedback
wojcik-dorota Oct 17, 2025
ae023b7
dns name
wojcik-dorota Oct 17, 2025
7d3bfeb
restrictions
wojcik-dorota Oct 20, 2025
16 changes: 13 additions & 3 deletions .github/vale/styles/config/vocabularies/Aiven/accept.txt
Original file line number Diff line number Diff line change
Expand Up @@ -108,9 +108,19 @@ European Union
eval
event_type
exactly-once
expirations?
failovers?
filesets?
expirations
failback
Failback
failbacks
Failbacks
failover
Failover
failovers
Failovers
fileset
filesets
Fileset
Filesets
Flink
Followerfetching
Forecast
2 changes: 1 addition & 1 deletion docs/platform/concepts/rename-services.md
Expand Up @@ -5,7 +5,7 @@ title: Rename a service
You cannot rename a service after creation. Instead, you can create a fork
with the new name and delete the original service.

1. Stop writes on the the service.
1. Stop writes on the service.
1. [Fork the service](/docs/platform/concepts/service-forking).
1. Point your clients to the new service and add any integrations or SSO configuration
that weren't copied.
2 changes: 1 addition & 1 deletion docs/platform/howto/byoc/enable-byoc.md
Expand Up @@ -56,7 +56,7 @@ Before enabling BYOC, check
request.
1. Using the scheduling assistant, select a date and time when to talk to our sales team
to share your requirements and make sure BYOC suits your needs. Confirm the selected
time, make sure you add the call to your calendar, and close the the scheduling
time, make sure you add the call to your calendar, and close the scheduling
assistant.
1. Join the scheduled call with our sales team to follow up with us
on enabling BYOC in your environment.
2 changes: 1 addition & 1 deletion docs/platform/howto/google-cloud-functions.md
Expand Up @@ -52,7 +52,7 @@ You have:

1. Click **create function**

- **Environment**: Your choice of environment. You can use the the default value (2nd
- **Environment**: Your choice of environment. You can use the default value (2nd
gen).
- **Function name**: the name of your choice.
- **Region**: The region of the **serverless VPC access connector**.
1 change: 1 addition & 0 deletions docs/platform/howto/prepare-for-high-load.md
@@ -1,5 +1,6 @@
---
title: Prepare services for high load
sidebar_label: Prepare for high load
---

import RelatedPages from "@site/src/components/RelatedPages";
2 changes: 1 addition & 1 deletion docs/products/cassandra/howto/connect-cqlsh-cli.md
Expand Up @@ -40,7 +40,7 @@ export SSL_CERTFILE=<PATH>

:::note
Alternatively, you can provide the path to the CA certificate file in
the `[ssl]` section by setting the the `certfile` parameter in
the `[ssl]` section by setting the `certfile` parameter in
`~/.cassandra/cqlshrc`.
:::

2 changes: 1 addition & 1 deletion docs/products/kafka/howto/configure-with-kafka-cli.md
Expand Up @@ -2,7 +2,7 @@
title: Manage configurations with Apache Kafka® CLI tools
---

Aiven for Apache Kafka® services are fully manageable and customizable via the the [Aiven CLI](/docs/tools/cli).
Aiven for Apache Kafka® services are fully manageable and customizable via the [Aiven CLI](/docs/tools/cli).

To guarantee the service stability, direct Apache
ZooKeeper™ access isn't available, but our tooling provides you all the
15 changes: 15 additions & 0 deletions docs/products/postgresql/crdr.md
@@ -0,0 +1,15 @@
---
title: Cross-region disaster recovery in Aiven for PostgreSQL®
---

import DocCardList from '@theme/DocCardList';

<DocCardList />

## Related pages

- [Backups in Aiven for PostgreSQL®](/docs/products/postgresql/concepts/pg-backups)
- [Read-only replicas in Aiven for PostgreSQL®](/docs/products/postgresql/howto/create-read-replica)
- [High availability in Aiven for PostgreSQL®](/docs/products/postgresql/concepts/high-availability)
- [Upgrade and failover procedures in Aiven for PostgreSQL®](/docs/products/postgresql/concepts/upgrade-failover)
- [Backup to another region](/docs/platform/concepts/backup-to-another-region)
230 changes: 230 additions & 0 deletions docs/products/postgresql/crdr/crdr-overview.md
@@ -0,0 +1,230 @@
---
title: Cross-region disaster recovery in Aiven for PostgreSQL®
sidebar_label: CRDR overview
limited: true
keywords: [recovery, primary, outage, failure, failover]
---

import ConsoleLabel from "@site/src/components/ConsoleIcons";
import RelatedPages from "@site/src/components/RelatedPages";
import readyForCrdr from "@site/static/images/content/figma/ready-for-crdr.png";
import crdrSetup from "@site/static/images/content/figma/crdr-setup.png";
import crdrFailover from "@site/static/images/content/figma/crdr-failover.png";
import crdrSwitchover from "@site/static/images/content/figma/crdr-switchover.png";
import crdrRevert from "@site/static/images/content/figma/crdr-revert.png";
import crdrSwitchback from "@site/static/images/content/figma/crdr-switchback.png";

The cross-region disaster recovery (CRDR) feature ensures your business continuity by recovering your workloads to a remote region in the event of a region-wide
failure.

## Region-wide outage

CRDR allows you to cope with a primary region failure by initiating a recovery transition
to another region. To identify a region outage, check the region status:

- Check your monitoring and alerts, and watch the following metrics:
  - Instance, node, and service failures
  - Connectivity loss, latency spikes, and packet drops
  - High error rates, timeouts, and 5xx server errors
- Check your cloud provider's status page:
  - [AWS](https://health.aws.amazon.com)
  - [Google Cloud](https://status.cloud.google.com)
  - [Azure](https://status.azure.com)
- Test connectivity and DNS resolution for your instances or services.
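
The connectivity and DNS test in the last item can be scripted. The following is a minimal
sketch using only the Python standard library. The function name `probe_endpoint` and the
shape of its result are assumptions made for illustration and aren't part of any Aiven
tooling. A failed probe alone doesn't prove a region outage, so combine it with the other
checks.

```python
import socket
import time


def probe_endpoint(host: str, port: int, timeout: float = 3.0) -> dict:
    """Resolve a host name and attempt a TCP connection to it.

    Returns the resolved addresses, whether the connection succeeded,
    and how long the connection attempt took (in seconds).
    """
    result = {
        "host": host,
        "addresses": [],
        "connected": False,
        "seconds": None,
        "error": None,
    }

    # Step 1: DNS resolution.
    try:
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
        result["addresses"] = sorted({info[4][0] for info in infos})
    except socket.gaierror as exc:
        result["error"] = f"DNS resolution failed: {exc}"
        return result

    # Step 2: TCP connectivity.
    started = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            result["connected"] = True
    except OSError as exc:
        result["error"] = f"TCP connection failed: {exc}"
    result["seconds"] = round(time.monotonic() - started, 3)
    return result
```

To use it, call `probe_endpoint` with the host and port from your **Service URI** and
inspect the `connected` and `error` fields of the result.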

## CRDR overview

The CRDR setup is a pair of integrated multi-node services, sharing credentials and a
DNS name but located in different regions. CRDR peer services can be hosted on 1-3 nodes.

- **Primary service** hosted in the primary region is the original service you use on a
regular basis. It hands over to the recovery service when you initiate
[a failover or a switchover](/docs/products/postgresql/crdr/crdr-overview#recovery-transition).
When you initiate
[a failback or a switchback](/docs/products/postgresql/crdr/crdr-overview#recovery-reversion),
the primary service takes back control from the recovery service as soon as the
infrastructure is up and running again.
- **Recovery service** hosted in the recovery region is the service you create for
disaster recovery purposes. It takes over from the primary service when you initiate
[a failover or a switchover](/docs/products/postgresql/crdr/crdr-overview#recovery-transition).
When you initiate
[a failback or a switchback](/docs/products/postgresql/crdr/crdr-overview#recovery-reversion),
the recovery service hands over to the primary service as soon as the infrastructure is
up and running again.

The CRDR cycle is a sequence of actions involving CRDR peer services aimed at enabling and
executing CRDR as well as resuming the original service operation.

Throughout the CRDR cycle, CRDR peer services or service nodes go into the following states:

- **Active**: A CRDR peer service is *active* when it runs on a node that is replicating data to
CRDR standby nodes.
  - The primary service is active during normal operations, when the primary region is up and running.
  - The recovery service is active after taking over from the primary service in the event of a region outage.

- **Passive**: A CRDR peer service is *passive* when it runs on CRDR standby nodes only. Either CRDR
peer service can be passive depending on a phase of the CRDR cycle.

- **Failed**: A CRDR peer service is *failed* when it's defunct or unreachable after failing over
in the event of a region outage. Only a primary service can be failed.
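
The states above can be modeled as a small data structure. This is an illustrative sketch
only: the names `PeerState` and `CrdrPair` are hypothetical and don't correspond to Aiven
API objects. The rule that exactly one peer is active at a time is an assumption inferred
from the description of a single replicating read-write node.

```python
from dataclasses import dataclass
from enum import Enum


class PeerState(Enum):
    ACTIVE = "active"    # runs on the node replicating to CRDR standby nodes
    PASSIVE = "passive"  # runs on CRDR standby nodes only
    FAILED = "failed"    # defunct or unreachable after a failover


@dataclass(frozen=True)
class CrdrPair:
    primary: PeerState
    recovery: PeerState

    def is_valid(self) -> bool:
        # Only a primary service can be failed.
        if self.recovery is PeerState.FAILED:
            return False
        # Assumption: exactly one of the two peers is active at any time.
        primary_active = self.primary is PeerState.ACTIVE
        recovery_active = self.recovery is PeerState.ACTIVE
        return primary_active != recovery_active
```

For example, `CrdrPair(PeerState.ACTIVE, PeerState.PASSIVE)` describes normal operations,
and `CrdrPair(PeerState.FAILED, PeerState.ACTIVE)` describes the pair after a failover.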

## Limitations

- **Service plan requirements**: To set up CRDR, your primary service must use at least a
Startup plan. Hobbyist and Free plans are not supported.

:::tip[Upgrading your plan]
If your Aiven for PostgreSQL service uses a Hobbyist plan or a Free plan,
[upgrade your free plan](/docs/platform/concepts/service-pricing#free-plans) or
[change your Hobbyist plan](/docs/platform/howto/scale-services) to at least a Startup
plan.
:::

- **Console restrictions**: When creating a recovery service through
the [Aiven Console](https://console.aiven.io/), you must use the same service plan and
cloud provider as your primary service.

:::tip[Alternative setup methods]
For different service plans or cloud providers, create your recovery service using the
[Aiven CLI](/docs/tools/cli), the [Aiven API](/docs/tools/api), or the
[Aiven Provider for Terraform](https://registry.terraform.io/providers/aiven/aiven/latest/docs).
:::

## How it works

The CRDR feature is available on all Startup, Business, and Premium service plans.

<img src={readyForCrdr} className="centered" alt="Ready for CRDR" width="100%" />

### CRDR setup

You [enable CRDR by creating a recovery service](/docs/products/postgresql/crdr/enable-crdr).
The CRDR setup completes as soon as the recovery service is created and in sync with the
primary service. At that point, the primary service is the **Active** service receiving
incoming traffic and replicating to the recovery service, and the recovery service is the
**Passive** service replicating from the primary service.

<img src={crdrSetup} className="centered" alt="CRDR setup" width="100%" />

### Recovery transition

CRDR supports two types of recovery transition:

- [Failover](/docs/products/postgresql/crdr/crdr-overview#failover-to-the-recovery-region)
  - **Triggered by you**, typically in the event of a region-wide outage
  - **Destroys the primary service** and requires recreating the primary service to fail back
- [Switchover](/docs/products/postgresql/crdr/crdr-overview#switchover-to-the-recovery-region)
  - **Triggered by you** for any purpose other than a region-wide outage
  - Leaves the **primary service intact**, with no need to recreate it to switch back

#### Failover to the recovery region

You typically trigger a
[failover to the recovery region](/docs/products/postgresql/crdr/failover/crdr-failover-to-recovery)
in the event of a region-wide outage. This destroys the primary service, which becomes
**Failed**, and promotes the recovery service to **Active**. To fail back to
the primary service, it needs to be recreated first.

<img src={crdrFailover} className="centered" alt="CRDR failover" width="100%" />

#### Switchover to the recovery region

You trigger a
[switchover to the recovery service](/docs/products/postgresql/crdr/switchover/crdr-switchover)
for testing, simulating a disaster scenario, or verifying the disaster resilience of your
infrastructure. This demotes the primary service to **Passive** and promotes the recovery
service to **Active**. To switch back to the primary service, no service recreation is
needed.

<img src={crdrSwitchover} className="centered" alt="CRDR switchover" width="100%" />

### Recovery reversion

You trigger a recovery reversion to shift your workload back to the primary region and
restore the CRDR setup to its original configuration.

There are two types of recovery reversion:

- [Failback](/docs/products/postgresql/crdr/crdr-overview#failback-to-the-primary-region)
  - Reverts a
    [failover](/docs/products/postgresql/crdr/crdr-overview#failover-to-the-recovery-region).
  - Recreates the primary service.
- [Switchback](/docs/products/postgresql/crdr/crdr-overview#switchback-to-the-primary-region)
  - Reverts a
    [switchover](/docs/products/postgresql/crdr/crdr-overview#switchover-to-the-recovery-region).
  - Doesn't require recreating the primary service.

#### Failback to the primary region

The failback process consists of two steps you initiate at your convenience:

1. [Primary service recreation](/docs/products/postgresql/crdr/failover/crdr-revert-to-primary)

   You initiate this step to restore primary service nodes from the local backups and to
   synchronize (replicate) the most recent data from the active service (recovery service).
   When completed, the primary service is restored and in near real-time sync with the recovery service.

1. [Primary service takeover](/docs/products/postgresql/crdr/failover/crdr-revert-to-primary)

   You initiate a takeover as soon as the primary service is recreated. This switches the direction of
   the replication to effectively route the traffic back to the primary region. When
   completed, both the primary service and the recovery service are up and running again: the primary service as an active
   service, and the recovery service as a passive service.

<img src={crdrRevert} className="centered" alt="CRDR revert" width="100%" />

#### Switchback to the primary region

You initiate a switchback at your convenience to switch the direction of the
replication and route the traffic back to the primary region. When completed, both the primary service
and the recovery service are up and running again: the primary service as an active service, and the recovery service as a
passive service.

<img src={crdrSwitchback} className="centered" alt="CRDR switchback" width="100%" />
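
The recovery transitions and reversions described in this section form a small state
machine. The sketch below is a conceptual model only: the phase and operation names are
hypothetical labels chosen for illustration, not Aiven API or CLI identifiers.

```python
# Phase of the CRDR cycle -> (primary service state, recovery service state)
PEER_STATES = {
    "normal": ("active", "passive"),
    "failed_over": ("failed", "active"),
    "primary_recreated": ("passive", "active"),
    "switched_over": ("passive", "active"),
}

# (current phase, operation) -> next phase
TRANSITIONS = {
    ("normal", "failover"): "failed_over",
    ("normal", "switchover"): "switched_over",
    # Failback is a two-step process: recreation, then takeover.
    ("failed_over", "recreate_primary"): "primary_recreated",
    ("primary_recreated", "takeover"): "normal",
    # Switchback needs no recreation because the primary service is intact.
    ("switched_over", "switchback"): "normal",
}


def next_phase(phase: str, operation: str) -> str:
    """Return the phase that follows `operation`, or raise if it isn't allowed."""
    try:
        return TRANSITIONS[(phase, operation)]
    except KeyError:
        raise ValueError(f"{operation!r} is not allowed in phase {phase!r}") from None
```

In this model, a failover can only be reverted through `recreate_primary` followed by
`takeover`, while a switchover is reverted with a single `switchback`.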

## DNS name and service URI

### Active service DNS name

CRDR lets you always access your active service using the same **Service URI**,
which doesn't change in the event of a failover to the recovery region.

:::note
**Service URI** is a locator that is shared between the primary service and the recovery service. It always points
to the replicating node of the active service. This node is the only read-write node
in both CRDR regions.
:::

The **Service URI** of an active service can remain unchanged in the event of a region outage
because the DNS record of this **Service URI** is updated to point to the active service.
This allows your applications to work uninterrupted and adapt to the change automatically
without updates to their code or data.
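
Because the **Service URI** stays the same, an application can typically handle a recovery
transition by retrying its connection until the DNS record points to the new active
service. The following is a minimal sketch of such a retry loop with exponential backoff.
The helper name `connect_with_retry` and its parameters are assumptions made for
illustration, and `connect` stands for whatever function your database driver uses to open
a connection to the **Service URI**.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def connect_with_retry(
    connect: Callable[[], T],
    attempts: int = 5,
    delay: float = 2.0,
    max_delay: float = 30.0,
    retry_on: Tuple[Type[BaseException], ...] = (OSError,),
) -> T:
    """Call `connect` until it succeeds or `attempts` is exhausted.

    Each call to `connect` is expected to open a new connection to the
    same Service URI, so the host name is resolved again on every attempt.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")

    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return connect()
        except retry_on as exc:
            last_error = exc
            if attempt < attempts:
                # Exponential backoff, capped at max_delay seconds.
                time.sleep(min(delay * 2 ** (attempt - 1), max_delay))
    raise ConnectionError(f"gave up after {attempts} attempts") from last_error
```

If your driver raises its own exception type for connection failures, pass that type in
`retry_on`. Cached DNS results on the client side can delay the switch, so keep the retry
window longer than your resolver's cache lifetime.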

### Standby nodes DNS names

Regardless of the CRDR cycle phase, you can always connect to and access each standby node
in the CRDR peer services separately. This can help you compensate for potential
network delays by using the service geographically closer to your applications.

Standby nodes in the CRDR service pair can have two different URIs, depending on the CRDR
service (region) they belong to:

- For the **primary service standby URI**, the DNS record always points to the standby nodes
in the primary region.
- For the **recovery service standby URI**, the DNS record always points to the standby nodes
in the recovery region.

Both the primary service standby URI and the recovery service standby URI are dedicated, not shared, and read-only.
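
To decide which read-only standby URI is closer to an application, you can compare TCP
connection times. The sketch below is one possible approach using the Python standard
library. The function names and the endpoint labels are hypothetical, and connection time
is only a rough proxy for query latency.

```python
import socket
import time
from typing import Callable, Dict, Optional, Tuple

Endpoint = Tuple[str, int]  # (host, port) taken from a standby URI


def tcp_connect_seconds(endpoint: Endpoint, timeout: float = 2.0) -> Optional[float]:
    """Return the time needed to open a TCP connection, or None if it fails."""
    started = time.monotonic()
    try:
        with socket.create_connection(endpoint, timeout=timeout):
            return time.monotonic() - started
    except OSError:
        return None


def pick_closest(
    endpoints: Dict[str, Endpoint],
    measure: Callable[[Endpoint], Optional[float]] = tcp_connect_seconds,
) -> Optional[str]:
    """Return the label of the reachable endpoint with the lowest connect time."""
    timings = {}
    for label, endpoint in endpoints.items():
        seconds = measure(endpoint)
        if seconds is not None:
            timings[label] = seconds
    if not timings:
        return None
    return min(timings, key=timings.get)
```

Pass a dictionary such as `{"primary-standby": (host, port), "recovery-standby": (host, port)}`
built from your two standby URIs. The function returns `None` if neither endpoint is
reachable.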

## Backups in the recovery region

After a failover to the recovery region in the event of a primary region outage, service
backups start to be taken in the recovery region. You can use this backup history for
operations and data resiliency purposes.

<RelatedPages/>

- [Aiven for PostgreSQL high availability](/docs/products/postgresql/concepts/high-availability)
- [Aiven for PostgreSQL backups](/docs/products/postgresql/concepts/pg-backups)
- [Aiven for PostgreSQL read-only replica](/docs/products/postgresql/howto/create-read-replica)
- [Backup to another region](/docs/platform/concepts/backup-to-another-region)