3804 docs rfc: create a doc showing how to integrate Debezium with Timescale Cloud #4024


Merged
Changes from all commits (26 commits):
- `cc20f2e` draft (atovpeko, Feb 6, 2025)
- `d95104d` Merge branch 'latest' of github.com:timescale/docs into 3804-docs-rfc… (atovpeko, Feb 17, 2025)
- `e2d4457` update (atovpeko, Feb 18, 2025)
- `9a1e9c3` chore: update to latest. (billy-the-fish, Apr 15, 2025)
- `bc1c392` Merge branch 'latest' into 3804-docs-rfc-create-a-doc-showing-how-to-… (billy-the-fish, Apr 15, 2025)
- `6ec3dc8` chore: more updates. (billy-the-fish, Apr 16, 2025)
- `a6266bf` chore: more updates. (billy-the-fish, Apr 17, 2025)
- `c4d4c36` Merge branch 'latest' into 3804-docs-rfc-create-a-doc-showing-how-to-… (billy-the-fish, Apr 22, 2025)
- `ede3998` chore: more updates. (billy-the-fish, Apr 22, 2025)
- `0df0e6f` Merge branch 'latest' into 3804-docs-rfc-create-a-doc-showing-how-to-… (billy-the-fish, Apr 22, 2025)
- `6571bcc` chore: more updates. (billy-the-fish, Apr 22, 2025)
- `480ad76` chore: more updates. (billy-the-fish, Apr 22, 2025)
- `f26c546` chore: fix variable and make whole doc self-hosted only. (billy-the-fish, Apr 23, 2025)
- `d7cfc66` chore: small fix. (billy-the-fish, Apr 24, 2025)
- `edddae0` Merge branch 'latest' into 3804-docs-rfc-create-a-doc-showing-how-to-… (billy-the-fish, Apr 25, 2025)
- `b77203c` Merge branch 'latest' into 3804-docs-rfc-create-a-doc-showing-how-to-… (billy-the-fish, Apr 29, 2025)
- `e2b5e55` chore: update classpath (billy-the-fish, Apr 29, 2025)
- `ec7452c` chore: update for docker. (billy-the-fish, Apr 30, 2025)
- `c85081f` chore: update for docker. (billy-the-fish, Apr 30, 2025)
- `c82dfd1` Apply suggestions from code review (billy-the-fish, Apr 30, 2025)
- `35d5a6d` Merge branch 'latest' into 3804-docs-rfc-create-a-doc-showing-how-to-… (billy-the-fish, Apr 30, 2025)
- `67b1721` chore: update for docker. (billy-the-fish, Apr 30, 2025)
- `b46ee56` draft (atovpeko, Feb 6, 2025)
- `25b3f8d` chore: cleanup. (billy-the-fish, Apr 30, 2025)
- `11f5cf6` chore: the merge from another dimension. (billy-the-fish, Apr 30, 2025)
- `eb63aa8` Update integrations/debezium.md (billy-the-fish, Apr 30, 2025)
58 changes: 58 additions & 0 deletions _partials/_integration-apache-kafka-install.md
@@ -0,0 +1,58 @@
1. **Extract the Kafka binaries to a local folder**

```bash
curl https://dlcdn.apache.org/kafka/3.9.0/kafka_2.13-3.9.0.tgz | tar -xzf -
cd kafka_2.13-3.9.0
```
From now on, the folder where you extracted the Kafka binaries is called `<KAFKA_HOME>`.

1. **Configure and run Apache Kafka**

```bash
KAFKA_CLUSTER_ID="$(./bin/kafka-storage.sh random-uuid)"
./bin/kafka-storage.sh format --standalone -t $KAFKA_CLUSTER_ID -c config/kraft/reconfig-server.properties
./bin/kafka-server-start.sh config/kraft/reconfig-server.properties
```
Use the `-daemon` flag to run this process in the background.
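
For example, a minimal sketch that starts the broker in the background using the same KRaft properties file:

```bash
# Start the Kafka broker as a background daemon instead of in the foreground
./bin/kafka-server-start.sh -daemon config/kraft/reconfig-server.properties
```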

1. **Create Kafka topics**

In another Terminal window, navigate to `<KAFKA_HOME>`, then call `kafka-topics.sh` and create the following topics:
- `accounts`: publishes JSON messages that are consumed by the timescale-sink connector and inserted into your $SERVICE_LONG.
- `deadletter`: stores messages that cause errors and that Kafka Connect workers cannot process.

```bash
./bin/kafka-topics.sh \
--create \
--topic accounts \
--bootstrap-server localhost:9092 \
--partitions 10

./bin/kafka-topics.sh \
--create \
--topic deadletter \
--bootstrap-server localhost:9092 \
--partitions 10
```
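
To confirm that both topics were created, you can list them; a quick check, assuming the broker is listening on `localhost:9092`:

```bash
# List all topics registered with the broker
./bin/kafka-topics.sh --list --bootstrap-server localhost:9092
```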

1. **Test that your topics are working correctly**
1. Run `kafka-console-producer` to send messages to the `accounts` topic:
```bash
bin/kafka-console-producer.sh --topic accounts --bootstrap-server localhost:9092
```
1. Send some events. For example, type the following:
```bash
>Timescale Cloud
>How Cool
```
1. In another Terminal window, navigate to `<KAFKA_HOME>`, then run `kafka-console-consumer` to consume the events you just sent:
```bash
bin/kafka-console-consumer.sh --topic accounts --from-beginning --bootstrap-server localhost:9092
```
You see:
```bash
Timescale Cloud
How Cool
```


29 changes: 29 additions & 0 deletions _partials/_integration-debezium-cloud-config-service.md
@@ -0,0 +1,29 @@


1. **Connect to your $SERVICE_LONG**

For $CLOUD_LONG, open an [SQL editor][run-queries] in [$CONSOLE][open-console]. For self-hosted, use [`psql`][psql].

1. **Enable logical replication for your $SERVICE_LONG**

1. Run the following commands to enable logical replication:

```sql
ALTER SYSTEM SET wal_level = logical;
SELECT pg_reload_conf();
```

1. Restart your $SERVICE_SHORT.
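
To confirm the change took effect after the restart, you can run a quick check; `logical` is the expected value:

```sql
-- Returns 'logical' once the new WAL level is active
SHOW wal_level;
```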

1. **Create a table**

Create a table to test the integration. For example:

```sql
CREATE TABLE sensor_data (
id SERIAL PRIMARY KEY,
device_id TEXT NOT NULL,
temperature FLOAT NOT NULL,
recorded_at TIMESTAMPTZ DEFAULT now()
);
```
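
As a quick sanity check, you could insert a test row; the `device_id` and `temperature` values here are arbitrary examples:

```sql
-- Insert a sample reading; recorded_at defaults to now()
INSERT INTO sensor_data (device_id, temperature)
VALUES ('sensor-1', 22.5);
```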
139 changes: 139 additions & 0 deletions _partials/_integration-debezium-docker.md
@@ -0,0 +1,139 @@

1. **Run Zookeeper in Docker**

In another Terminal window, run the following command:
```bash
docker run -it --rm --name zookeeper -p 2181:2181 -p 2888:2888 -p 3888:3888 quay.io/debezium/zookeeper:3.0
```
Check the output log to see that Zookeeper is running.

1. **Run Kafka in Docker**

In another Terminal window, run the following command:
```bash
docker run -it --rm --name kafka -p 9092:9092 --link zookeeper:zookeeper quay.io/debezium/kafka:3.0
```
Check the output log to see that Kafka is running.


1. **Run Kafka Connect in Docker**

In another Terminal window, run the following command:
```bash
docker run -it --rm --name connect \
-p 8083:8083 \
-e GROUP_ID=1 \
-e CONFIG_STORAGE_TOPIC=accounts \
-e OFFSET_STORAGE_TOPIC=offsets \
-e STATUS_STORAGE_TOPIC=storage \
--link kafka:kafka \
--link timescaledb:timescaledb \
quay.io/debezium/connect:3.0
```
Check the output log to see that Kafka Connect is running.
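
As an additional check, you can query the Kafka Connect REST API from another Terminal window; the root endpoint returns version information once the worker is up:

```bash
# Returns worker version and commit info when Kafka Connect is ready
curl -H "Accept:application/json" localhost:8083/
```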


1. **Register the Debezium PostgreSQL source connector**

In the following command, update the `<properties>` to match the `<debezium-user>` you created in your $SELF_LONG instance.
Then run the command in another Terminal window:
```bash
curl -X POST http://localhost:8083/connectors \
-H "Content-Type: application/json" \
-d '{
"name": "timescaledb-connector",
"config": {
"connector.class": "io.debezium.connector.postgresql.PostgresConnector",
"database.hostname": "timescaledb",
"database.port": "5432",
"database.user": "<debezium-user>",
"database.password": "<debezium-password>",
"database.dbname" : "postgres",
"topic.prefix": "accounts",
"plugin.name": "pgoutput",
"schema.include.list": "public,_timescaledb_internal",
"transforms": "timescaledb",
"transforms.timescaledb.type": "io.debezium.connector.postgresql.transforms.timescaledb.TimescaleDb",
"transforms.timescaledb.database.hostname": "timescaledb",
"transforms.timescaledb.database.port": "5432",
"transforms.timescaledb.database.user": "<debezium-user>",
"transforms.timescaledb.database.password": "<debezium-password>",
"transforms.timescaledb.database.dbname": "postgres"
}
}'
```

1. **Verify `timescaledb-connector` is included in the connector list**

1. Check the tasks associated with `timescaledb-connector`:
```bash
curl -i -X GET -H "Accept:application/json" localhost:8083/connectors/timescaledb-connector
```
You see something like:
```bash
{"name":"timescaledb-connector","config":
{ "connector.class":"io.debezium.connector.postgresql.PostgresConnector",
"transforms.timescaledb.database.hostname":"timescaledb",
"transforms.timescaledb.database.password":"debeziumpassword","database.user":"debezium",
"database.dbname":"postgres","transforms.timescaledb.database.dbname":"postgres",
"transforms.timescaledb.database.user":"debezium",
"transforms.timescaledb.type":"io.debezium.connector.postgresql.transforms.timescaledb.TimescaleDb",
"transforms.timescaledb.database.port":"5432","transforms":"timescaledb",
"schema.include.list":"public,_timescaledb_internal","database.port":"5432","plugin.name":"pgoutput",
"topic.prefix":"accounts","database.hostname":"timescaledb","database.password":"debeziumpassword",
"name":"timescaledb-connector"},"tasks":[{"connector":"timescaledb-connector","task":0}],"type":"source"}
```
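
You can also query the connector status endpoint; a sketch, assuming the default Connect REST port:

```bash
# Reports RUNNING for the connector and each of its tasks when healthy
curl -s localhost:8083/connectors/timescaledb-connector/status
```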

1. **Verify `timescaledb-connector` is running**

1. Open the Terminal window running Kafka Connect. When the connector is active, you see something like the following:

```bash
2025-04-30 10:40:15,168 INFO Postgres|accounts|streaming REPLICA IDENTITY for '_timescaledb_internal._hyper_1_1_chunk' is 'DEFAULT'; UPDATE and DELETE events will contain previous values only for PK columns [io.debezium.connector.postgresql.PostgresSchema]
2025-04-30 10:40:15,168 INFO Postgres|accounts|streaming REPLICA IDENTITY for '_timescaledb_internal.bgw_job_stat' is 'DEFAULT'; UPDATE and DELETE events will contain previous values only for PK columns [io.debezium.connector.postgresql.PostgresSchema]
2025-04-30 10:40:15,175 INFO Postgres|accounts|streaming SignalProcessor started. Scheduling it every 5000ms [io.debezium.pipeline.signal.SignalProcessor]
2025-04-30 10:40:15,175 INFO Postgres|accounts|streaming Creating thread debezium-postgresconnector-accounts-SignalProcessor [io.debezium.util.Threads]
2025-04-30 10:40:15,175 INFO Postgres|accounts|streaming Starting streaming [io.debezium.pipeline.ChangeEventSourceCoordinator]
2025-04-30 10:40:15,176 INFO Postgres|accounts|streaming Retrieved latest position from stored offset 'LSN{0/1FCE570}' [io.debezium.connector.postgresql.PostgresStreamingChangeEventSource]
2025-04-30 10:40:15,176 INFO Postgres|accounts|streaming Looking for WAL restart position for last commit LSN 'null' and last change LSN 'LSN{0/1FCE570}' [io.debezium.connector.postgresql.connection.WalPositionLocator]
2025-04-30 10:40:15,176 INFO Postgres|accounts|streaming Initializing PgOutput logical decoder publication [io.debezium.connector.postgresql.connection.PostgresReplicationConnection]
2025-04-30 10:40:15,189 INFO Postgres|accounts|streaming Obtained valid replication slot ReplicationSlot [active=false, latestFlushedLsn=LSN{0/1FCCFF0}, catalogXmin=884] [io.debezium.connector.postgresql.connection.PostgresConnection]
2025-04-30 10:40:15,189 INFO Postgres|accounts|streaming Connection gracefully closed [io.debezium.jdbc.JdbcConnection]
2025-04-30 10:40:15,204 INFO Postgres|accounts|streaming Requested thread factory for component PostgresConnector, id = accounts named = keep-alive [io.debezium.util.Threads]
2025-04-30 10:40:15,204 INFO Postgres|accounts|streaming Creating thread debezium-postgresconnector-accounts-keep-alive [io.debezium.util.Threads]
2025-04-30 10:40:15,216 INFO Postgres|accounts|streaming REPLICA IDENTITY for '_timescaledb_internal.bgw_policy_chunk_stats' is 'DEFAULT'; UPDATE and DELETE events will contain previous values only for PK columns [io.debezium.connector.postgresql.PostgresSchema]
2025-04-30 10:40:15,216 INFO Postgres|accounts|streaming REPLICA IDENTITY for 'public.accounts' is 'DEFAULT'; UPDATE and DELETE events will contain previous values only for PK columns [io.debezium.connector.postgresql.PostgresSchema]
2025-04-30 10:40:15,217 INFO Postgres|accounts|streaming REPLICA IDENTITY for '_timescaledb_internal.bgw_job_stat_history' is 'DEFAULT'; UPDATE and DELETE events will contain previous values only for PK columns [io.debezium.connector.postgresql.PostgresSchema]
2025-04-30 10:40:15,217 INFO Postgres|accounts|streaming REPLICA IDENTITY for '_timescaledb_internal._hyper_1_1_chunk' is 'DEFAULT'; UPDATE and DELETE events will contain previous values only for PK columns [io.debezium.connector.postgresql.PostgresSchema]
2025-04-30 10:40:15,217 INFO Postgres|accounts|streaming REPLICA IDENTITY for '_timescaledb_internal.bgw_job_stat' is 'DEFAULT'; UPDATE and DELETE events will contain previous values only for PK columns [io.debezium.connector.postgresql.PostgresSchema]
2025-04-30 10:40:15,219 INFO Postgres|accounts|streaming Processing messages [io.debezium.connector.postgresql.PostgresStreamingChangeEventSource]
```

1. Watch the events in the `accounts` topic on your $SELF_LONG instance.

In another Terminal instance, run the following command:

```bash
docker run -it --rm --name watcher --link zookeeper:zookeeper --link kafka:kafka quay.io/debezium/kafka:3.0 watch-topic -a -k accounts
```

You see the topics being streamed. For example:

```bash
status-task-timescaledb-connector-0 {"state":"RUNNING","trace":null,"worker_id":"172.17.0.5:8083","generation":31}
status-topic-timescaledb.public.accounts:connector-timescaledb-connector {"topic":{"name":"timescaledb.public.accounts","connector":"timescaledb-connector","task":0,"discoverTimestamp":1746009337985}}
status-topic-accounts._timescaledb_internal.bgw_job_stat:connector-timescaledb-connector {"topic":{"name":"accounts._timescaledb_internal.bgw_job_stat","connector":"timescaledb-connector","task":0,"discoverTimestamp":1746009338118}}
status-topic-accounts._timescaledb_internal.bgw_job_stat:connector-timescaledb-connector {"topic":{"name":"accounts._timescaledb_internal.bgw_job_stat","connector":"timescaledb-connector","task":0,"discoverTimestamp":1746009338120}}
status-topic-accounts._timescaledb_internal.bgw_job_stat_history:connector-timescaledb-connector {"topic":{"name":"accounts._timescaledb_internal.bgw_job_stat_history","connector":"timescaledb-connector","task":0,"discoverTimestamp":1746009338243}}
status-topic-accounts._timescaledb_internal.bgw_job_stat_history:connector-timescaledb-connector {"topic":{"name":"accounts._timescaledb_internal.bgw_job_stat_history","connector":"timescaledb-connector","task":0,"discoverTimestamp":1746009338245}}
status-topic-accounts.public.accounts:connector-timescaledb-connector {"topic":{"name":"accounts.public.accounts","connector":"timescaledb-connector","task":0,"discoverTimestamp":1746009338250}}
status-topic-accounts.public.accounts:connector-timescaledb-connector {"topic":{"name":"accounts.public.accounts","connector":"timescaledb-connector","task":0,"discoverTimestamp":1746009338251}}
status-topic-accounts.public.accounts:connector-timescaledb-connector {"topic":{"name":"accounts.public.accounts","connector":"timescaledb-connector","task":0,"discoverTimestamp":1746009338251}}
status-topic-accounts.public.accounts:connector-timescaledb-connector {"topic":{"name":"accounts.public.accounts","connector":"timescaledb-connector","task":0,"discoverTimestamp":1746009338251}}
status-topic-accounts.public.accounts:connector-timescaledb-connector {"topic":{"name":"accounts.public.accounts","connector":"timescaledb-connector","task":0,"discoverTimestamp":1746009338251}}
["timescaledb-connector",{"server":"accounts"}] {"last_snapshot_record":true,"lsn":33351024,"txId":893,"ts_usec":1746009337290783,"snapshot":"INITIAL","snapshot_completed":true}
status-connector-timescaledb-connector {"state":"UNASSIGNED","trace":null,"worker_id":"172.17.0.5:8083","generation":31}
status-task-timescaledb-connector-0 {"state":"UNASSIGNED","trace":null,"worker_id":"172.17.0.5:8083","generation":31}
status-connector-timescaledb-connector {"state":"RUNNING","trace":null,"worker_id":"172.17.0.5:8083","generation":33}
status-task-timescaledb-connector-0 {"state":"RUNNING","trace":null,"worker_id":"172.17.0.5:8083","generation":33}
```
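
To generate change events to watch, connect to your $SELF_LONG instance and insert rows into the `accounts` hypertable created during configuration; the values here are arbitrary examples:

```sql
-- Each insert produces a change event on the accounts topic
INSERT INTO accounts (name, city) VALUES ('Jane', 'Stockholm');
```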
76 changes: 76 additions & 0 deletions _partials/_integration-debezium-self-hosted-config-database.md
@@ -0,0 +1,76 @@

1. **Configure your self-hosted $PG deployment**

1. Open `postgresql.conf`.

The $PG configuration files are usually located in:

- Docker: `/home/postgres/pgdata/data/`
- Linux: `/etc/postgresql/<version>/main/` or `/var/lib/pgsql/<version>/data/`
- macOS: `/opt/homebrew/var/postgresql@<version>/`
- Windows: `C:\Program Files\PostgreSQL\<version>\data\`

1. Enable logical replication.

Modify the following settings in `postgresql.conf`:

```ini
wal_level = logical
max_replication_slots = 10
max_wal_senders = 10
```

1. Open `pg_hba.conf` and enable host replication.

To allow replication connections, add the following:

```
local replication debezium trust
```
This permission is for the `debezium` $PG user running on a local or Docker deployment. For more about replication
permissions, see [Configuring PostgreSQL to allow replication with the Debezium connector host][debezium-replication-permissions].

1. Restart $PG.
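
How you restart depends on your deployment; for example, on a systemd-based Linux host or in Docker (container name assumed):

```bash
# Linux with systemd
sudo systemctl restart postgresql

# Docker, assuming the container is named timescaledb
docker restart timescaledb
```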


1. **Connect to your $SELF_LONG instance**

Use [`psql`][psql-connect].

1. **Create a Debezium user in PostgreSQL**

Create a user with the `LOGIN` and `REPLICATION` permissions:

```sql
CREATE ROLE debezium WITH LOGIN REPLICATION PASSWORD '<debeziumpassword>';
```
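
Depending on your setup, the `debezium` role may also need read access to the tables it captures for the initial snapshot; a sketch, assuming the tables live in the `public` schema:

```sql
-- Allow the debezium role to read existing tables in the public schema
GRANT SELECT ON ALL TABLES IN SCHEMA public TO debezium;
```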

1. **Enable a replication slot for Debezium**

1. Create a table for Debezium to listen to:

```sql
CREATE TABLE accounts (created_at TIMESTAMPTZ DEFAULT NOW(),
name TEXT,
city TEXT);
```

1. Turn the table into a hypertable:

```sql
SELECT create_hypertable('accounts', 'created_at');
```

Debezium also works with [$CAGGs][caggs].

1. Create a publication for the tables Debezium captures:

```sql
CREATE PUBLICATION dbz_publication FOR ALL TABLES WITH (publish = 'insert, update');
```
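
To verify the publication covers the tables you expect, you can query the catalog:

```sql
-- Lists the tables included in the dbz_publication publication
SELECT * FROM pg_publication_tables WHERE pubname = 'dbz_publication';
```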

[caggs]: /use-timescale/:currentVersion:/continuous-aggregates/
[run-queries]: /getting-started/:currentVersion:/run-queries-from-console/
[open-console]: https://console.cloud.timescale.com/dashboard/services
[psql-connect]: /integrations/:currentVersion:/psql/#connect-to-your-service
[debezium-replication-permissions]: https://debezium.io/documentation/reference/3.1/connectors/postgresql.html#postgresql-host-replication-permissions
7 changes: 7 additions & 0 deletions _partials/_integration-prereqs-self-only.md
@@ -0,0 +1,7 @@

To follow the steps on this page:

* Create a target [self-hosted $TIMESCALE_DB][enable-timescaledb] instance.


[enable-timescaledb]: /self-hosted/:currentVersion:/install/
1 change: 0 additions & 1 deletion _partials/_integration-prereqs.md
@@ -5,7 +5,6 @@ To follow the steps on this page:
You need [your connection details][connection-info]. This procedure also
works for [self-hosted $TIMESCALE_DB][enable-timescaledb].


[create-service]: /getting-started/:currentVersion:/services/
[enable-timescaledb]: /self-hosted/:currentVersion:/install/
[connection-info]: /integrations/:currentVersion:/find-connection-details/
58 changes: 2 additions & 56 deletions integrations/apache-kafka.md
Expand Up @@ -6,6 +6,7 @@ keywords: [Apache Kafka, integrations]
---

import IntegrationPrereqs from "versionContent/_partials/_integration-prereqs.mdx";
import IntegrationApacheKafka from "versionContent/_partials/_integration-apache-kafka-install.mdx";

# Integrate Apache Kafka with $CLOUD_LONG

@@ -29,62 +30,7 @@

<Procedure>

<IntegrationApacheKafka />

</Procedure>
