
Commit 17e07f5

3804 docs rfc create a doc showing how to integrate debezium with timescale cloud (#4024)
1 parent 02a1f7f commit 17e07f5

10 files changed: +407 -58 lines changed
@@ -0,0 +1,58 @@

1. **Extract the Kafka binaries to a local folder**

   ```bash
   curl https://dlcdn.apache.org/kafka/3.9.0/kafka_2.13-3.9.0.tgz | tar -xzf -
   cd kafka_2.13-3.9.0
   ```

   From now on, the folder where you extracted the Kafka binaries is called `<KAFKA_HOME>`.
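
   For example, a minimal sketch of recording this location in an environment variable — an optional convenience, not required by the following steps:

   ```bash
   # Run from the extracted folder; remember where the Kafka binaries live
   export KAFKA_HOME="$PWD"
   ```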

1. **Configure and run Apache Kafka**

   ```bash
   KAFKA_CLUSTER_ID="$(./bin/kafka-storage.sh random-uuid)"
   ./bin/kafka-storage.sh format --standalone -t $KAFKA_CLUSTER_ID -c config/kraft/reconfig-server.properties
   ./bin/kafka-server-start.sh config/kraft/reconfig-server.properties
   ```

   Use the `-daemon` flag to run this process in the background.
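
   For example, a sketch of starting the broker detached:

   ```bash
   # The -daemon flag backgrounds the broker; output goes to the logs/ directory
   ./bin/kafka-server-start.sh -daemon config/kraft/reconfig-server.properties
   ```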

1. **Create Kafka topics**

   In another Terminal window, navigate to `<KAFKA_HOME>`, then call `kafka-topics.sh` to create the following topics:

   - `accounts`: publishes JSON messages that are consumed by the timescale-sink connector and inserted into your $SERVICE_LONG.
   - `deadletter`: stores messages that cause errors and that Kafka Connect workers cannot process.
   ```bash
   ./bin/kafka-topics.sh \
     --create \
     --topic accounts \
     --bootstrap-server localhost:9092 \
     --partitions 10

   ./bin/kafka-topics.sh \
     --create \
     --topic deadletter \
     --bootstrap-server localhost:9092 \
     --partitions 10
   ```
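
   To confirm both topics exist, you can, for example, describe them:

   ```bash
   # Prints partition and leader details for each topic
   ./bin/kafka-topics.sh --describe --topic accounts --bootstrap-server localhost:9092
   ./bin/kafka-topics.sh --describe --topic deadletter --bootstrap-server localhost:9092
   ```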

1. **Test that your topics are working correctly**

   1. Run `kafka-console-producer` to send messages to the `accounts` topic:

      ```bash
      ./bin/kafka-console-producer.sh --topic accounts --bootstrap-server localhost:9092
      ```

   1. Send some events. For example, type the following:

      ```bash
      >Timescale Cloud
      >How Cool
      ```

   1. In another Terminal window, navigate to `<KAFKA_HOME>`, then run `kafka-console-consumer` to consume the events you just sent:

      ```bash
      ./bin/kafka-console-consumer.sh --topic accounts --from-beginning --bootstrap-server localhost:9092
      ```

      You see:

      ```bash
      Timescale Cloud
      How Cool
      ```
@@ -0,0 +1,29 @@

1. **Connect to your $SERVICE_LONG**

   For $CLOUD_LONG, open an [SQL editor][run-queries] in [$CONSOLE][open-console]. For self-hosted, use [`psql`][psql].

1. **Enable logical replication for your $SERVICE_LONG**

   1. Run the following command to enable logical replication:

      ```sql
      ALTER SYSTEM SET wal_level = logical;
      SELECT pg_reload_conf();
      ```

   1. Restart your $SERVICE_SHORT.
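
      After the restart, you can confirm the setting took effect. For example:

      ```sql
      -- Returns 'logical' once the new wal_level is active
      SHOW wal_level;
      ```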

1. **Create a table**

   Create a table to test the integration. For example:

   ```sql
   CREATE TABLE sensor_data (
     id SERIAL PRIMARY KEY,
     device_id TEXT NOT NULL,
     temperature FLOAT NOT NULL,
     recorded_at TIMESTAMPTZ DEFAULT now()
   );
   ```
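
   To generate a change event to stream later, you can, for example, insert a row:

   ```sql
   -- id and recorded_at are filled in by their defaults
   INSERT INTO sensor_data (device_id, temperature) VALUES ('sensor-1', 22.5);
   ```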
+139
@@ -0,0 +1,139 @@

1. **Run Zookeeper in Docker**

   In another Terminal window, run the following command:

   ```bash
   docker run -it --rm --name zookeeper -p 2181:2181 -p 2888:2888 -p 3888:3888 quay.io/debezium/zookeeper:3.0
   ```

   Check the output log to see that Zookeeper is running.

1. **Run Kafka in Docker**

   In another Terminal window, run the following command:

   ```bash
   docker run -it --rm --name kafka -p 9092:9092 --link zookeeper:zookeeper quay.io/debezium/kafka:3.0
   ```

   Check the output log to see that Kafka is running.

1. **Run Kafka Connect in Docker**

   In another Terminal window, run the following command:

   ```bash
   docker run -it --rm --name connect \
     -p 8083:8083 \
     -e GROUP_ID=1 \
     -e CONFIG_STORAGE_TOPIC=accounts \
     -e OFFSET_STORAGE_TOPIC=offsets \
     -e STATUS_STORAGE_TOPIC=storage \
     --link kafka:kafka \
     --link timescaledb:timescaledb \
     quay.io/debezium/connect:3.0
   ```

   Check the output log to see that Kafka Connect is running.
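
   You can also probe the Kafka Connect REST API directly. For example, the root endpoint returns worker version information when the service is up:

   ```bash
   # Responds with version details as JSON once Kafka Connect is ready
   curl http://localhost:8083/
   ```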

1. **Register the Debezium PostgreSQL source connector**

   Update the `<properties>` for the `<debezium-user>` you created in your $SELF_LONG instance in the following command.
   Then run the command in another Terminal window:

   ```bash
   curl -X POST http://localhost:8083/connectors \
     -H "Content-Type: application/json" \
     -d '{
       "name": "timescaledb-connector",
       "config": {
         "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
         "database.hostname": "timescaledb",
         "database.port": "5432",
         "database.user": "<debezium-user>",
         "database.password": "<debezium-password>",
         "database.dbname": "postgres",
         "topic.prefix": "accounts",
         "plugin.name": "pgoutput",
         "schema.include.list": "public,_timescaledb_internal",
         "transforms": "timescaledb",
         "transforms.timescaledb.type": "io.debezium.connector.postgresql.transforms.timescaledb.TimescaleDb",
         "transforms.timescaledb.database.hostname": "timescaledb",
         "transforms.timescaledb.database.port": "5432",
         "transforms.timescaledb.database.user": "<debezium-user>",
         "transforms.timescaledb.database.password": "<debezium-password>",
         "transforms.timescaledb.database.dbname": "postgres"
       }
     }'
   ```
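
   To confirm the connector is registered, you can, for example, list all connectors:

   ```bash
   # The returned JSON array should include "timescaledb-connector"
   curl http://localhost:8083/connectors
   ```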

1. **Verify `timescaledb-connector` is included in the connector list**

   1. Check the tasks associated with `timescaledb-connector`:

      ```bash
      curl -i -X GET -H "Accept:application/json" localhost:8083/connectors/timescaledb-connector
      ```

      You see something like:

      ```bash
      {"name":"timescaledb-connector","config":
      { "connector.class":"io.debezium.connector.postgresql.PostgresConnector",
      "transforms.timescaledb.database.hostname":"timescaledb",
      "transforms.timescaledb.database.password":"debeziumpassword","database.user":"debezium",
      "database.dbname":"postgres","transforms.timescaledb.database.dbname":"postgres",
      "transforms.timescaledb.database.user":"debezium",
      "transforms.timescaledb.type":"io.debezium.connector.postgresql.transforms.timescaledb.TimescaleDb",
      "transforms.timescaledb.database.port":"5432","transforms":"timescaledb",
      "schema.include.list":"public,_timescaledb_internal","database.port":"5432","plugin.name":"pgoutput",
      "topic.prefix":"accounts","database.hostname":"timescaledb","database.password":"debeziumpassword",
      "name":"timescaledb-connector"},"tasks":[{"connector":"timescaledb-connector","task":0}],"type":"source"}
      ```

1. **Verify `timescaledb-connector` is running**

   1. Open the Terminal window running Kafka Connect. When the connector is active, you see something like the following:

      ```bash
      2025-04-30 10:40:15,168 INFO Postgres|accounts|streaming REPLICA IDENTITY for '_timescaledb_internal._hyper_1_1_chunk' is 'DEFAULT'; UPDATE and DELETE events will contain previous values only for PK columns [io.debezium.connector.postgresql.PostgresSchema]
      2025-04-30 10:40:15,168 INFO Postgres|accounts|streaming REPLICA IDENTITY for '_timescaledb_internal.bgw_job_stat' is 'DEFAULT'; UPDATE and DELETE events will contain previous values only for PK columns [io.debezium.connector.postgresql.PostgresSchema]
      2025-04-30 10:40:15,175 INFO Postgres|accounts|streaming SignalProcessor started. Scheduling it every 5000ms [io.debezium.pipeline.signal.SignalProcessor]
      2025-04-30 10:40:15,175 INFO Postgres|accounts|streaming Creating thread debezium-postgresconnector-accounts-SignalProcessor [io.debezium.util.Threads]
      2025-04-30 10:40:15,175 INFO Postgres|accounts|streaming Starting streaming [io.debezium.pipeline.ChangeEventSourceCoordinator]
      2025-04-30 10:40:15,176 INFO Postgres|accounts|streaming Retrieved latest position from stored offset 'LSN{0/1FCE570}' [io.debezium.connector.postgresql.PostgresStreamingChangeEventSource]
      2025-04-30 10:40:15,176 INFO Postgres|accounts|streaming Looking for WAL restart position for last commit LSN 'null' and last change LSN 'LSN{0/1FCE570}' [io.debezium.connector.postgresql.connection.WalPositionLocator]
      2025-04-30 10:40:15,176 INFO Postgres|accounts|streaming Initializing PgOutput logical decoder publication [io.debezium.connector.postgresql.connection.PostgresReplicationConnection]
      2025-04-30 10:40:15,189 INFO Postgres|accounts|streaming Obtained valid replication slot ReplicationSlot [active=false, latestFlushedLsn=LSN{0/1FCCFF0}, catalogXmin=884] [io.debezium.connector.postgresql.connection.PostgresConnection]
      2025-04-30 10:40:15,189 INFO Postgres|accounts|streaming Connection gracefully closed [io.debezium.jdbc.JdbcConnection]
      2025-04-30 10:40:15,204 INFO Postgres|accounts|streaming Requested thread factory for component PostgresConnector, id = accounts named = keep-alive [io.debezium.util.Threads]
      2025-04-30 10:40:15,204 INFO Postgres|accounts|streaming Creating thread debezium-postgresconnector-accounts-keep-alive [io.debezium.util.Threads]
      2025-04-30 10:40:15,216 INFO Postgres|accounts|streaming REPLICA IDENTITY for '_timescaledb_internal.bgw_policy_chunk_stats' is 'DEFAULT'; UPDATE and DELETE events will contain previous values only for PK columns [io.debezium.connector.postgresql.PostgresSchema]
      2025-04-30 10:40:15,216 INFO Postgres|accounts|streaming REPLICA IDENTITY for 'public.accounts' is 'DEFAULT'; UPDATE and DELETE events will contain previous values only for PK columns [io.debezium.connector.postgresql.PostgresSchema]
      2025-04-30 10:40:15,217 INFO Postgres|accounts|streaming REPLICA IDENTITY for '_timescaledb_internal.bgw_job_stat_history' is 'DEFAULT'; UPDATE and DELETE events will contain previous values only for PK columns [io.debezium.connector.postgresql.PostgresSchema]
      2025-04-30 10:40:15,217 INFO Postgres|accounts|streaming REPLICA IDENTITY for '_timescaledb_internal._hyper_1_1_chunk' is 'DEFAULT'; UPDATE and DELETE events will contain previous values only for PK columns [io.debezium.connector.postgresql.PostgresSchema]
      2025-04-30 10:40:15,217 INFO Postgres|accounts|streaming REPLICA IDENTITY for '_timescaledb_internal.bgw_job_stat' is 'DEFAULT'; UPDATE and DELETE events will contain previous values only for PK columns [io.debezium.connector.postgresql.PostgresSchema]
      2025-04-30 10:40:15,219 INFO Postgres|accounts|streaming Processing messages [io.debezium.connector.postgresql.PostgresStreamingChangeEventSource]
      ```

   1. Watch the events in the `accounts` topic on your $SELF_LONG instance.

      In another Terminal window, run the following command:

      ```bash
      docker run -it --rm --name watcher --link zookeeper:zookeeper --link kafka:kafka quay.io/debezium/kafka:3.0 watch-topic -a -k accounts
      ```

      You see the topics being streamed. For example:

      ```bash
      status-task-timescaledb-connector-0 {"state":"RUNNING","trace":null,"worker_id":"172.17.0.5:8083","generation":31}
      status-topic-timescaledb.public.accounts:connector-timescaledb-connector {"topic":{"name":"timescaledb.public.accounts","connector":"timescaledb-connector","task":0,"discoverTimestamp":1746009337985}}
      status-topic-accounts._timescaledb_internal.bgw_job_stat:connector-timescaledb-connector {"topic":{"name":"accounts._timescaledb_internal.bgw_job_stat","connector":"timescaledb-connector","task":0,"discoverTimestamp":1746009338118}}
      status-topic-accounts._timescaledb_internal.bgw_job_stat:connector-timescaledb-connector {"topic":{"name":"accounts._timescaledb_internal.bgw_job_stat","connector":"timescaledb-connector","task":0,"discoverTimestamp":1746009338120}}
      status-topic-accounts._timescaledb_internal.bgw_job_stat_history:connector-timescaledb-connector {"topic":{"name":"accounts._timescaledb_internal.bgw_job_stat_history","connector":"timescaledb-connector","task":0,"discoverTimestamp":1746009338243}}
      status-topic-accounts._timescaledb_internal.bgw_job_stat_history:connector-timescaledb-connector {"topic":{"name":"accounts._timescaledb_internal.bgw_job_stat_history","connector":"timescaledb-connector","task":0,"discoverTimestamp":1746009338245}}
      status-topic-accounts.public.accounts:connector-timescaledb-connector {"topic":{"name":"accounts.public.accounts","connector":"timescaledb-connector","task":0,"discoverTimestamp":1746009338250}}
      status-topic-accounts.public.accounts:connector-timescaledb-connector {"topic":{"name":"accounts.public.accounts","connector":"timescaledb-connector","task":0,"discoverTimestamp":1746009338251}}
      status-topic-accounts.public.accounts:connector-timescaledb-connector {"topic":{"name":"accounts.public.accounts","connector":"timescaledb-connector","task":0,"discoverTimestamp":1746009338251}}
      status-topic-accounts.public.accounts:connector-timescaledb-connector {"topic":{"name":"accounts.public.accounts","connector":"timescaledb-connector","task":0,"discoverTimestamp":1746009338251}}
      status-topic-accounts.public.accounts:connector-timescaledb-connector {"topic":{"name":"accounts.public.accounts","connector":"timescaledb-connector","task":0,"discoverTimestamp":1746009338251}}
      ["timescaledb-connector",{"server":"accounts"}] {"last_snapshot_record":true,"lsn":33351024,"txId":893,"ts_usec":1746009337290783,"snapshot":"INITIAL","snapshot_completed":true}
      status-connector-timescaledb-connector {"state":"UNASSIGNED","trace":null,"worker_id":"172.17.0.5:8083","generation":31}
      status-task-timescaledb-connector-0 {"state":"UNASSIGNED","trace":null,"worker_id":"172.17.0.5:8083","generation":31}
      status-connector-timescaledb-connector {"state":"RUNNING","trace":null,"worker_id":"172.17.0.5:8083","generation":33}
      status-task-timescaledb-connector-0 {"state":"RUNNING","trace":null,"worker_id":"172.17.0.5:8083","generation":33}
      ```
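
      To see a live change event, you can, for example, insert a row into the `accounts` hypertable from `psql` and watch it appear in the watcher output:

      ```sql
      -- created_at is filled in by its default; the watcher prints the resulting event
      INSERT INTO accounts (name, city) VALUES ('Jane', 'Lisbon');
      ```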
@@ -0,0 +1,76 @@

1. **Configure your self-hosted $PG deployment**

   1. Open `postgresql.conf`.

      The $PG configuration files are usually located in:

      - Docker: `/home/postgres/pgdata/data/`
      - Linux: `/etc/postgresql/<version>/main/` or `/var/lib/pgsql/<version>/data/`
      - macOS: `/opt/homebrew/var/postgresql@<version>/`
      - Windows: `C:\Program Files\PostgreSQL\<version>\data\`
12+
13+
1. Enable logical replication.
14+
15+
Modify the following settings in `postgresql.conf`:
16+
17+
```ini
18+
wal_level = logical
19+
max_replication_slots = 10
20+
max_wal_senders = 10
21+
```

   1. Open `pg_hba.conf` and enable host replication.

      To allow replication connections, add the following:

      ```
      local replication debezium trust
      ```

      This permission is for the `debezium` $PG user running on a local or Docker deployment. For more about replication permissions, see [Configuring PostgreSQL to allow replication with the Debezium connector host][debezium-replication-permissions].

   1. Restart $PG.
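
      How you restart depends on your deployment. For example, the service and container names here are assumptions that vary by setup:

      ```bash
      # Linux host managed by systemd
      sudo systemctl restart postgresql

      # Or, for a Docker container named timescaledb
      docker restart timescaledb
      ```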

1. **Connect to your $SELF_LONG instance**

   Use [`psql`][psql-connect].

1. **Create a Debezium user in $PG**

   Create a user with the `LOGIN` and `REPLICATION` permissions:

   ```sql
   CREATE ROLE debezium WITH LOGIN REPLICATION PASSWORD '<debeziumpassword>';
   ```

1. **Enable a replication slot for Debezium**

   1. Create a table for Debezium to listen to:

      ```sql
      CREATE TABLE accounts (created_at TIMESTAMPTZ DEFAULT NOW(),
                             name TEXT,
                             city TEXT);
      ```

   1. Turn the table into a hypertable:

      ```sql
      SELECT create_hypertable('accounts', 'created_at');
      ```

      Debezium also works with [$CAGGs][caggs].

   1. Create a publication for Debezium to use. The connector creates its own replication slot when it first connects:

      ```sql
      CREATE PUBLICATION dbz_publication FOR ALL TABLES WITH (publish = 'insert, update');
      ```
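
      To check the publication, you can, for example, query the catalog:

      ```sql
      -- Confirms dbz_publication exists and publishes inserts and updates
      SELECT pubname, pubinsert, pubupdate FROM pg_publication;
      ```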
71+
72+
[caggs]: /use-timescale/:currentVersion:/continuous-aggregates/
73+
[run-queries]: /getting-started/:currentVersion:/run-queries-from-console/
74+
[open-console]: https://console.cloud.timescale.com/dashboard/services
75+
[psql-connect]: /integrations/:currentVersion:/psql/#connect-to-your-service
76+
[debezium-replication-permissions]: https://debezium.io/documentation/reference/3.1/connectors/postgresql.html#postgresql-host-replication-permissions
@@ -0,0 +1,7 @@

To follow the steps on this page:

* Create a target [self-hosted $TIMESCALE_DB][enable-timescaledb] instance.

[enable-timescaledb]: /self-hosted/:currentVersion:/install/

_partials/_integration-prereqs.md

-1

@@ -5,7 +5,6 @@ To follow the steps on this page:
 You need [your connection details][connection-info]. This procedure also
 works for [self-hosted $TIMESCALE_DB][enable-timescaledb].
 
-
 [create-service]: /getting-started/:currentVersion:/services/
 [enable-timescaledb]: /self-hosted/:currentVersion:/install/
 [connection-info]: /integrations/:currentVersion:/find-connection-details/

integrations/apache-kafka.md

+2 -56
@@ -6,6 +6,7 @@ keywords: [Apache Kafka, integrations]
 ---
 
 import IntegrationPrereqs from "versionContent/_partials/_integration-prereqs.mdx";
+import IntegrationApacheKafka from "versionContent/_partials/_integration-apache-kafka-install.mdx";
 
 # Integrate Apache Kafka with $CLOUD_LONG
 
@@ -29,62 +30,7 @@ To install and configure Apache Kafka:
 
 <Procedure>
 
-    [56 lines removed: the inline Apache Kafka install steps, identical to the new partial shown above]
+    <IntegrationApacheKafka />
 
 </Procedure>
 