Here are the main questions explored:
- There are various Postgres compatible solutions built with scalibility in mind (Citus, AWS Aurora, CockroachDB), but what about plain old Postgres?
- Are we able to let a Postgres data layer of new services start small and simple and grow in complexity just-in-time if needed?
- How much of the data layer complexity can be done transparently to application developers?
Here is an example scenario:
- A new service starts with a small, simple, easy to work with database - a single schema, no partitions, no sharding, using just a column in each table to specify the tenant for multi-tenancy
- As the service evolves, various schema changes may happen
- As the service sees more usage, more and more data is inserted
- At some point, the database has grown large enough such that we'd want to scale it somehow. (Vertical scaling is probably the simplest thing we'd reach for. Though, at some point, if the service continues to grow - we'd likely want to look at our horizontal scaling options)
- We horizontally scale the database
- Ideally, this change can be transparent to the app. No code changes. Schema changes can continue to happen as the service continues to evolve
- Repeat from Step 2
Bring up two Postgres containers - db_1 and db_2
docker-compose upLog into db_1
psql -h localhost -p 5432 -U experiment
# when prompted, enter the password: experimentFirst, create a standard table
CREATE TABLE exercises (
id UUID PRIMARY KEY,
tenant VARCHAR NOT NULL,
name VARCHAR NOT NULL
);Insert some data
INSERT INTO exercises (id, tenant, name) VALUES ('ac8e6a36-722e-4e53-b9e6-e916aded0b00', 'Acme', 'jumping');
INSERT INTO exercises (id, tenant, name) VALUES ('284d1166-0bdc-4c66-93c1-43ddbd0491a0', 'Acme', 'running');
INSERT INTO exercises (id, tenant, name) VALUES ('aa46f49b-5fca-4fc8-bd63-fdf587710c00', 'Bobmart', 'hopping');
INSERT INTO exercises (id, tenant, name) VALUES ('b9257460-173b-45e8-b4c3-7ec269c8f74d', 'Crazytown', 'skipping');
INSERT INTO exercises (id, tenant, name) VALUES ('4f32eadf-0abe-480a-93bc-8780fdf85b5d', 'Dory', 'swimming');
INSERT INTO exercises (id, tenant, name) VALUES ('216b0fd6-5baf-4432-80c6-069243e8cba4', 'Eggbert', 'sitting');Now we'll convert the table to a partitioned table via Postgres Declarative Table Partitioning.
We'll start with just a single partition on the same server.
BEGIN;
ALTER TABLE exercises RENAME TO exercises_old;
CREATE TABLE exercises (LIKE exercises_old) -- TODO: look into various LIKE INCLUDING options
PARTITION BY RANGE ( tenant );
ALTER TABLE exercises
ATTACH PARTITION exercises_old DEFAULT;
COMMIT;Let's see what's in each table now.
In the main partitioned table.
SELECT * FROM exercises;
id | tenant | name
--------------------------------------+-----------+----------
ac8e6a36-722e-4e53-b9e6-e916aded0b00 | Acme | jumping
284d1166-0bdc-4c66-93c1-43ddbd0491a0 | Acme | running
aa46f49b-5fca-4fc8-bd63-fdf587710c00 | Bobmart | hopping
b9257460-173b-45e8-b4c3-7ec269c8f74d | Crazytown | skipping
4f32eadf-0abe-480a-93bc-8780fdf85b5d | Dory | swimming
216b0fd6-5baf-4432-80c6-069243e8cba4 | Eggbert | sittingIn the single partition.
SELECT * FROM exercises_old;
id | tenant | name
--------------------------------------+-----------+----------
ac8e6a36-722e-4e53-b9e6-e916aded0b00 | Acme | jumping
284d1166-0bdc-4c66-93c1-43ddbd0491a0 | Acme | running
aa46f49b-5fca-4fc8-bd63-fdf587710c00 | Bobmart | hopping
b9257460-173b-45e8-b4c3-7ec269c8f74d | Crazytown | skipping
4f32eadf-0abe-480a-93bc-8780fdf85b5d | Dory | swimming
216b0fd6-5baf-4432-80c6-069243e8cba4 | Eggbert | sittingNow let's actually start setting up multiple partitions and moving data over to them
BEGIN;
CREATE TABLE exercises_0 (
LIKE exercises_old INCLUDING ALL
);
-- NOTE: these rows will be locked for the duration of the move,
-- so care will need to be taken if this moves a lot of data in practice
WITH x AS (
DELETE FROM exercises_old WHERE tenant < 'B' returning *
)
INSERT INTO exercises_0
SELECT * FROM x;
ALTER TABLE exercises
ATTACH PARTITION exercises_0 FOR VALUES FROM (MINVALUE) TO ('B');
COMMIT;Now our exercises table is comprised of two partions - exercises_old and exercises_0.
SELECT * FROM exercises;
id | tenant | name
--------------------------------------+-----------+----------
ac8e6a36-722e-4e53-b9e6-e916aded0b00 | Acme | jumping
284d1166-0bdc-4c66-93c1-43ddbd0491a0 | Acme | running
aa46f49b-5fca-4fc8-bd63-fdf587710c00 | Bobmart | hopping
b9257460-173b-45e8-b4c3-7ec269c8f74d | Crazytown | skipping
4f32eadf-0abe-480a-93bc-8780fdf85b5d | Dory | swimming
216b0fd6-5baf-4432-80c6-069243e8cba4 | Eggbert | sitting
SELECT * FROM exercises_old;
id | tenant | name
--------------------------------------+-----------+----------
aa46f49b-5fca-4fc8-bd63-fdf587710c00 | Bobmart | hopping
b9257460-173b-45e8-b4c3-7ec269c8f74d | Crazytown | skipping
4f32eadf-0abe-480a-93bc-8780fdf85b5d | Dory | swimming
216b0fd6-5baf-4432-80c6-069243e8cba4 | Eggbert | sitting
SELECT * FROM exercises_0;
id | tenant | name
--------------------------------------+--------+---------
ac8e6a36-722e-4e53-b9e6-e916aded0b00 | Acme | jumping
284d1166-0bdc-4c66-93c1-43ddbd0491a0 | Acme | runningTable modifications still work on the partitioned table, with modifications propagating to the partitions
ALTER TABLE exercises ADD description VARCHAR;SELECT * FROM exercises;
id | tenant | name | description
--------------------------------------+-----------+----------+-------------
ac8e6a36-722e-4e53-b9e6-e916aded0b00 | Acme | jumping |
284d1166-0bdc-4c66-93c1-43ddbd0491a0 | Acme | running |
aa46f49b-5fca-4fc8-bd63-fdf587710c00 | Bobmart | hopping |
b9257460-173b-45e8-b4c3-7ec269c8f74d | Crazytown | skipping |
4f32eadf-0abe-480a-93bc-8780fdf85b5d | Dory | swimming |
216b0fd6-5baf-4432-80c6-069243e8cba4 | Eggbert | sitting |
SELECT * FROM exercises_old;
id | tenant | name | description
--------------------------------------+-----------+----------+-------------
aa46f49b-5fca-4fc8-bd63-fdf587710c00 | Bobmart | hopping |
b9257460-173b-45e8-b4c3-7ec269c8f74d | Crazytown | skipping |
4f32eadf-0abe-480a-93bc-8780fdf85b5d | Dory | swimming |
216b0fd6-5baf-4432-80c6-069243e8cba4 | Eggbert | sitting |
SELECT * FROM exercises_0;
id | tenant | name | description
--------------------------------------+--------+---------+-------------
ac8e6a36-722e-4e53-b9e6-e916aded0b00 | Acme | jumping |
284d1166-0bdc-4c66-93c1-43ddbd0491a0 | Acme | running |If we want to break up the partitioning operation into chunks that deal with smaller amounts of data, we can write them as separate operations that operate on a subset of the data each time.
Move some of the data
BEGIN;
CREATE TABLE exercises_1 (
LIKE exercises_old INCLUDING ALL
);
WITH x AS (
DELETE FROM exercises_old WHERE tenant < 'C' returning *
)
INSERT INTO exercises_1
SELECT * FROM x;
ALTER TABLE exercises
ATTACH PARTITION exercises_1 FOR VALUES FROM ('B') TO ('C');
COMMIT;Table exercises_1 should have some of the data now.
SELECT * FROM exercises;
id | tenant | name | description
--------------------------------------+-----------+----------+-------------
ac8e6a36-722e-4e53-b9e6-e916aded0b00 | Acme | jumping |
284d1166-0bdc-4c66-93c1-43ddbd0491a0 | Acme | running |
aa46f49b-5fca-4fc8-bd63-fdf587710c00 | Bobmart | hopping |
b9257460-173b-45e8-b4c3-7ec269c8f74d | Crazytown | skipping |
4f32eadf-0abe-480a-93bc-8780fdf85b5d | Dory | swimming |
216b0fd6-5baf-4432-80c6-069243e8cba4 | Eggbert | sitting |
SELECT * FROM exercises_old;
id | tenant | name | description
--------------------------------------+-----------+----------+-------------
b9257460-173b-45e8-b4c3-7ec269c8f74d | Crazytown | skipping |
4f32eadf-0abe-480a-93bc-8780fdf85b5d | Dory | swimming |
216b0fd6-5baf-4432-80c6-069243e8cba4 | Eggbert | sitting |
SELECT * FROM exercises_0;
id | tenant | name | description
--------------------------------------+--------+---------+-------------
ac8e6a36-722e-4e53-b9e6-e916aded0b00 | Acme | jumping |
284d1166-0bdc-4c66-93c1-43ddbd0491a0 | Acme | running |
SELECT * FROM exercises_1;
id | tenant | name | description
--------------------------------------+---------+---------+-------------
aa46f49b-5fca-4fc8-bd63-fdf587710c00 | Bobmart | hopping |Move the rest
BEGIN;
ALTER TABLE exercises DETACH PARTITION exercises_1;
WITH x AS (
DELETE FROM exercises_old WHERE tenant < 'D' returning *
)
INSERT INTO exercises_1
SELECT * FROM x;
ALTER TABLE exercises
ATTACH PARTITION exercises_1 FOR VALUES FROM ('B') TO ('D');
COMMIT;Table exercises_1 should have more of the data now.
SELECT * FROM exercises;
id | tenant | name | description
--------------------------------------+-----------+----------+-------------
ac8e6a36-722e-4e53-b9e6-e916aded0b00 | Acme | jumping |
284d1166-0bdc-4c66-93c1-43ddbd0491a0 | Acme | running |
aa46f49b-5fca-4fc8-bd63-fdf587710c00 | Bobmart | hopping |
b9257460-173b-45e8-b4c3-7ec269c8f74d | Crazytown | skipping |
4f32eadf-0abe-480a-93bc-8780fdf85b5d | Dory | swimming |
216b0fd6-5baf-4432-80c6-069243e8cba4 | Eggbert | sitting |
SELECT * FROM exercises_old;
id | tenant | name | description
--------------------------------------+---------+----------+-------------
4f32eadf-0abe-480a-93bc-8780fdf85b5d | Dory | swimming |
216b0fd6-5baf-4432-80c6-069243e8cba4 | Eggbert | sitting |
SELECT * FROM exercises_0;
id | tenant | name | description
--------------------------------------+--------+---------+-------------
ac8e6a36-722e-4e53-b9e6-e916aded0b00 | Acme | jumping |
284d1166-0bdc-4c66-93c1-43ddbd0491a0 | Acme | running |
SELECT * FROM exercises_1;
id | tenant | name | description
--------------------------------------+-----------+----------+-------------
aa46f49b-5fca-4fc8-bd63-fdf587710c00 | Bobmart | hopping |
b9257460-173b-45e8-b4c3-7ec269c8f74d | Crazytown | skipping |Let's see how Postgres plans different types of queries.
This scans the tables in all paritions.
EXPLAIN SELECT * FROM exercises;
QUERY PLAN
------------------------------------------------------------------------
Append (cost=0.00..56.10 rows=1740 width=112)
-> Seq Scan on exercises_0 (cost=0.00..15.80 rows=580 width=112)
-> Seq Scan on exercises_1 (cost=0.00..15.80 rows=580 width=112)
-> Seq Scan on exercises_old (cost=0.00..15.80 rows=580 width=112)This searches all of the tables in all partitions.
EXPLAIN SELECT * FROM exercises WHERE id = '284d1166-0bdc-4c66-93c1-43ddbd0491a0';
QUERY PLAN
--------------------------------------------------------------------------------------------
Append (cost=0.15..24.52 rows=3 width=112)
-> Index Scan using exercises_0_pkey on exercises_0 (cost=0.15..8.17 rows=1 width=112)
Index Cond: (id = '284d1166-0bdc-4c66-93c1-43ddbd0491a0'::uuid)
-> Index Scan using exercises_1_pkey on exercises_1 (cost=0.15..8.17 rows=1 width=112)
Index Cond: (id = '284d1166-0bdc-4c66-93c1-43ddbd0491a0'::uuid)
-> Index Scan using exercises_pkey on exercises_old (cost=0.15..8.17 rows=1 width=112)
Index Cond: (id = '284d1166-0bdc-4c66-93c1-43ddbd0491a0'::uuid)When given enough information in the query, Postgres is able to be smart and only execute the query in the relevant partition.
EXPLAIN SELECT * FROM exercises WHERE id = '284d1166-0bdc-4c66-93c1-43ddbd0491a0' AND tenant = 'Acme';
QUERY PLAN
--------------------------------------------------------------------------------------------
Append (cost=0.15..8.18 rows=1 width=112)
-> Index Scan using exercises_0_pkey on exercises_0 (cost=0.15..8.17 rows=1 width=112)
Index Cond: (id = '284d1166-0bdc-4c66-93c1-43ddbd0491a0'::uuid)
Filter: ((tenant)::text = 'Acme'::text)Now let's set up a partition that references a table on remote server using Postgres Foreign Data Wrapper (postgres_fdw).
On the primary server, db_1.
Install the postgres_fdw extension.
CREATE EXTENSION postgres_fdw;Create a foreign server object to represent the remote database you want to connect to.
CREATE SERVER db_2
FOREIGN DATA WRAPPER postgres_fdw
OPTIONS (host 'db_2', port '5432', dbname 'experiment');Create a user mapping for each database user you want to allow to access the foreign server.
CREATE USER MAPPING FOR experiment
SERVER db_2
OPTIONS (user 'experiment', password 'experiment');Connect to db_2.
psql -h localhost -p 5433 -U experimentOn db_2, create a table that matches the schema of the partition table
-- on db_2
CREATE TABLE exercises_2 (
id UUID PRIMARY KEY,
tenant VARCHAR NOT NULL,
name VARCHAR NOT NULL,
description VARCHAR
);
-- TODO: look into a better way to get the schema from the primary tableOn db_1.
Create a foreign table for the remote table, move data over to it, and attach it as a partition
-- on db_1
BEGIN;
-- create a foreign table
IMPORT FOREIGN SCHEMA public LIMIT TO (exercises_2)
FROM SERVER db_2 INTO public;
-- move data
WITH x AS (
DELETE FROM exercises_old WHERE tenant < 'E' returning *
)
INSERT INTO exercises_2
SELECT * FROM x;
-- attach as partition
ALTER TABLE exercises
ATTACH PARTITION exercises_2 FOR VALUES FROM ('D') TO ('E');
COMMIT;The data is now in the table on the remote server, queries to the main table will include the data on the remote server
SELECT * FROM exercises;
id | tenant | name | description
--------------------------------------+-----------+----------+-------------
ac8e6a36-722e-4e53-b9e6-e916aded0b00 | Acme | jumping |
284d1166-0bdc-4c66-93c1-43ddbd0491a0 | Acme | running |
aa46f49b-5fca-4fc8-bd63-fdf587710c00 | Bobmart | hopping |
b9257460-173b-45e8-b4c3-7ec269c8f74d | Crazytown | skipping |
4f32eadf-0abe-480a-93bc-8780fdf85b5d | Dory | swimming |
216b0fd6-5baf-4432-80c6-069243e8cba4 | Eggbert | sitting |
SELECT * FROM exercises_old;
id | tenant | name | description
--------------------------------------+---------+---------+-------------
216b0fd6-5baf-4432-80c6-069243e8cba4 | Eggbert | sitting |
SELECT * FROM exercises_0;
id | tenant | name | description
--------------------------------------+--------+---------+-------------
ac8e6a36-722e-4e53-b9e6-e916aded0b00 | Acme | jumping |
284d1166-0bdc-4c66-93c1-43ddbd0491a0 | Acme | running |
SELECT * FROM exercises_1;
id | tenant | name | description
--------------------------------------+-----------+----------+-------------
aa46f49b-5fca-4fc8-bd63-fdf587710c00 | Bobmart | hopping |
b9257460-173b-45e8-b4c3-7ec269c8f74d | Crazytown | skipping |
SELECT * FROM exercises_2;
id | tenant | name | description
--------------------------------------+--------+----------+-------------
4f32eadf-0abe-480a-93bc-8780fdf85b5d | Dory | swimming |Let's see if table alterations continue to propagate out from the primary partitioned table
ALTER TABLE exercises ADD difficulty SMALLINT;Unfortunately, the foreign table is not altered
SELECT * FROM exercises;
-- ERROR: column "difficulty" does not exist
-- CONTEXT: remote SQL command: SELECT id, tenant, name, description, difficulty FROM public.exercises_2To fix this, we'll need to alter the table on the remote server, too.
On db_2.
-- on db_2
ALTER TABLE exercises_2 ADD difficulty SMALLINT;Now the query should work as expected.
-- on db_1
SELECT * FROM exercises;
id | tenant | name | description | difficulty
--------------------------------------+-----------+----------+-------------+------------
ac8e6a36-722e-4e53-b9e6-e916aded0b00 | Acme | jumping | |
284d1166-0bdc-4c66-93c1-43ddbd0491a0 | Acme | running | |
aa46f49b-5fca-4fc8-bd63-fdf587710c00 | Bobmart | hopping | |
b9257460-173b-45e8-b4c3-7ec269c8f74d | Crazytown | skipping | |
4f32eadf-0abe-480a-93bc-8780fdf85b5d | Dory | swimming | |
216b0fd6-5baf-4432-80c6-069243e8cba4 | Eggbert | sitting | |Let's try to create a foreign key reference to our partitioned table
CREATE TABLE exercisers (
id UUID PRIMARY KEY,
tenant VARCHAR NOT NULL,
exercise_id UUID NOT NULL,
FOREIGN KEY(exercise_id) REFERENCES exercises(id)
);
ERROR: cannot reference partitioned table "exercises"Let's try to create a foreign key reference from our partitioned table
CREATE TABLE tenants (
tenant VARCHAR NOT NULL PRIMARY KEY
);
INSERT into tenants SELECT DISTINCT tenant from exercises;
ALTER TABLE exercises
ADD CONSTRAINT fk_exercises_tenants FOREIGN KEY (tenant) REFERENCES tenants(tenant);
ERROR: "exercises_2" is a foreign table
DETAIL: Foreign tables cannot have constraint triggers.Note: When leveraging table partitioning without foreign tables - foreign key references from a partitioned table to some other table are supported. But foreign keys referencing partitioned tables are still not supported.
See https://www.postgresql.org/docs/11/ddl-partitioning.html#DDL-PARTITIONING-DECLARATIVE-LIMITATIONS
- test out transactions across foreign server with postgres_fdw
- try out different partitioning schemes - LIST, HASH
- examine performance for partitioning, FDW
- look into improving dev experience around
ALTER TABLElimitation with foreign tables. Maybe leverage a trigger as a workaround?