2022-05-25 Update
- Add statement to drop replication slot to avoid making folder
pg_wal
becoming too large This happens when the old DB server is still being used after publication is removed
2021-01-07 Update
- Add Prevent Replication Error
- Update Setup New Server (Sequences) to append actual commands to run
2020-11-24 Update
Moved sequence fixing to the end (after subscription removed)
2020-11-19 Update
Wow I forgot to fix the sequences
Added back a section for it
I realize this after trying a few inserts on my staging environment
Main Article
Just notes for my own experience mostly
Upgrade from 12 to 13
If you don't know what's logical replication read some articles about it first
Official doc: https://www.postgresql.org/docs/12/logical-replication.html
I used to just upgrade PSQL like this:
pg_dump
-> pg_restore
to another instance -> Point processes (web, worker, etc.) to new DB
Data loss doesn't matter that much in my project but I rather have no data loss
Failed Attempts
If you are too busy or need the solution urgently you can skip this and come back later maybe.
Failed Attempts #1 - Replication after data restored
I used pg_restore
to restore data
After subscription started the server will complaint about duplicate data
Although it's possible use copy_data = false
but that means some data will be lost and make this replication meaningless
Failed Attempts #2 - Replication after table structure AND indexes restored
Used pg_restore
to restore with --schema
After subscription started the server will take forever to copy data from table with many indexes (waited 2 days still not done)
Solution
General Steps
This probably lack details
See Actual Notes on Test Run
for actual commands / queries
0.1. Setup new server
- Setup new host
- Deploy empty PG Server (without cron job that does backup)
- Backup schema (only if no latest db structure backup available)
- Restore schema & PK without indexes & constraints (section
pre-data
)
Config might have to be updated
- https://www.postgresql.org/docs/13/logical-replication-config.html
- https://www.postgresql.org/docs/13/runtime-config-replication.html#RUNTIME-CONFIG-REPLICATION-SUBSCRIBER
- https://www.postgresql.org/docs/13/runtime-config-replication.html#RUNTIME-CONFIG-REPLICATION-SENDER
0.2. Setup old server
Config might have to be updated
- https://www.postgresql.org/docs/13/logical-replication-config.html
- https://www.postgresql.org/docs/13/runtime-config-replication.html#RUNTIME-CONFIG-REPLICATION-SENDER
- https://www.postgresql.org/docs/13/runtime-config-wal.html#RUNTIME-CONFIG-WAL-SETTINGS
1. Add Publication
In existing DB server
CREATE PUBLICATION {publication_name}
FOR ALL TABLES
2. Add Subscription
In new DB server
CREATE SUBSCRIPTION {subcription_name}
CONNECTION 'hostaddr={old_db_server_ip} port={old_db_server_port} user={old_db_server_username} password={old_db_server_password} dbname={old_db_name}'
PUBLICATION {publication_name}
3. Wait until synced
https://severalnines.com/database-blog/how-upgrade-postgresql-11-postgresql-12-zero-downtime
4.1. Setup New Server (Indexes)
- Restore indexes (section post-data)
4.2. Verify all indexes & constraints are restored
5. Update Users (web/worker process) to use new DB
Check things are working
6.1. Remove Subscription
In new DB server
DROP SUBSCRIPTION IF EXISTS {subcription_name}
6.2. Remove Publication
In old DB server
DROP PUBLICATION IF EXISTS {publication_name}
-- Prevents pg_wal folder getting very large in case old DB is still being used
SELECT pg_drop_replication_slot('{publication_name}');
Actual Notes on Test Run
Setup New Server (Data Structure)
You will see some docker commands since I use Docker container to run PSQL
sudo docker exec -it db.master.1 bash
curl -o backup.dump \
https://bucket.s3.ap-east-1.amazonaws.com/backup/postgresql/pg_dump/manual/2020/10/2020_xx_xx_xxxxxx.dump
PGPASSWORD=pa55w0rd pg_restore --verbose --clean --no-acl --no-owner --if-exists --section=pre-data --no-publications -h localhost -U useruser -d dbdbdbdb --jobs=$(nproc) ./backup.dump
PGPASSWORD=pa55w0rd psql -h localhost -U useruser -d dbdbdbdb
Prevent Replication Error
Errors like
logical replication target relation xxx has neither REPLICA IDENTITY index nor PRIMARY KEY and published relation does not have REPLICA IDENTITY FULL
is caused by not having PK since those are in section post-data
to be restored in Setup New Server (Indexes)
Thus we need to restore PK only
Which require custom SQL file created
References:
PGPASSWORD=pa55w0rd \
pg_dump \
-h localhost -U useruser -d dbdbdbdb \
--schema-only --section=post-data > dbdbdbdb_postdata.sql
grep -B 1 'PRIMARY KEY' dbdbdbdb_postdata.sql > dbdbdbdb_pkeys.sql
# Just text, so show content and copy/paste to new DB console
cat dbdbdbdb_pkeys.sql
Add Publication/Subscription
CREATE PUBLICATION publication_test_2020_10_21_1146
FOR ALL TABLES;
CREATE SUBSCRIPTION subcription_test_2020_10_21_1146
CONNECTION 'hostaddr=10.170.0.6 port=5432 user=useruser password=pa55w0rd dbname=dbdbdbdb'
PUBLICATION publication_test_2020_10_21_1146;
Monitor
- https://dba.stackexchange.com/questions/224490/using-postgresql-logical-replication-how-do-you-know-that-the-subscriber-is-cau
- https://dba.stackexchange.com/questions/262294/postgresql-logical-replication-initial-slot-snapshot-too-large
- https://www.postgresql.org/docs/13/monitoring-stats.html#MONITORING-PG-STAT-REPLICATION-VIEW
- https://programmer.help/blogs/about-replication-status-in-postgresql-logical-subscription.html
-- Publisher
SELECT pid, usename, application_name, state
, pg_current_wal_lsn() AS current_lsn
, state
, sync_state
, sent_lsn
, pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), sent_lsn)) AS sent_diff
, write_lsn
, pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), write_lsn)) AS write_diff
, replay_lsn
, pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn)) AS replay_diff
--, write_lag
--, flush_lag
--, replay_lag
--, backend_start
--, reply_time
FROM pg_stat_replication
ORDER BY application_name, pid;
-- Subcriber
-- # Subscription Workers
SELECT pss.*, c.relname FROM pg_stat_subscription AS pss
LEFT OUTER JOIN pg_class AS c
ON pss.relid = c.oid;
-- # Subscription Workers on Tables (not ready, srsubstate <> 'r')
SELECT psr.*, c.relname FROM pg_subscription_rel AS psr
LEFT OUTER JOIN pg_class AS c
ON psr.srrelid = c.oid
WHERE srsubstate <> 'r';
-- # Subscription Workers on Tables (not ready or synced)
SELECT psr.*, c.relname FROM pg_subscription_rel AS psr
LEFT OUTER JOIN pg_class AS c
ON psr.srrelid = c.oid
WHERE srsubstate NOT IN ('r', 's');
-- Monitor Create Index (in case "Broken pipe")
\x on
SELECT pid
--, backend_start
, query_start
, state_change
, wait_event_type
, wait_event
, state
, query
--, backend_type
FROM pg_stat_activity
WHERE state = 'active';
Monitor network speed / errors
sudo nethogs
sudo docker logs --tail=100 --follow db.master.1
Setup New Server (Indexes)
sudo docker exec -it db.master.1 bash
PGPASSWORD=pa55w0rd pg_restore --verbose --clean --no-acl --no-owner --if-exists --section=post-data --no-publications -h localhost -U useruser -d dbdbdbdb --jobs=$(nproc) ./backup.dump
Verify
-- Index Count
SELECT COUNT(*)
FROM pg_class
WHERE relkind = 'i'
AND relname NOT LIKE 'pg_%'
AND relname NOT LIKE 'sql_%';
-- Table Count
SELECT COUNT(*)
FROM pg_class
WHERE relkind = 'r'
AND relname NOT LIKE 'pg_%'
AND relname NOT LIKE 'sql_%';
-- Constraint Count
SELECT COUNT(*), contype
FROM pg_constraint
GROUP BY contype;
-- Sequence Count
SELECT COUNT(*), seqtypid
FROM pg_sequence
GROUP BY seqtypid;
Remove Publication/Subscription
-- Safer to drop subscription first
DROP SUBSCRIPTION IF EXISTS subcription_test_2020_10_21_1146;
DROP PUBLICATION IF EXISTS publication_test_2020_10_21_1146;
-- Prevents pg_wal folder getting very large in case old DB is still being used
SELECT pg_drop_replication_slot('publication_test_2020_10_21_1146');
Setup New Server (Sequences)
This will not be restored since we cannot restore data via pg_restore --section=data
This can be done after subscription removed (no more inserts without calling nextval
on sequences) and before you start inserting records into any table with a sequence (In Ruby on Rails that's almost every table)
The method I tested:
https://wiki.postgresql.org/wiki/Fixing_Sequences
You might be interested in other answers:
touch reset.sql
tee -a reset.sql <<EOF
SELECT 'SELECT SETVAL(' ||
quote_literal(quote_ident(PGT.schemaname) || '.' || quote_ident(S.relname)) ||
', COALESCE(MAX(' ||quote_ident(C.attname)|| '), 1) ) FROM ' ||
quote_ident(PGT.schemaname)|| '.'||quote_ident(T.relname)|| ';'
FROM pg_class AS S,
pg_depend AS D,
pg_class AS T,
pg_attribute AS C,
pg_tables AS PGT
WHERE S.relkind = 'S'
AND S.oid = D.objid
AND D.refobjid = T.oid
AND D.refobjid = C.attrelid
AND D.refobjsubid = C.attnum
AND T.relname = PGT.tablename
ORDER BY S.relname;
EOF
# Maybe check the content
# cat reset.sql
PGPASSWORD=pa55w0rd psql -h localhost -U useruser -d dbdbdbdb -Atq -f reset.sql -o temp.sql
# Maybe check the content
# cat temp.sql
PGPASSWORD=pa55w0rd psql -h localhost -U useruser -d dbdbdbdb -f temp.sql
rm temp.sql
The End
If you feel like this looks like a mess
I am sorry but it is mainly for myself
Feel free to write another one that looks more organized ;)
Top comments (0)