Overview
Over the weekend, our CapRover instance (an open-source Heroku alternative) crashed inexplicably. After lots of digging, we found that a corrupted file was preventing the Docker Swarm from starting. The error we got was: can't initialize raft node: irreparable WAL error: wal: max entry size limit exceeded. This is the equivalent of the Windows Blue Screen of Death... the Docker instance was beyond repair. We started to investigate and found that we had a 40 GB ClickHouse volume from Plausible.
It turned out that query logging had inflated our database size to 40GB, whereas the actual data was just 1GB.
Important Note: This was fixed by the Plausible team a long time ago; however, we had supported (sponsored) and used the project right from the start, so our instance still had the logging issue. To save our data, we needed to move to a new Plausible install on a new server (migrating to Hetzner along the way).
We determined that these were the needed steps: export data from ClickHouse and PostgreSQL, transfer it securely, and import it into a new environment.
Easier said than done... A weekend later, with lots of Stack Overflow and GPT-4 help, we managed to migrate everything over without any data loss. Below is the tutorial we wrote for our internal knowledge base.
Step 1: Exporting Data
A. Export from ClickHouse
- Access ClickHouse Container:
docker exec -it <clickhouse-container-id> clickhouse-client
- Check Database Size:
- To list all databases and their sizes:
SELECT database, formatReadableSize(sum(bytes_on_disk)) as total_size_on_disk FROM system.parts GROUP BY database ORDER BY sum(bytes_on_disk) DESC;
- For detailed table sizes within a specific database:
SELECT table AS table_name, formatReadableSize(sum(bytes_on_disk)) AS size_on_disk FROM system.parts WHERE database = 'your_database_name' GROUP BY table_name ORDER BY sum(bytes_on_disk) DESC;
Note: Here we found that the ClickHouse database was the culprit, with over 30 GB of query logs.
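If you want to confirm the same thing on your instance, a query along these lines (run inside clickhouse-client) lists the internal log tables by size; it is a sketch that assumes the logs live in ClickHouse's default system database:
SELECT table, formatReadableSize(sum(bytes_on_disk)) AS size_on_disk
FROM system.parts
WHERE database = 'system' AND table LIKE '%log%'
GROUP BY table
ORDER BY sum(bytes_on_disk) DESC;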
- Export Individual Tables: Export each table in your database as a CSV file, repeating the command below for every table you wish to keep (a loop sketch that automates this follows the tar example below):
clickhouse-client --query="SELECT * FROM your_database_name.your_table_name FORMAT CSV" > your_table_name.csv
Optionally, compress the CSV files into a single archive for convenience using tar:
tar -czvf your_database_backup.tar.gz *.csv
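If there are many tables, a small loop can automate the per-table export. This is a minimal sketch: it assumes the database is named plausible_events_db and that clickhouse-client is available wherever you run it (for example inside the container via docker exec -it <clickhouse-container-id> bash):
# Export every table in the database to its own CSV file.
DB=plausible_events_db
for table in $(clickhouse-client --query="SHOW TABLES FROM ${DB}"); do
  echo "Exporting ${table}..."
  clickhouse-client --query="SELECT * FROM ${DB}.${table} FORMAT CSV" > "${table}.csv"
done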
B. Export from PostgreSQL
- Direct Export to Host: Use docker exec to run pg_dump within the PostgreSQL container, saving the output directly to the host machine:
docker exec <postgres_container_id> pg_dump -U postgres your_database_name > /path/on/host/backupfile.sql
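If the dump turns out to be large, a gzipped variant of the same command keeps the transfer small. This is a sketch only; you would gunzip the file again before the import step later:
docker exec <postgres_container_id> pg_dump -U postgres your_database_name | gzip > /path/on/host/backupfile.sql.gz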
Step 2: Download via SSH
For us, the easiest way to grab the CSV files from ClickHouse and the SQL dump from PostgreSQL was to copy them over using scp.
CSV Folder (Recursive):
scp -r your_username@remote_host:/path/to/csv_folder /local/directory
Single File:
scp your_username@remote_host:/path/to/backupfile.sql /local/directory
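To be extra safe, you can confirm the files arrived intact by comparing checksums on both machines. A sketch, using the example filenames from the steps above:
# On the remote host:
sha256sum backupfile.sql your_database_backup.tar.gz
# Locally, after the download finishes (the hashes should match the remote output):
sha256sum backupfile.sql your_database_backup.tar.gz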
Step 3: Creating A New Plausible Instance
With our data safely stored on our local machine, it was time to spin up a new instance of Plausible. For this, I recommend the official Community Edition Guide from GitHub.
Important Note: Do not create a new account. Once you're on the registration page, it's time to import your data instead.
Step 4: Importing Data
A. Stop Plausible Docker Service
It's a good idea to stop the main Plausible container at this point to avoid any kind of data corruption while the databases are replaced.
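For example (the container name is an assumption; check docker ps for the actual one in your setup):
docker ps --filter "name=plausible"
docker stop <plausible_container_name_or_id>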
B. Importing the Postgres DB
- Access Postgres: First, access the PostgreSQL command-line interface within your Docker container. Replace <container_name_or_id> with the name or ID of your PostgreSQL container:
docker exec -it <container_name_or_id> psql -U postgres
- Drop the Existing Database: Warning: dropping a database permanently deletes it and all data it contains, so make sure your backups are in place before proceeding.
From within the PostgreSQL CLI, run:
DROP DATABASE plausible_db;
If the database is being accessed by other users, you might encounter an error. To force the database to drop, you can disconnect all connected users by running:
SELECT pg_terminate_backend(pg_stat_activity.pid)
FROM pg_stat_activity
WHERE pg_stat_activity.datname = 'plausible_db';
DROP DATABASE plausible_db;
Then, exit the PostgreSQL CLI:
\q
- Create a New Database: Back on the host, create a new database with the same name:
docker exec -it <container_name_or_id> createdb -U postgres plausible_db
- Import the SQL File: Now, let's import the SQL file into the newly created database. On your host machine, run:
cat /path/to/your/plausible_backup.sql | docker exec -i <container_name_or_id> psql -U postgres -d plausible_db
Replace /path/to/your/plausible_backup.sql with the actual path to your SQL file. This command streams the SQL file into the psql command running inside your Docker container, importing the data into your plausible_db database.
Additional Notes & Handling Issues:
- Ensure that the SQL file contains the necessary commands to create tables and insert data; if it was generated by pg_dump, it should be fine. If your SQL file is particularly large, the import process might take some time. Monitor the process and check the output for any errors.
- Role Does Not Exist Error: When importing into PostgreSQL, we faced a "role does not exist" error. This was because our old instance used the database user plausible, while the new one uses postgres. Modify the SQL dump file (in any text editor) to replace OWNER TO plausible with OWNER TO postgres.
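If you prefer the command line over a text editor, a one-liner like this performs the same replacement (sed keeps a .bak copy of the original; the path is an example):
sed -i.bak 's/OWNER TO plausible/OWNER TO postgres/g' /path/to/your/plausible_backup.sql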
C. Check ClickHouse DB and Structure
- Access the ClickHouse Client: Initiate an interactive session with your ClickHouse container to access the ClickHouse client. Replace <container_name_or_id> with your container's actual name or ID:
docker exec -it <container_name_or_id> clickhouse-client
- Select Your Database: Switch to your target database so that subsequent commands apply to it:
USE plausible_events_db;
- List All Tables: Display all tables within the selected database:
SHOW TABLES;
- Examine Table Structure: To check that nothing has changed in a specific table's structure, use the DESCRIBE TABLE command:
DESCRIBE TABLE events;
Or more succinctly:
DESC events;
- Query Table Data (Optional): To query data from a particular table, execute a SELECT statement:
SELECT * FROM events LIMIT 10;
D. Importing CSV Data into ClickHouse
Importing a CSV into ClickHouse takes one simple command per file. Repeat it for every CSV you want to import. For us, these were the non-empty CSV files from the export:
- events_v2
- events
- sessions_v2
- sessions
- ingest_counters
- schema_migrations
docker exec -i <container_name_or_id> bash -c "clickhouse-client --query=\"INSERT INTO plausible_events_db.TABLENAME FORMAT CSV\" < /path/in/container/CSVFILE.csv"
Replace <container_name_or_id>, TABLENAME, and CSVFILE with your container's name or ID, the table you are importing into, and the CSV file's name.
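Note that the path in the command above points inside the container, so the CSVs need to be copied in first, for example with docker cp; alternatively, you can stream a file straight from the host. Both variants below are sketches with example paths and table names:
# Copy an exported CSV from the host into the ClickHouse container:
docker cp ./events_v2.csv <container_name_or_id>:/tmp/events_v2.csv
# Or skip the copy and stream the file from the host directly into clickhouse-client:
docker exec -i <container_name_or_id> clickhouse-client --query="INSERT INTO plausible_events_db.events_v2 FORMAT CSV" < ./events_v2.csv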
Verifying Data Import
Ensure your data was accurately imported by executing a few checks:
- Count Imported Rows: Verify the total row count in the events_v2 table:
docker exec -it <container_name_or_id> clickhouse-client --query="SELECT COUNT(*) FROM plausible_events_db.events_v2;"
- Inspect Initial Rows: Look at the first few rows to confirm the data appears as expected:
docker exec -it <container_name_or_id> clickhouse-client --query="SELECT * FROM plausible_events_db.events_v2 LIMIT 10;"
- Check for Specific Data: If you are looking for particular data, such as a specific event_id, tailor a query to verify its presence:
docker exec -it <container_name_or_id> clickhouse-client --query="SELECT * FROM plausible_events_db.events_v2 WHERE event_id = 'expected_event_id' LIMIT 1;"
Final Steps
- Restart Plausible Container: After completing the imports, restart the Plausible container (a command sketch follows this list) to initiate database connections and migrations. This can take 10-15 seconds.
- Verification: Log in with your previous credentials to verify that all data has been successfully migrated.
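What the restart looks like depends on how you launched the new instance; both commands below are sketches with assumed names:
# If the new instance runs via Docker Compose (as in the Community Edition guide):
docker compose up -d
# Or, if you stopped a single container earlier:
docker start <plausible_container_name_or_id>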
Conclusion
I hope this guide provides a structured approach to exporting, transferring, and importing database data for Plausible Analytics, and helps with troubleshooting the common issues along the way. Keep backups and verify data integrity at each step for a smooth transition.