In a CI/CD pipeline, setting up a new database, running the DDL (Data Definition Language) scripts to create the schema, and running the DML (Data Manipulation Language) scripts to populate it can be time-consuming. To streamline this, it is advisable to build an image that already contains the schema and data the pipeline needs.
Here is an example where I install the well-known Sakila database:
# Start YugabyteDB
yugabyted start
# Create "sakila" database once ready
until ysqlsh -h $(hostname) -c "create database sakila" ; do sleep 1 ; done | uniq
# get the DDL and DML scripts from the jOOQ repository, and run them in the "sakila" database
curl -Ls https://github.com/jOOQ/sakila/raw/main/yugabytedb-sakila-db/yugabytedb-sakila-schema.sql |
ysqlsh -e -h $(hostname) -d sakila
curl -Ls https://github.com/jOOQ/sakila/raw/main/yugabytedb-sakila-db/yugabytedb-sakila-insert-data.sql |
ysqlsh -e -h $(hostname) -d sakila
# stop YugabyteDB
yugabyted stop
You can use a Dockerfile to create an image that starts quickly with the database schema and data pre-installed.
At this point, the inserted rows exist on disk only in the write-ahead logs (WALs): the MemTables of the LSM tree have not yet been flushed to SST files, so the data directories remain small.
# du -hs /root/var/data/yb-data/*/{data,wals} | sort -h
3.1M /root/var/data/yb-data/master/wals
4.0M /root/var/data/yb-data/tserver/data
13M /root/var/data/yb-data/master/data
70M /root/var/data/yb-data/tserver/wals
If you build a Docker image from this state, the resulting image is much larger than expected:
# docker image ls yb-sakila
REPOSITORY TAG IMAGE ID CREATED SIZE
yb-sakila latest 819b80485518 About a minute ago 3.87GB
The reason is that some files are sparse: the filesystem allocates no space for their unwritten parts, but Docker stores each file at its full length. You can check the full size with --apparent-size:
du -hs --apparent-size /root/var/data/yb-data/*/{data,wals} | sort -h
1.8M /root/var/data/yb-data/tserver/data
13M /root/var/data/yb-data/master/data
26M /root/var/data/yb-data/master/wals
1.7G /root/var/data/yb-data/tserver/wals
This indicates that every tablet has an index.000000000 file with an apparent size of approximately 23 megabytes.
The Sakila schema consists of about sixty tables and indexes, so these index files alone account for over one gigabyte of apparent size. If we extrapolate to a schema with a thousand tables, the image would be enormous.
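To see how much of the apparent size comes from these index files, here is a minimal sketch that sums their apparent and allocated bytes. The helper name index_sizes is hypothetical, and it assumes GNU find and stat (Linux). The demo part builds a scratch directory with sparse 23M files that only mimics the tablet layout, so it can be tried without YugabyteDB:

```shell
#!/bin/sh
# index_sizes (hypothetical helper): print "file_count apparent_bytes allocated_bytes"
# for all WAL index files found under the directory given as $1.
# Assumes GNU find/stat: %s = file length, %b = allocated blocks, %B = block size.
index_sizes() {
  find "$1" -type f -name 'index.*' -path '*/wals/*' -exec stat -c '%s %b %B' {} + |
    awk '{ apparent += $1; allocated += $2 * $3; n++ }
         END { printf "%d %d %d\n", n, apparent, allocated }'
}

# Demo on a scratch directory (an assumption, not real YugabyteDB data):
# three sparse files of 23M each, created without writing any data blocks.
demo=$(mktemp -d)
for t in 1 2 3; do
  mkdir -p "$demo/tserver/wals/table-x/tablet-$t"
  truncate -s 23M "$demo/tserver/wals/table-x/tablet-$t/index.000000000"
done
result=$(index_sizes "$demo")
echo "files apparent_bytes allocated_bytes: $result"
rm -rf "$demo"
```

On a real installation, the same function could be pointed at the data directory used above, for example index_sizes /root/var/data/yb-data.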
However, there is some good news! The index file in question is not actually necessary. YugabyteDB's Raft replication code has been taken from Apache Kudu, and this file is simply an index for the WAL, cached in memory. It is used when a follower that was disconnected from its leader comes back and needs to retrieve a range of write operations to fill the gap. The index does not need to be persisted, as it is re-created at startup. This is described in Apache Kudu's LogIndex, which is implemented as a memory-mapped file that is never synced to disk.
Therefore, we can safely drop it when stopping YugabyteDB.
yugabyted stop
rm -f /root/var/data/yb-data/*/wals/table-*/tablet-*/index.*
In a Dockerfile, all of these steps must run in the same layer, that is, in a single RUN instruction. Otherwise the index files are already stored in an earlier layer, and removing them later does not reduce the image size. Here is an example:
FROM yugabytedb/yugabyte:latest
# get Sakila DDL and DML scripts
ADD https://github.com/jOOQ/sakila/raw/main/yugabytedb-sakila-db/yugabytedb-sakila-schema.sql .
ADD https://github.com/jOOQ/sakila/raw/main/yugabytedb-sakila-db/yugabytedb-sakila-insert-data.sql .
# Start YugabyteDB to run the scripts
RUN yugabyted start \
&& until ysqlsh -h $(hostname) -c "create database sakila" ; do sleep 1 ; done | uniq \
&& ysqlsh -h $(hostname) -d sakila -f yugabytedb-sakila-schema.sql \
&& ysqlsh -h $(hostname) -d sakila -f yugabytedb-sakila-insert-data.sql \
&& yugabyted stop \
&& rm -f /root/var/data/yb-data/*/wals/table-*/tablet-*/index.*
# starting a container can re-start YugabyteDB
ENTRYPOINT yugabyted start --background=false
I can create an image and verify its size:
docker build -t yb-sakila .
docker image ls yb-sakila
The image is now back to its expected size, the base image plus about 150MB, and roughly 1.7GB smaller than the first build:
# docker image ls yb-sakila
REPOSITORY TAG IMAGE ID CREATED SIZE
yb-sakila latest b462eb5aaacb 17 seconds ago 2.19GB
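If you want to check which layer carries the data, docker history lists the size of each layer. The following is only a sketch: layer_sizes is a hypothetical wrapper name, and it assumes the image was built with the tag yb-sakila as above.

```shell
# layer_sizes (hypothetical helper): print "size: creating command" for each
# layer of the image given as $1, using docker history with a Go template.
layer_sizes() {
  docker history --format '{{.Size}}: {{.CreatedBy}}' "$1"
}

# Usage (requires Docker and the image built above):
# layer_sizes yb-sakila
```

The layer created by the RUN instruction should be the only large one added on top of the base image. If the rm had been placed in a separate RUN, the index files would still be counted in the previous layer.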
This image can be used easily, for example:
# caution: this removes ALL containers on the host, running or stopped
docker rm -f $(docker ps -qa)
docker run -d -p5433:5433 --name yb-sakila yb-sakila
docker exec yb-sakila bash -c 'until postgres/bin/pg_isready -h $(hostname) ; do sleep 1 ; done | uniq'
psql -h localhost -p 5433 -U yugabyte sakila <<<'\d'
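In a CI job, an until loop that never gives up can hang the pipeline if the container fails to start. As a sketch of one alternative, the hypothetical helper wait_until below retries a command once per second and stops after a given number of attempts:

```shell
# wait_until (hypothetical helper): run the given command once per second
# until it succeeds, giving up after $1 attempts.
# Returns 0 on success, 1 if all attempts failed.
wait_until() {
  attempts=$1
  shift
  while [ "$attempts" -gt 0 ]; do
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    attempts=$((attempts - 1))
    sleep 1
  done
  echo "gave up waiting for: $*" >&2
  return 1
}

# Usage with the container started above (requires Docker):
# wait_until 60 docker exec yb-sakila bash -c 'postgres/bin/pg_isready -h $(hostname)'
```

The limit of 60 attempts is an arbitrary choice for illustration and should be adjusted to how long the database takes to start in your environment.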