TL;DR: This article shows the strategy I implemented to make an application ready to use in a few minutes rather than many hours.
In this article, I will talk about the strategy I used in the Vilicus project to have big databases synced in new setups. For those who don't know Vilicus yet, I recommend reading my article about it.
Why does the application take so long to start?
At the moment, Vilicus uses Anchore, Clair, and Trivy as vendors to run security scans on container images. Each vendor has its own programming language, database, and internal dependencies, and may use a different vulnerability database.
Vilicus itself starts in milliseconds, but to be ready to use, you have to wait for the vendors to sync their vulnerability databases with the latest changes. These syncs can consume a lot of time.
Take Anchore, for example, the most time-consuming one to sync:
There is no exact time frame for the initial sync to complete as it depends heavily on environmental factors, such as the host's memory/cpu allocation, disk space, and network bandwidth. Generally, the initial sync should complete within 8 hours but may take longer. Subsequent feed updates are much faster as only deltas are updated.
https://docs.anchore.com/current/docs/faq/
Clair takes roughly 20 minutes, and Trivy is ready in a few seconds.
If you run everything from scratch, it will take almost a day to sync all the vulnerability databases; after this major sync, subsequent syncs are faster.
This is a problem if you would like to run an ephemeral instance in your CI/CD pipeline: waiting hours for the sync to complete before you can run the first scan is not viable. Thinking about how to fix this problem, I came up with a solution: save updated database snapshots in container images every day.
Now you must be thinking that this is not a good practice, and normally I would agree. But I believe there are exceptions in specific cases, such as when fixing the problem matters more than following conventions.
Saving the database in a container image
I'll show you in detail how I made Anchore work, but Clair and Trivy are not much different.
Anchore
First, I have a compressed SQL dump of the database, already synced except for at most the last six months, stored in a container image: vilicus/anchoredb:dumpsql. So we don't need to wait many hours; we just update the delta.
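For context, here is a minimal sketch of how an artifact like anchore_db.tar.gz could be produced; the database name and paths are assumptions, not the actual Vilicus build step:

# Hypothetical: dump the already-synced database in tar format and compress it
docker exec -u postgres anchoredb sh -c \
  'pg_dump -F t anchore | gzip > /tmp/anchore_db.tar.gz'
# Copy the artifact out of the container so it can be baked into vilicus/anchoredb:dumpsql
docker cp anchoredb:/tmp/anchore_db.tar.gz ./anchore_db.tar.gz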
I used this image as a base to create a local image (vilicus/anchoredb:files) with a script that restores the database when the image runs as a container.
# Grab the pre-synced dump from the base image
FROM vilicus/anchoredb:dumpsql as dumpsql
FROM postgres:9.6.21-alpine
LABEL vilicus.app.version=9.6.21-alpine
# Copy the compressed dump into the final image
COPY --chown=postgres:postgres --from=dumpsql /opt/vilicus/data/anchore_db.tar.gz /opt/vilicus/data/anchore_db.tar.gz
# Scripts in /docker-entrypoint-initdb.d/ run automatically on first start
COPY deployments/dockerfiles/anchore/db/files/restore-dbs.sh /docker-entrypoint-initdb.d/01.restore-dbs.sh
docker build -f deployments/dockerfiles/anchore/db/files/Dockerfile -t vilicus/anchoredb:files .
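The restore script itself is not shown above; here is a minimal sketch of what restore-dbs.sh could look like, assuming a gzipped tar-format dump (the real script in the repo may differ):

#!/bin/sh
# Hypothetical sketch: anything in /docker-entrypoint-initdb.d/ runs once,
# on the container's first start, after Postgres has been initialized.
set -e
gunzip -c /opt/vilicus/data/anchore_db.tar.gz > /tmp/anchore_db.tar
# --create recreates the dumped database before restoring into it
pg_restore -U "$POSTGRES_USER" -d postgres --create /tmp/anchore_db.tar
rm /tmp/anchore_db.tar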
The image vilicus/anchoredb:files is referenced in deployments/docker-compose.updater.yml.
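To give an idea of the wiring, here's a minimal sketch of what that compose file could contain; the service images, names, and environment values are assumptions (the real file is in the repo):

version: "3"
services:
  anchoredb:
    image: vilicus/anchoredb:files    # the DB image built above
    container_name: anchoredb
    environment:
      - POSTGRES_PASSWORD=postgres    # hypothetical credentials
  anchore:
    image: anchore/anchore-engine:latest    # hypothetical engine image/tag
    container_name: anchore
    depends_on:
      - anchoredb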
Here we start the anchore and anchoredb containers.
docker-compose -f deployments/docker-compose.updater.yml up \
--build -d --force-recreate \
--remove-orphans \
--renew-anon-volumes anchore
After that, we run this command to restore the database.
docker exec anchoredb sh -c 'docker-entrypoint.sh postgres' & # runs the init scripts (including the restore) and starts Postgres in the background
Then we wait for the restore to finish and the database to be ready to accept connections.
docker run --network container:anchore vilicus/vilicus:latest \
sh -c "dockerize -wait http://anchore:8228/health -wait-retry-interval 10s -timeout 1000s echo done"
With the Anchore Engine and the DB ready, we start the sync.
docker exec anchore sh -c 'anchore-cli system wait'
When the sync finishes, we stop anchore and gracefully stop the Postgres process in anchoredb.
docker stop anchore
docker exec -u postgres anchoredb sh -c 'pg_ctl stop -m smart'
We commit the container, with the changes made by the sync, into a new container image: vilicus/anchoredb:local-update.
CID=$(docker inspect --format="{{.Id}}" anchoredb)
docker commit $CID vilicus/anchoredb:local-update
Finally, we build the container image that goes to Docker Hub by copying the Postgres data from the image vilicus/anchoredb:local-update.
# Take the freshly synced data from the committed image
FROM vilicus/anchoredb:local-update as db
FROM postgres:9.6.21-alpine
COPY --chown=postgres:postgres --from=db /data/ /data
docker build -f deployments/dockerfiles/anchore/db/Dockerfile -t vilicus/anchoredb:latest .
Check the complete script here
Clair and Trivy
For Clair check here.
For Trivy check here.
Updating the images every day
To keep the databases up to date with the latest changes, I have a GitHub workflow that runs a job every day, building the images and pushing them to Docker Hub.
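As a rough illustration, a scheduled workflow for this could look like the sketch below; the workflow name, script path, and secret names are assumptions, not the actual workflow:

name: update-vendors-db
on:
  schedule:
    - cron: '0 3 * * *'    # hypothetical: every day at 03:00 UTC
jobs:
  update-anchore:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - name: Log in to Docker Hub    # hypothetical secret names
        run: echo "${{ secrets.DOCKERHUB_TOKEN }}" | docker login -u "${{ secrets.DOCKERHUB_USER }}" --password-stdin
      - name: Build and push the updated DB image    # hypothetical script path
        run: ./deployments/scripts/update-anchore-db.sh && docker push vilicus/anchoredb:latest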
Check the workflow
That's it!
In case you have any questions, please leave a comment here or ping me on 🔗 LinkedIn.