I am a senior engineer on Red Hat's Cost Management product. The product uses django and django-tenant-schemas to manage customer data separation. The creation of these schemata leverages django's migrations processing. Since these migrations grow in number and complexity over time, the tenant schema creation process takes longer and becomes more processing-intensive on the database.
A new strategy was created to improve the schema creation time as well as ensure that the schemata have all migrations applied and tracked. A template schema was created that is separate from any customer schema but tracked and kept up to date with migrations just as the other customer schemata are. A database function was created that will create a copy of the template schema as the new customer schema.
The main processing is all done at the database and in one transaction. Since the current state of the template schema is copied, the new schema will be up to date with the migrations applied to the template. Data are copied from the template's migration tracking table to the copy so that the copy looks in-sync to the migrations manager.
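A minimal sketch of that flow, expressed as the SQL statements a simplified clone step would issue in a single transaction. The statement shapes here are illustrative, not the real clone_schema body (which generates per-object DDL from the catalogs); django_migrations is django's actual migration tracking table, while the schema names are hypothetical.

```python
def clone_statements(template: str, target: str) -> list[str]:
    """Return the SQL a simplified clone step would run in one transaction.

    Illustrative only: the real function also generates DDL for every
    object in the template schema before copying tracking data.
    """
    return [
        f'CREATE SCHEMA "{target}"',
        # ... per-object copy statements would be generated here ...
        # Copy the migration tracking rows so the new schema looks
        # fully migrated to django's migrations manager.
        f'INSERT INTO "{target}".django_migrations (app, name, applied) '
        f'SELECT app, name, applied FROM "{template}".django_migrations',
    ]
```

Because everything runs in one transaction, a failure partway through leaves no half-built customer schema behind.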
The database that the product uses has objects that are managed by the ORM as well as other objects that must be created via SQL (such as materialized views). Using the template schema as the source of truth ensures that any objects, whether or not they are managed by the ORM, are copied. The current implementation of the function (called clone_schema) copies the following objects:
- Tables (including indexes, primary key, unique constraints)
- Partitioned Tables and their partitions
- Foreign Keys
- Materialized Views (including any indexes)
- Functions and Procedures
- Table Triggers
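The object kinds above map onto PostgreSQL's system catalogs roughly as follows. This is a hedged orientation aid, not the actual queries clone_schema runs; the catalog names and relkind/contype codes are standard PostgreSQL, but the query shape is simplified.

```python
# Rough map from copied object kinds to the PostgreSQL catalogs a clone
# function has to read (orientation only, not the real clone_schema SQL).
CATALOG_SOURCES = {
    "tables": ("pg_class", "relkind = 'r'"),
    "partitioned tables": ("pg_class", "relkind = 'p'"),
    "materialized views": ("pg_class", "relkind = 'm'"),
    "indexes": ("pg_class", "relkind = 'i'"),
    "foreign keys": ("pg_constraint", "contype = 'f'"),
    "functions and procedures": ("pg_proc", None),
    "table triggers": ("pg_trigger", "NOT tgisinternal"),
}

def objects_query(kind: str) -> str:
    """Build a simple lookup query for one object kind (illustrative)."""
    catalog, predicate = CATALOG_SOURCES[kind]
    where = f" WHERE {predicate}" if predicate else ""
    return f"SELECT oid FROM pg_catalog.{catalog}{where}"
```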
This concept was implemented for PostgreSQL, but it could be adapted for most modern relational databases that use schemata (or namespaces). It should be adaptable to other migration managers as well. Our project uses django and django-tenant-schemas, and the implementation to bypass migrations was fairly simple. The technique was to use the existing tenant creation path if the schema was the <TEMPLATE_SCHEMA_NAME>; otherwise, validate and use the clone_schema function. The django tenant module uses the Tenant model's save() method to control tenant creation. I simply override the create_schema method from the model mixin to execute the changes.
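The branching logic can be sketched like this. StubTenantMixin is a stand-in for django-tenant-schemas' real TenantMixin so the example is self-contained, and TEMPLATE_SCHEMA_NAME plus the two helper methods are hypothetical names; only the decision structure mirrors the approach described.

```python
TEMPLATE_SCHEMA_NAME = "tenant_template"  # hypothetical template schema name

class StubTenantMixin:
    """Stand-in for django-tenant-schemas' TenantMixin (illustrative)."""
    def __init__(self, schema_name):
        self.schema_name = schema_name
        self.created_via = None

    def _apply_migrations(self):
        # Real code: run the standard migration-based schema creation.
        self.created_via = "migrations"

    def _call_clone_schema(self):
        # Real code: SELECT clone_schema(template, target) at the database.
        self.created_via = "clone_schema"

class Tenant(StubTenantMixin):
    def create_schema(self):
        if self.schema_name == TEMPLATE_SCHEMA_NAME:
            # The template itself must be built by applying migrations.
            self._apply_migrations()
        else:
            # Customer schemata are cloned from the template.
            self._call_clone_schema()
```

Overriding create_schema (rather than patching the migration machinery) keeps the bypass localized to the one place tenant creation already flows through.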
My implementation uses some self-healing techniques: if the template schema is missing, it is created by applying the migration files in the standard way, and if the copy/clone function is missing, it is recreated. A good deal of parameter pre-checking is done to reduce, as much as possible, the chance of a database exception that would halt the transaction.
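One kind of pre-check is validating the schema name in Python before it is ever interpolated into SQL, so a bad value fails fast in application code rather than aborting the database transaction. This sketch is my own conservative example of such a check, not the project's actual validation; the 63-character cap is PostgreSQL's identifier length limit, and the pattern deliberately allows only unquoted lowercase identifiers.

```python
import re

# Conservative pattern: unquoted lowercase PostgreSQL identifiers only.
VALID_SCHEMA = re.compile(r"^[a-z_][a-z0-9_]*$")

def validate_schema_name(name: str) -> str:
    """Reject schema names that could break or abuse generated SQL."""
    if not name or len(name) > 63:  # PostgreSQL identifier length limit
        raise ValueError(f"invalid schema name length: {name!r}")
    if not VALID_SCHEMA.match(name):
        raise ValueError(f"invalid schema name: {name!r}")
    return name
```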
The django implementation can be found here in our Tenant model class definition.
The benefits are:
- Template schema is created using the same migrations that would have created a customer tenant schema
- The template schema is kept up-to-date with the migrations
- The customer schemata are created from the current state of the template
- The clone_schema function's implementation at the database keeps object creation fast and at a fairly consistent speed, until enough new objects or data initialization begin to impact creation time
Some may consider these drawbacks:
- Need to know the database engine's procedural language SQL
- Need to know SQL in general
- Need to know a bit about the internals of your database catalog
- Reliance on a database function instead of application code
In the case of our project, the benefits definitely outweighed the drawbacks, and the more in-depth knowledge will help us better leverage our database resources.
Testing the performance was critical. The base platform for the functional testing was a Core i7 laptop with 32 GB of RAM running Fedora 32, Python 3.8, django 3.x, and PostgreSQL 12 via docker.
Tenant creation via migrations took ~10 seconds to apply the migrations for a customer tenant schema of our application.
Using the clone_schema function (once the template schema was created), customer schema creation took ~1.8 seconds. Round that up to 2 seconds for ease of math, and that is a 5x speedup.
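For clarity, the measured numbers work out as follows:

```python
migrations_s = 10.0  # measured: schema creation by applying migrations
clone_s = 2.0        # measured: ~1.8 s via clone_schema, rounded up

speedup = migrations_s / clone_s         # 5.0 -> five times faster
time_saved = 1 - clone_s / migrations_s  # 0.8 -> 80% less wall time
```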
I encourage all application architects and developers of database-backed multi-tenant webapps to investigate this method for its efficacy in your project. It could give you a significant performance boost, since it bypasses the lengthy task of applying all migrations in order to the new schema while producing the same result.
The clone function resides here in our repository.