Inspiration
Some months back I was involved in a multi-tenant Django project (SaaS), so I began researching multi-tenancy in general and multi-tenancy in Django in particular. After researching for some time, I came across the django_tenant library. I studied its documentation, integrated it into the project I was working on, and it solved my problem.
After using the library, I became curious about how it solves the problem internally, so I went on to study its implementation in depth. That is what led me to write this article: I feel it is worth sharing how this library solves the multi-tenancy problem in Django.
Introduction
First of all, what is multi-tenancy?
Multi-tenancy is a software architecture in which a single instance of the software runs on a server and serves multiple tenants. Such systems are designed to be shared rather than dedicated or isolated. A tenant is a group of users who share common access, with specific privileges, to a multi-tenant application.
When designing multi-tenant software, there are typically three different types of architectures.
Isolated Approach: In this approach, each tenant has its own dedicated database that stores its data.
Semi Isolated Approach: Here the entire software uses a single database, but each tenant has its own schema in which the tenant's data is stored.
Shared Approach: In this approach, all tenants share the same database and schema. Usually there is a main tenant table to which all other tables relate.
This article focuses on the second approach (Semi Isolated Approach) with a PostgreSQL database, which seems to be the ideal compromise between simplicity and performance.
Before we dive in, let's talk about schemas. What is a schema in a database?
Relational databases have different concepts of a schema, but in PostgreSQL we can think of a schema as a directory in an operating system, where each directory (schema) has its own set of files (tables and other objects). This allows the same table and object names to be used in different schemas without conflict.
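To make the directory analogy concrete, here is a small toy model in plain Python. It is only an illustration, not how PostgreSQL is implemented: the `database` dictionary, the schema names and the `resolve_table` helper are all made up for this sketch. It shows how the same unqualified table name can point at different tables depending on the order of schemas in the search path.

```python
# Toy model: each schema is a namespace that holds its own tables.
# All names and rows here are invented for illustration.
database = {
    "public": {"tenants": ["acme", "globex"]},
    "acme": {"invoices": ["A-1", "A-2"]},
    "globex": {"invoices": ["G-1"]},
}


def resolve_table(db, search_path, table_name):
    """Return (schema, rows) for the first schema in search_path that has the table."""
    for schema in search_path:
        tables = db.get(schema, {})
        if table_name in tables:
            return schema, tables[table_name]
    raise LookupError(f"relation {table_name!r} does not exist")


# The same table name resolves differently for each tenant's search path.
acme_result = resolve_table(database, ["acme", "public"], "invoices")
globex_result = resolve_table(database, ["globex", "public"], "invoices")
# A table that only exists in "public" is still reachable through the fallback.
shared_result = resolve_table(database, ["globex", "public"], "tenants")
```

Both tenants own a table called `invoices`, yet neither can see the other's rows as long as the other schema is not on the search path.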
Enough of definitions! Now back to Django.
In Django, it can be quite challenging to come up with a multi-tenant solution, as Django currently provides no simple way to support multiple tenants in a single project instance. This article addresses that issue by explaining in detail how we can tackle the multi-tenancy problem within a single Django project instance.
How it Works
Tenants are identified by their host name (e.g. tenant.domain.com). This information is stored in a table in the public schema. Whenever a request is made, the host name is used to look up a tenant in the database. If there's a match, the search path is updated to use that tenant's schema, so from then on all queries run against the tenant's schema. For example, suppose you have a tenant customer at http://customer.example.com. Any request coming from customer.example.com will automatically use customer's schema and make the tenant available on the request. If no tenant is found, a 404 error is raised. This also means you should have a tenant for your main domain, typically using the public schema.
To achieve this, several customizations are needed:
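The lookup described above can be sketched without Django at all. In this minimal sketch a dictionary stands in for the tenant table in the public schema, and `TenantNotFound` plays the role of the 404; the names `TENANT_DOMAINS`, `TenantNotFound` and `schema_for_host` are assumptions made for illustration, not part of the library.

```python
PUBLIC_SCHEMA = "public"

# Stand-in for the tenant table kept in the public schema: domain -> schema name.
TENANT_DOMAINS = {
    "example.com": PUBLIC_SCHEMA,
    "customer.example.com": "customer",
}


class TenantNotFound(Exception):
    """Plays the role of the 404 raised when no tenant matches the host."""


def schema_for_host(host, domains=TENANT_DOMAINS):
    """Map the Host header of a request to the schema its queries should use."""
    hostname = host.split(":")[0].lower()  # drop the port, normalise the case
    try:
        return domains[hostname]
    except KeyError:
        raise TenantNotFound(f"No tenant for hostname {hostname!r}") from None


tenant_schema = schema_for_host("customer.example.com:8000")
main_schema = schema_for_host("example.com")
```

Note that the main domain has an entry of its own that points at the public schema, which is why a request to example.com does not end in a 404.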
- Modify the Django PostgreSQL database wrapper to enable setting of db search path to the requested tenant's schema.
- Implement a custom migration executor for multi schema db migrations.
- Implement tenant model to handle tenant and schema creation.
- Implement a custom middleware to handle request and tenant mapping.
Throughout this article, the public schema name is assumed to be 'public'.
Modifying the Django PostgreSQL database wrapper
First, we modify Django's default PostgreSQL database introspection to enable schema-based lookups.
from django.db.backends.postgresql.introspection import DatabaseIntrospection


class DatabaseSchemaIntrospectionSearchPathContext:
    def __init__(self, cursor, connection):
        self.cursor = cursor
        self.connection = connection
        self.original_search_path = None

    def __enter__(self):
        self.cursor.execute('SHOW search_path')
        self.original_search_path = self.cursor.fetchone()[0].split(',')
        self.cursor.execute(f"SET search_path = '{self.connection.schema_name}'")

    def __exit__(self, *args, **kwargs):
        formatted_search_paths = ', '.join(
            f"'{search_path.strip()}'"
            for search_path in self.original_search_path
        )
        self.cursor.execute(f'SET search_path = {formatted_search_paths}')


class DatabaseSchemaIntrospection(DatabaseIntrospection):
    def get_table_list(self, cursor):
        with DatabaseSchemaIntrospectionSearchPathContext(cursor=cursor, connection=self.connection):
            return super().get_table_list(cursor)

    def get_table_description(self, cursor, table_name):
        with DatabaseSchemaIntrospectionSearchPathContext(cursor=cursor, connection=self.connection):
            return super().get_table_description(cursor, table_name)

    def get_sequences(self, cursor, table_name, table_fields=()):
        with DatabaseSchemaIntrospectionSearchPathContext(cursor=cursor, connection=self.connection):
            return super().get_sequences(cursor, table_name, table_fields)

    def get_key_columns(self, cursor, table_name):
        with DatabaseSchemaIntrospectionSearchPathContext(cursor=cursor, connection=self.connection):
            return super().get_key_columns(cursor, table_name)

    def get_constraints(self, cursor, table_name):
        with DatabaseSchemaIntrospectionSearchPathContext(cursor=cursor, connection=self.connection):
            return super().get_constraints(cursor, table_name)
The DatabaseSchemaIntrospectionSearchPathContext is a context manager that sets the db search path to the requested tenant's schema and restores the cursor's original search path after the wrapped introspection method has run. We then define the DatabaseSchemaIntrospection class, which wraps some of the method implementations of Django's PostgreSQL introspection class (DatabaseIntrospection) in DatabaseSchemaIntrospectionSearchPathContext.
Next, we define a class that extends Django's default PostgreSQL database wrapper and makes it use our DatabaseSchemaIntrospection class.
from importlib import import_module

from django.conf import settings
from django.contrib.contenttypes.models import ContentType

# Assumes the introspection class above is saved as introspection.py
# in the same package as this module.
from .introspection import DatabaseSchemaIntrospection

ORIGINAL_BACKEND = getattr(settings, 'ORIGINAL_BACKEND', 'django.db.backends.postgresql')
original_backend = import_module(ORIGINAL_BACKEND + '.base')


class DatabaseWrapper(original_backend.DatabaseWrapper):
    def __init__(self, *args, **kwargs):
        self.search_path_set_schemas = None
        self.tenant = None
        self.schema_name = None
        self.include_public_schema = True
        super().__init__(*args, **kwargs)
        # Use the schema-aware introspection instead of Django's default.
        self.introspection = DatabaseSchemaIntrospection(self)
        self.set_schema_to_public()

    def close(self):
        self.search_path_set_schemas = None
        super().close()

    def set_tenant(self, tenant, include_public=True):
        self.tenant = tenant
        self.schema_name = tenant.schema_name
        self.include_public_schema = include_public
        self.set_settings_schema(self.schema_name)
        self.search_path_set_schemas = None
        # Public and tenant schemas can hold different content types,
        # so the cache must not survive a tenant switch.
        ContentType.objects.clear_cache()

    def set_schema(self, schema_name, include_public=True):
        self.set_tenant(FakeTenant(schema_name=schema_name), include_public)

    def set_schema_to_public(self):
        self.set_tenant(FakeTenant(schema_name='public'))

    def set_settings_schema(self, schema_name):
        self.settings_dict['SCHEMA'] = schema_name

    def _cursor(self, name=None):
        if name:
            # Only supported and required by Django 1.11 (server-side cursor)
            cursor = super()._cursor(name=name)
        else:
            cursor = super()._cursor()
        return cursor


class FakeTenant:
    """
    You can't import any db model in a backend (apparently?), so this class is used
    for wrapping schema names in a tenant-like structure.
    """

    def __init__(self, schema_name):
        self.schema_name = schema_name
In the DatabaseWrapper class above, we added some tenant-specific attributes to Django's default DatabaseWrapper and pointed the wrapper's introspection at our custom introspection class (DatabaseSchemaIntrospection) instead of Django's default (DatabaseIntrospection). We then added custom methods that set the tenant on the active db connection through Django's connection object (django.db.connection).
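The wrapper above records which schema is active, but the snippet does not show the SQL statement that finally points a cursor at that schema. As a rough sketch of what such a statement could look like, the hypothetical helper below builds a `SET search_path` command from a schema name. The function name, its parameters and the decision to append the public schema as a fallback are assumptions made for this illustration, not the library's actual code.

```python
def quote_identifier(name):
    """Quote a PostgreSQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def build_search_path_sql(schema_name, include_public=True, public_schema="public"):
    """Build the statement that makes schema_name the first schema searched."""
    schemas = [schema_name]
    # Keep the public schema as a fallback so shared tables stay reachable.
    if include_public and schema_name != public_schema:
        schemas.append(public_schema)
    return "SET search_path = " + ", ".join(quote_identifier(s) for s in schemas)


tenant_sql = build_search_path_sql("customer")
isolated_sql = build_search_path_sql("customer", include_public=False)
public_sql = build_search_path_sql("public")
```

PostgreSQL searches the listed schemas from left to right, so placing the tenant's schema first means its tables win over tables of the same name in the public schema.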
Implementing a multi-schema migration executor
Here we start by creating a migration runner function that we will use to run schema-aware migrations.
import sys

from django.core.management.base import OutputWrapper
from django.core.management.color import color_style
from django.core.management.commands.migrate import Command as MigrateCommand
from django.db import DEFAULT_DB_ALIAS, connections, transaction
from django.db.migrations.recorder import MigrationRecorder
from django.dispatch import Signal

# Sent after a schema has been migrated; receivers get the schema_name.
schema_migrated = Signal()


def run_migrations(args, options, schema_name, allow_atomic=True, idx=None, count=None):
    style = color_style()
    connection = connections[options.get('database', DEFAULT_DB_ALIAS)]
    connection.set_schema(schema_name)

    # Make sure the migrations bookkeeping table exists in this schema.
    migration_recorder = MigrationRecorder(connection)
    migration_recorder.ensure_schema()

    stdout = OutputWrapper(sys.stdout)
    stderr = OutputWrapper(sys.stderr)
    if int(options.get('verbosity', 1)) >= 1:
        stdout.write(style.NOTICE("=== Starting migration"))

    # Run Django's regular migrate command against the selected schema.
    MigrateCommand(stdout=stdout, stderr=stderr).execute(*args, **options)

    try:
        transaction.commit()
        connection.close()
        connection.connection = None
    except transaction.TransactionManagementError:
        if not allow_atomic:
            raise

    connection.set_schema_to_public()
    schema_migrated.send(run_migrations, schema_name=schema_name)
Above we defined the run_migrations function, which executes a schema-aware db migration. Notice how we use Django's connection object to call set_schema on the active db connection. This is possible because of the customization we made to Django's PostgreSQL DatabaseWrapper.
Next, we define our own migration executor class, which is responsible for running the db migrations using the run_migrations function defined above.
class MigrationExecutor:
    PUBLIC_SCHEMA_NAME = 'public'

    def __init__(self, args, options):
        self.args = args
        self.options = options

    def migrate(self, tenants=None):
        # Copy into a list so querysets work and the caller's list is not mutated.
        tenants = list(tenants or [])

        # The public schema is always migrated first.
        if self.PUBLIC_SCHEMA_NAME in tenants:
            run_migrations(self.args, self.options, self.PUBLIC_SCHEMA_NAME)
            tenants.remove(self.PUBLIC_SCHEMA_NAME)

        for idx, schema_name in enumerate(tenants):
            run_migrations(self.args, self.options, schema_name, idx=idx, count=len(tenants))
The MigrationExecutor class above is used whenever the migrate command is called, replacing Django's default MigrationExecutor class.
Next, we create a custom schema-aware migrate command class to override Django's default migrate command, which is not schema-aware.
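The ordering rule used by the executor (public schema first, then every other schema) can be looked at in isolation. The helper below is a hypothetical, Django-free sketch of that rule; `plan_migrations` and the sample schema names are invented for illustration. Migrating the public schema first is a sensible default because the tenant table itself lives there.

```python
def plan_migrations(schema_names, public_schema="public"):
    """Return the schemas in migration order: public first, then the rest, each once."""
    names = list(schema_names)
    ordered = []
    if public_schema in names:
        ordered.append(public_schema)
    for name in names:
        if name != public_schema and name not in ordered:
            ordered.append(name)
    return ordered


plan = plan_migrations(["customer", "public", "acme", "customer"])
```

Here `plan` comes out as public, customer, acme: the public schema moves to the front and the duplicate entry is dropped, while the remaining schemas keep their original order.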
from django.core.management.base import BaseCommand

from myapp.models import Tenant  # hypothetical path; point this at your tenant model


class MigrateSchemasCommand(BaseCommand):
    PUBLIC_SCHEMA_NAME = 'public'

    def add_arguments(self, parser):
        parser.add_argument('app_label', nargs='?', help='App label of an application to synchronize the state.')

    def handle(self, *args, **options):
        executor = MigrationExecutor(args, options)
        # run migration on public schema
        executor.migrate(tenants=[self.PUBLIC_SCHEMA_NAME])
        # run migration on non public schemas
        tenants = (Tenant.objects.exclude(schema_name=self.PUBLIC_SCHEMA_NAME)
                   .values_list('schema_name', flat=True))
        executor.migrate(tenants=list(tenants))


Command = MigrateSchemasCommand
This is a basic migration command that supports only one option (app_label). In this command we assume that the tenant model is named Tenant. We will talk about the tenant model in the next section of this article.
Implementing tenant model for tenant and schema creation
In this section we're going to implement a mixin class which will be the parent of the main tenant class. The mixin class is going to handle the schema creation whenever a new tenant is created.
from django.core.management import call_command
from django.db import DEFAULT_DB_ALIAS, connections, models

# The article assumes the public schema is always named 'public'.
PUBLIC_SCHEMA_NAME = 'public'


class TenantMixin(models.Model):
    auto_drop_schema = False
    auto_create_schema = True

    schema_name = models.CharField(max_length=50, unique=True, db_index=True)
    domain = models.CharField(max_length=250, unique=True, db_index=True)

    class Meta:
        abstract = True

    def __str__(self):
        return self.schema_name

    def save(self, verbosity=1, *args, **kwargs):
        connection = connections[DEFAULT_DB_ALIAS]
        is_new = self._state.adding
        has_schema = hasattr(connection, 'schema_name')

        if has_schema and is_new and connection.schema_name != PUBLIC_SCHEMA_NAME:
            raise Exception("Can't create tenant outside the public schema. "
                            "Current schema is %s." % connection.schema_name)
        elif has_schema and not is_new and connection.schema_name not in (self.schema_name, PUBLIC_SCHEMA_NAME):
            raise Exception("Can't update tenant outside its own schema or the public schema. "
                            "Current schema is %s." % connection.schema_name)

        super().save(*args, **kwargs)

        if has_schema and is_new and self.auto_create_schema:
            try:
                self.create_schema(verbosity=verbosity)
            except Exception:
                # Tenant creation failed, delete what we created and
                # re-raise the exception
                self.delete(force_drop=True)
                raise

    def _drop_schema(self, force_drop=False):
        """Drops the schema."""
        connection = connections[DEFAULT_DB_ALIAS]
        has_schema = hasattr(connection, 'schema_name')
        if has_schema and connection.schema_name not in (self.schema_name, PUBLIC_SCHEMA_NAME):
            raise Exception("Can't delete tenant outside its own schema or "
                            "the public schema. Current schema is %s."
                            % connection.schema_name)

        if has_schema and schema_exists(self.schema_name) and (self.auto_drop_schema or force_drop):
            cursor = connection.cursor()
            cursor.execute('DROP SCHEMA "%s" CASCADE' % self.schema_name)

    def delete(self, force_drop=False, *args, **kwargs):
        """
        Deleting a tenant will drop the tenant's schema if the attribute
        auto_drop_schema is set to True.
        """
        self._drop_schema(force_drop)
        super().delete(*args, **kwargs)

    def create_schema(self, verbosity=1):
        """
        Creates the schema for this tenant.
        """
        connection = connections[DEFAULT_DB_ALIAS]
        cursor = connection.cursor()

        # Check if the schema exists before creating one.
        # Do nothing if the schema already exists.
        if schema_exists(self.schema_name):
            return False

        # create the schema
        cursor.execute('CREATE SCHEMA "%s"' % self.schema_name)

        # NOTE: call_command rejects options the command does not declare. The basic
        # command in this article only defines app_label, so it would have to be
        # extended with tenant, schema_name and interactive for this call to work.
        call_command('migrate_schemas',
                     tenant=True,
                     schema_name=self.schema_name,
                     interactive=False,
                     verbosity=verbosity)

        connection.set_schema_to_public()
We use the schema_exists function to check whether a schema exists in the db. We have not implemented this function yet; it is quite easy, as all it has to do is execute one SQL query. Below is the schema_exists function.
from django.db import connections, DEFAULT_DB_ALIAS


def schema_exists(schema_name, database=DEFAULT_DB_ALIAS):
    _connection = connections[database]
    cursor = _connection.cursor()
    exists = False

    # check if this schema exists in the db
    sql = 'SELECT EXISTS(SELECT 1 FROM pg_catalog.pg_namespace WHERE LOWER(nspname) = LOWER(%s))'
    cursor.execute(sql, (schema_name,))

    row = cursor.fetchone()
    if row:
        exists = row[0]

    cursor.close()
    return exists
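One thing worth noticing is that schema_exists passes the schema name as a query parameter, while the CREATE SCHEMA and DROP SCHEMA statements in the mixin interpolate it straight into the SQL string. If schema names can come from user input, it seems prudent to validate them first. The helper below is a minimal sketch of such a check; the function name and the exact rules (lowercase letters, digits and underscores only) are assumptions of this example. The 63-character limit and the reserved `pg_` prefix reflect PostgreSQL's own identifier rules.

```python
import re

# PostgreSQL truncates identifiers longer than 63 bytes by default.
MAX_IDENTIFIER_LENGTH = 63

# Conservative rule chosen for this sketch: lowercase letters, digits, underscores,
# and the first character must not be a digit.
_SCHEMA_NAME_RE = re.compile(r"[a-z_][a-z0-9_]*")


def is_valid_schema_name(name):
    """Return True if name is safe to place inside CREATE/DROP SCHEMA statements."""
    return (
        isinstance(name, str)
        and len(name) <= MAX_IDENTIFIER_LENGTH
        and not name.startswith("pg_")  # prefix reserved for system schemas
        and _SCHEMA_NAME_RE.fullmatch(name) is not None
    )


ok = is_valid_schema_name("customer_1")
reserved = is_valid_schema_name("pg_temp")
injected = is_valid_schema_name('x"; DROP SCHEMA public CASCADE; --')
```

A check like this could be called at the top of save() or create_schema() so that an invalid name is rejected before any SQL is executed.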
With the TenantMixin class we can have a tenant model that creates a schema for itself in the database whenever an object of the model is created. We can define the tenant model however we want, depending on the use case. Here is a simple example of a tenant model.
from django.db import models


class Tenant(TenantMixin):
    name = models.CharField(max_length=100)
    on_trial = models.BooleanField()
    created_on = models.DateField(auto_now_add=True)
Implementing custom middleware to handle request and tenant mapping
This is the final stage of our Django multi-tenant solution. Here we implement a middleware that maps the requested hostname to a tenant if the tenant exists, or raises a tenant-not-found exception if it does not, as explained in the How it Works section.
from django.db import connection
from django.http import Http404
from django.urls import set_urlconf
from django.utils.deprecation import MiddlewareMixin

# Assumes the tenant model is named Tenant and lives in this app's models.py.
from .models import Tenant


class TenantMainMiddleware(MiddlewareMixin):
    TENANT_NOT_FOUND_EXCEPTION = Http404

    @staticmethod
    def hostname_from_request(request):
        """ Extracts hostname from request. Used for custom requests filtering.
        By default removes the request's port and common prefixes.
        """
        hostname = request.get_host().split(':')[0]
        if hostname.startswith('www.'):
            return hostname[4:]
        return hostname

    def get_tenant(self, tenant_model, hostname):
        return tenant_model.objects.get(domain=hostname)

    def process_request(self, request):
        # Connection needs first to be at the public schema, as this is where
        # the tenant metadata is stored.
        connection.set_schema_to_public()
        hostname = self.hostname_from_request(request)

        tenant_model = Tenant
        try:
            tenant = self.get_tenant(tenant_model, hostname)
        except tenant_model.DoesNotExist:
            self.no_tenant_found(request, hostname)
            return

        request.tenant = tenant
        connection.set_tenant(request.tenant)
        self.setup_url_routing(request)

    def no_tenant_found(self, request, hostname):
        """
        Raise an exception if no tenant is found.
        """
        raise self.TENANT_NOT_FOUND_EXCEPTION('No tenant for hostname "%s"' % hostname)

    @staticmethod
    def setup_url_routing(request, force_public=False):
        """
        Sets the correct url conf based on the tenant.
        """
        public_schema_name = 'public'
        if force_public or request.tenant.schema_name == public_schema_name:
            # assuming the configured PUBLIC_SCHEMA_URLCONF is urls_public.
            request.urlconf = 'urls_public'
            set_urlconf(request.urlconf)
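For the custom backend and the middleware to take effect, they have to be registered in the project settings. The fragment below is a sketch of how that might look. The package name `tenancy`, its module layout and the database credentials are all hypothetical; the only fixed rule is that Django imports `<ENGINE>.base` and expects to find a `DatabaseWrapper` class there.

```python
# settings.py (fragment). "tenancy" is a hypothetical package that contains
# postgresql_backend/base.py (the custom DatabaseWrapper) and middleware.py.
DATABASES = {
    "default": {
        # Django imports "<ENGINE>.base" and uses the DatabaseWrapper it finds there.
        "ENGINE": "tenancy.postgresql_backend",
        "NAME": "saas_db",         # placeholder values, adjust for your setup
        "USER": "saas_user",
        "PASSWORD": "change-me",
        "HOST": "localhost",
        "PORT": "5432",
    }
}

MIDDLEWARE = [
    # Listed first so the tenant schema is selected before any other
    # middleware has a chance to run a query.
    "tenancy.middleware.TenantMainMiddleware",
    "django.middleware.security.SecurityMiddleware",
    "django.contrib.sessions.middleware.SessionMiddleware",
    "django.middleware.common.CommonMiddleware",
]
```

Putting the tenant middleware at the top is a design choice rather than a requirement: session and authentication middleware read from the database, so they should only run once the search path already points at the right tenant.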
Conclusion
Achieving multi-tenancy in a single Django project instance isn't straightforward on its own, but with the django_tenant library it becomes much easier. This article has given a detailed explanation of how multi-tenancy can be achieved in a single Django project instance using a PostgreSQL database.
I'm new to blogging, please like and share this as much as you think I deserve!