Alembic: setting a column to nullable=False if it already has data in it

#python #sql #database

Alembic is an excellent way to auto-manage your SQL database through the python code in your git repository. Instead of having to manually make the right changes by hand in the database every time you upgrade your code, you can keep the database upgrades as part of your codebase too: Alembic can automatically generate these upgrade scripts by comparing your database's current structure with what they should be according to your ORM schema (SQLAlchemy).

This is a pleasure to use, as it means you don't have to worry about your database not matching your code when you make a deployment. The database update can now be made as one step in your code deployment.

However, one area where Alembic's autogenerator doesn't quite get right is how to handle setting a field from nullable=True to nullable=False when the column already has null values in it. The autogenerator will dutifully create a script that updates this column property that contains this:

def upgrade():
    # ### commands auto generated by Alembic - please adjust! ###
    op.alter_column('table_name', 'column_name',
               existing_type=sa.VARCHAR(),
               nullable=False,
               schema='schema_name')
    # ### end Alembic commands ###

But, since it can't magic away all the existing nulls in the database, the database will reject this change as it would leave nulls inside a column that isn't nullable. The error message is basically "column contains null values":

sqlalchemy.exc.ProgrammingError: (pg8000.exceptions.ProgrammingError) {'S': 'ERROR', 'V': 'ERROR', 'C': '23502', 'M': 'column "column_name" contains nu
ll values', 's': 'acquire', 't': 'table_name', 'c': 'column_name', 'F': 'tablecmds.c', 'L': '4858', 'R': 'ATRewriteTable'}
[SQL: ALTER TABLE schema_name.table_name ALTER COLUMN column_name SET NOT NULL]

To resolve this, it's up to you to first get rid of all the nulls, which if you don't want to do it manually in the database, can also be done by editing the autogenerated script to handle this. The easiest way is to add an op.execute() that will update the null values to something else. Something like this:

def upgrade():
    table_name = sa.sql.table("table_name", sa.Column("column_name", sa.VARCHAR()), schema="schema_name")
    op.execute(
        table_name.update()
        .where(table_name.c.column_name is None)
        .values(column_name="some_value")
    )
    op.alter_column('table_name', 'column_name',
               existing_type=sa.VARCHAR(),
               nullable=False,
               schema='schema_name')

Here, the op.execute is setting the hypothetical column called column_name to have a value of "some_value" if it is currently null. Thus ensuring no values in the column are still null by the time the op.alter_column sets the nullable constraint to False.

In all the above code, this was on a postgres database with a non-default schema called schema_name. To use default schema or databases without schema, it should be sufficient to omit schema= everywhere it appears.

Top comments (1)

Sobit Akhmedov • Jul 7 '22

If you followed the article and your WHERE clause is messed up, here's why:

# wrong
.where(table_name.c.column_name is None)

# correct
.where(table_name.c.column_name == None)
.where(table_name.c.column_name.is_(None))

DEV Community

Alembic: setting a column to nullable=False if it already has data in it

Top comments (1)

Read next

Agents: explore the powerful emerging development of LLM as reasoning agents.

Beautiful Soup or Scrapy or Selenium - Best tool for Python Web scraping?

How to use Midjourney API

Four Types of Bar Charts in Python - Based on Array Data