48% of Django projects could simplify models.py in these 3 ways

#django #python #webdev #codequality

Django Doctor audits code and auto fixes Django anti-patterns. We checked 666 Django projects for problems hindering maintainability and found that 48% of the Django projects could simplify their models.py

22% used Charfield with huge max_length when a TextField would be better read more
7% used deprecated NullBooleanField read more
40% had string fields that allowed null but not blank or vice versa read more

There were some intersections - so some projects fell into more than one camp. Note there are valid usecases for null and blank differing, and we go through that in depth later.

How would you simply Models? Try our Django models.py refactor challenge.

1. CharField with huge max_length

When a user needs to enter a string that may be very long then it's quick and easy to use CharField(max_length=5001), but that has some problems:

5001 is big, but is it big enough? What if the a user wants more? Yes max_length can be increased, but bug fixes are a pain, as is the database migration to facilitate the change.
Years from now a developer dusts off your code and reads the number 5001. Do they infer there's something special about 5001? Maybe. Devs in the future will be very busy with complicated future things so let's not add ambiguity when maintaining your "old" code.

TextField is better here as there's really no need for a length check, so users will not be presented with a validation error. A nice rule of thumb cane be if the field does not need minimum length check then it probably does not need a maximum length check. For that reason when I see this happening I suggest using TextField:

So why use CharField if TextField is so great? Historically efficient database space usage was a key consideration. But now storage is practically free. Plus, for Postgres at least, using TextField has the same performance as CharField, so database storage performance is not a key consideration.

There are valid cases for CharField with a huge length though: just like an ISBN is always 10 or 13 characters, there are some very long codes. Storing QR codes? CharField. Working with geometry and geo spacial? CharField. Django GIS has a 2048 long VARCHAR that Django represents as a CharField(max_length=2048).

2. Deprecated NullBooleanField

For four years the documentation for NullBooleanField was two sentences and one of was "yeah…don't use it". As of 3.1 the axe has fallen and the hint of future deprecation has been replaced with an actual deprecation warning. Instead of using NullBooleanField() use BooleanField(null=True).

For that reason when I see NullBooleanField I suggest using BooleanField(null=True):

On the face of it, the existence of NullBooleanField seems odd. Why have an entire class that can be achieved with a null keyword argument? We don't see NullCharField or NullDateField. Indeed, for those Django expects us to do CharField(null=True) and DateField(null=True). So what's was so special about NullBooleanField and why is it now deprecated?

Enter NullBooleanField

NullBooleanField renders a NullBooleanSelect widget which is a <select> containing options "Unknown" (None), "Yes" (True) and "No" (False). The implication is NullBooleanField was intended for when explicitly stating no answer is known yet. Indeed, in many contexts it would be useful to clarify "is it False because the user has set it False, or because the user has not yet answered?". To facilitate that, the database column must allow NULL (None at Python level).

Unfortunately time has shown a great deal of room for confusion: StackOverflow has many questions that are answered with "use NullBooleanField instead of BooleanField" and vice versa. If one of the reasons for separating BooleanField and NullBooleanField was to give clarity then instead the opposite occurred for many.

Exit NullBooleanField

Until Django 2.1 in 2018, null was not permitted in BooleanField because (obviously) None is not in a bool value. Why would we expect None to be used in a field that says it's for boolean values only? On the other hand None is not a str either but CharField(null=True) was supported and None is not an int, but IntegerField(null=True) was also acceptable.

So in the deprecation of NullBooleanField there is an argument for consistency with how the other fields handle null. If we're aiming for consistency the choice is to either add NullCharField, NullIntegerField, NullDateField and so on or to rename NullBooleanField to BooleanField and call it a day, even though NullBooleanField was a more accurate name.

With this deprecation three classes are impacted:

django.models.fields.NullBooleanField
django.forms.fields.BooleanField
django.forms.widgets.NullBooleanSelect

These three have slightly different handling of "empty" values, so for some the swap from NullBooleanField to BooleanField will need some careful testing:

from django.forms.fields import NullBooleanField

field = NullBooleanField()

assert field.clean("True") is True
assert field.clean("") is None
assert field.clean(False) is False

from django.forms.fields import BooleanField

field = BooleanField(required=False)

assert field.clean(True) is True
assert field.clean("") is False
assert field.clean(False) is False

from django.db.models import fields

field = fields.BooleanField(null=True, blank=True)

assert field.clean(True, "test") is True
assert field.clean("", "test") is None
assert field.clean(False, "test") is False

3. Null and blank out of sync

Expect the unexpected if null and blank are different values: null controls if the the database level validation allows no value for the field, while blank controls if the application level validation allows no value for the field.

If blank=True then the field model validation allows an empty value such as "" to be inputted by users. If blank=False then the validation will prevent empty values being inputted.

On the other hands, null informs the database if the database column for the field can be left empty, resulting in the database setting either NULL or NOT NULL on the column. If the database encounters an empty NOT NULL column then it will raise an IntegrityError.

blank is used during during field validation. Form, ModelForm, and ModelSerializer each trigger field level validation. For a concrete example, ModelForm calls the model instance's full_clean method during form validation, and full_clean then calls clean_fields, which in turn may raise a ValidationError.

For that reason when I see this happening I suggest the following:

So normally do we want null and blank to the same value? When would we want to have null=False and blank=True or even null=True and blank=False?

null=False, blank=True

This facilitates using sensible default values for string fields: the field may have a default value like name = CharField(null=False, blank=True, default=""). This is useful if the field is optional, but we also want to prevent the database column from having inconsistent data types. Sometimes being None, sometimes being "", and other times being a non-empty string causes extra complexity in code and in ORM: if we wanted to find all users with no name:

Foo.objects.filter(name="") | Foo.objects.filter(name__isnull=True)

Compare that with the case for when the value in the database column will always be a string:

Foo.objects.filter(name="")

null=True, blank=False

This scenario is more to keep the database happy. If using the django.db.backends.oracle database engine then this may be needed because Oracle forces empty strings to NULL, even if an empty string was submitted in the form, so name = CharField(null=True, blank=False) would be needed.

Zero downtime deployment strategies may required NULL on the database column, even though business requirements dictate the user must enter a value in the form. During blue/green deployments both the new codebase and the old codebase run against the same database at the same. If the new codebase adds a new fields and there is no sensible default value for it then null=True is needed to avoid the database throwing an IntegrityError while the instance of your website running the old codebase interacts with the database.

While the database column can accept null, form validation can prevent the end users inputting no value, so data type consistency is assured? No - this required the form validation to actually run. If a developer is creating or updating via the shell then the validation will not run unless the developer calls instance.full_clean() or instance.clean_fields(). This strategy is simplified if a sane default value can be used instead of setting null=True.