Django Doctor audits code and auto fixes Django anti-patterns. We checked 666 Django projects for problems hindering maintainability and found that 48% of the Django projects could simplify their models.py
- 22% used CharField with a huge max_length when a TextField would be better
- 7% used the deprecated NullBooleanField
- 40% had string fields that allowed null but not blank, or vice versa
There were some intersections, so some projects fell into more than one camp. Note there are valid use cases for null and blank differing, and we go through them in depth later.
How would you simplify models? Try our Django models.py refactor challenge.
1. CharField with huge max_length
When a user needs to enter a string that may be very long, it's quick and easy to use CharField(max_length=5001), but that has some problems:
- 5001 is big, but is it big enough? What if a user wants more? Yes, max_length can be increased, but bug fixes are a pain, as is the database migration to facilitate the change.
- Years from now a developer dusts off your code and reads the number 5001. Do they infer there's something special about 5001? Maybe. Devs in the future will be very busy with complicated future things, so let's not add ambiguity when maintaining your "old" code.
TextField is better here as there's really no need for a length check, so users will not be presented with a validation error. A nice rule of thumb can be: if the field does not need a minimum length check then it probably does not need a maximum length check. For that reason, when I see this happening I suggest using TextField.
So why use CharField if TextField is so great? Historically, efficient use of database space was a key consideration. But now storage is practically free. Plus, for Postgres at least, TextField has the same performance as CharField, so database performance is not a key consideration either.
There are valid cases for CharField with a huge length though: just as an ISBN is always 10 or 13 characters, there are some very long codes. Storing QR codes? CharField. Working with geometry and geospatial data? CharField. Django GIS has a 2048-character VARCHAR column that Django represents as CharField(max_length=2048).
2. Deprecated NullBooleanField
For four years the documentation for NullBooleanField was two sentences, and one of them was "yeah…don't use it". As of Django 3.1 the axe has fallen and the hint of future deprecation has been replaced with an actual deprecation warning. Instead of using NullBooleanField(), use BooleanField(null=True).
For that reason, when I see NullBooleanField I suggest using BooleanField(null=True).
On the face of it, the existence of NullBooleanField seems odd. Why have an entire class for something that can be achieved with a null keyword argument? We don't see NullCharField or NullDateField. Indeed, for those Django expects us to do CharField(null=True) and DateField(null=True). So what was so special about NullBooleanField, and why is it now deprecated?
Enter NullBooleanField
NullBooleanField renders a NullBooleanSelect widget, which is a <select> containing the options "Unknown" (None), "Yes" (True) and "No" (False). The implication is NullBooleanField was intended for explicitly stating that no answer is known yet. Indeed, in many contexts it would be useful to clarify "is it False because the user has set it False, or because the user has not yet answered?". To facilitate that, the database column must allow NULL (None at Python level).
Unfortunately, time has shown a great deal of room for confusion: StackOverflow has many questions that are answered with "use NullBooleanField instead of BooleanField" and vice versa. If one of the reasons for separating BooleanField and NullBooleanField was to give clarity, then the opposite occurred for many.
Exit NullBooleanField
Until Django 2.1 in 2018, null was not permitted in BooleanField because (obviously) None is not a bool value. Why would we expect None to be used in a field that says it's for boolean values only? On the other hand, None is not a str either, but CharField(null=True) was supported, and None is not an int, but IntegerField(null=True) was also acceptable.
So in the deprecation of NullBooleanField there is an argument for consistency with how the other fields handle null. If we're aiming for consistency, the choice is either to add NullCharField, NullIntegerField, NullDateField and so on, or to rename NullBooleanField to BooleanField and call it a day, even though NullBooleanField was a more accurate name.
With this deprecation three classes are impacted:
- django.db.models.fields.NullBooleanField
- django.forms.fields.BooleanField
- django.forms.widgets.NullBooleanSelect
These three have slightly different handling of "empty" values, so for some the swap from NullBooleanField to BooleanField will need some careful testing:
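# Run these inside a configured project, e.g. from manage.py shell.

# Form field NullBooleanField: an empty submission cleans to None.
from django.forms.fields import NullBooleanField
field = NullBooleanField()
assert field.clean("True") is True
assert field.clean("") is None
assert field.clean(False) is False

# Form field BooleanField: an empty submission cleans to False.
from django.forms.fields import BooleanField
field = BooleanField(required=False)
assert field.clean(True) is True
assert field.clean("") is False
assert field.clean(False) is False

# Model field BooleanField(null=True): an empty value cleans to None.
# The second argument to clean() is the model instance; a placeholder is enough here.
from django.db.models import fields
field = fields.BooleanField(null=True, blank=True)
assert field.clean(True, "test") is True
assert field.clean("", "test") is None
assert field.clean(False, "test") is False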
3. Null and blank out of sync
Expect the unexpected if null and blank have different values: null controls whether database-level validation allows no value for the field, while blank controls whether application-level validation allows no value for the field.
If blank=True then the model field validation allows an empty value such as "" to be entered by users. If blank=False then the validation will prevent empty values being entered.
On the other hand, null tells the database whether the column for the field can be left empty, resulting in either NULL or NOT NULL on the column. If the database is asked to store no value in a NOT NULL column, it will raise an IntegrityError.
blank is used during field validation. Form, ModelForm, and ModelSerializer each trigger field-level validation. For a concrete example, ModelForm calls the model instance's full_clean method during form validation, and full_clean then calls clean_fields, which in turn may raise a ValidationError.
For that reason, when I see null and blank disagree on a string field, I suggest checking whether the difference is intentional.
So do we normally want null and blank to be the same value? When would we want null=False and blank=True, or even null=True and blank=False?
null=False, blank=True
This facilitates using sensible default values for string fields: the field may have a default value, like name = CharField(null=False, blank=True, default=""). This is useful if the field is optional but we also want to prevent the database column from holding inconsistent data types. A column that is sometimes None, sometimes "", and other times a non-empty string causes extra complexity in code and in the ORM. If null were allowed and we wanted to find all users with no name:
Foo.objects.filter(name="") | Foo.objects.filter(name__isnull=True)
Compare that with the case for when the value in the database column will always be a string:
Foo.objects.filter(name="")
null=True, blank=False
This scenario is more to keep the database happy. If using the django.db.backends.oracle database engine then this may be needed because Oracle forces empty strings to NULL, even if an empty string was submitted in the form, so name = CharField(null=True, blank=False) would be needed.
Zero downtime deployment strategies may require NULL on the database column, even though business requirements dictate the user must enter a value in the form. During blue/green deployments both the new codebase and the old codebase run against the same database at the same time. If the new codebase adds a new field and there is no sensible default value for it, then null=True is needed to avoid the database throwing an IntegrityError while the instance of your website running the old codebase interacts with the database.
While the database column can accept NULL, form validation can prevent end users from entering no value, so data type consistency is assured? No: this requires the form validation to actually run. If a developer is creating or updating via the shell then the validation will not run unless the developer calls instance.full_clean() or instance.clean_fields(). This strategy is simplified if a sane default value can be used instead of setting null=True.
Could your models.py be simplified?
I can check that for you at django.doctor, or can review your GitHub PRs.
Or try out Django refactor challenges.