One of the quickest wins for improving the speed of a Django app is to fix the "oops didn't mean to" inefficient database reads that slipped into a codebase unnoticed over the years. Indeed, as the codebase matures and the team members change, and code gather dust old inefficient ORM queries can be overlooked. Simple ORM queries like:
def check_hounds():
queryset = HoundsModel.objects.all()
if len(queryset) > 2:
return "oh no. Run!"
This is reading every record from the database and checking at application level. Much less efficient than
def check_hounds():
if HoundsModel.objects.count() > 2:
return "oh no. Run!"
It's easy to scoff at such inefficiencies. "I would never do that". Well, maybe not. But what about some dev in the past that was rushing to meet a deadline, an 11pm code reviewer equally surviving on coffee during crunch time? And you're now responsible for that code!
Slightly less textbook, but also just as easy to both overlook and improve:
def write_condolence_letters(visited_coops):
queryset = ChickenModel.objects.all()
for chicken in queryset:
if chicken.coop.pk in visited_coops:
return f"dear {chicken.coop.owner_name}..."
else:
...
Spot the problems? Well, looping over all()
will read everything from the db in one go. If there are billions of chickens then expect performance issues. For that we can use iterator()
to chunk the reads 2000 (default) records pulled from the db at a time. Additionally, chicken.coop.pk
does an additional database read for each chicken because relationships are lazily evaluated by default: coop
is read from the db when only it was accessed via chicken.coop
. For this particular field we can use Django's field caching: Django creates an _id
field for related field. So this can be:
def write_condolence_letters(visited_coops):
queryset = ChickenModel.objects.all()
for chicken in queryset.iterator():
if chicken.coop_pk in visited_coops:
return f"dear {chicken.coop.owner_name}..."
else:
...
What if we're working with a field on a related model other than the pk? Sure coop_pk
is created by Django for us, but what if we needed to access other fields such as chicken.coop.owner_name
? For that we an use select_related
or prefetch_related
(depending on if it's a ForeignKey or OneToOne etc relationship):
def write_condolence_letters(visited_coops):
queryset = ChickenModel.objects.all()
for chicken in queryset.select_related('coop'):
if chicken.coop.pk in visited_coops:
return f"dear {chicken.coop.owner_name}..."
else:
...
Now the related coop
will be pulled out at during the same read as chicken
. Notice though iterator
is no longer used because iterator is useless with select_related
and prefetch_related
: iterator
nullifies their effects. So these are mutually exclusive so only with context can we really say specifically what performance improvement strategy is needed.
Does your codebase have old inefficient ORM lookups?
It's easier to spot new inefficient ORM lookups during code review but what about those already in your codebase? I can check that for you at django.doctor, or can review your GitHub PRs:
Or try out Django refactor challenges.
Top comments (1)
I think you are great! i just want to discuss tech with Python developer.
I built a display machine state using Python3 with Flask!
Flask State Github๏ผgithub.com/yoobool/flask-state
Should i can get some improvement suggestions from you? Thanks~
Some comments have been hidden by the post's author - find out more