DEV Community

Cover image for 40% of Django projects have these URL bugs waiting to happen
Code Review Doctor
Code Review Doctor

Posted on • Updated on

40% of Django projects have these URL bugs waiting to happen

Django Doctor audits code and auto fixes Django anti-patterns. We checked 666 Django projects for problems hindering maintainability and found that 40% of the Django websites have big URL problems:

  • 8% had duplicate URL names read more
  • 30% used hard-coded URLs in templates instead of using url template tag read more
  • 12% used hard-coded static asset URLs instead of using the static template tag read more

Interestingly practically all projects that hard coded URLs also hard-coded static asset URLs. We manually checked a sample of the offending public projects and found a mix of hard-coding and not hard coding URLs in templates so this is not to say they hard-coded all URLs just some.

How would you solve maintainability problems? Try our Django refactor challenge.

1. Duplicate URL names

Can you see the problem with this urls.py?

from django.urls import path, include

import views


router = DefaultRouter()
router.register("application", views.AppView, basename="app")
router.register("licence", views.LicenceView, basename="licence")
router.register("good", views.GoodView, basename="good")
router.register("file-version", views.FileView, basename="file")


urlpatterns = router.urls

urlpatterns += [
    path("healthcheck/", include("health_check.urls")),
    path("callback/", views.CallbackView.as_view(), name="login"),
    path("applications/", include("api.applications.urls")),
    path("audit-trail/", include("api.audit_trail.urls")),
    path("cases/", include("api.cases.urls")),
    path("compliance/", include("api.compliance.urls")),
    path("goods/", include("api.goods.urls")),
    path("goods-types/", include("api.goodstype.urls")),
    path("picklist/", include("api.picklists.urls")),
    path("documents/", include("api.documents.urls")),
    path("queries/", include("api.queries.urls")),
    path("routing-rules/", include("api.workflow.urls")),
    path("licences/", include("api.licences.urls")),
    path("signup/", views.Signup.as_view(), name="login")),
    path("licences/", include("api.licences.urls")),
    path("data-workspace/", include("api.data_workspace.urls")),
] + static(settings.STATIC_URL, document_root=settings.STATIC_ROOT)

if settings.FEATURE_API_ENABLED:
    urlpatterns.append(path("search/", include("api.search.urls")))

if settings.FEATURE_ADMIN_ENABLED:
    urlpatterns.append(path("admin/", admin.site.urls))

    if settings.FEATURE_STAFF_SSO_ENABLED:
        urlpatterns = [
            path("admin/login/", api.core.views.LoginView.as_view()),
            path("auth/", include("authbroker_client.urls")),
        ] + urlpatterns
Enter fullscreen mode Exit fullscreen mode

You saw it right? I did because I'm a code review bot and am good at these kind of things. Here's a clue: what url will reverse("login") or {% url "login" %} return? Check again. That's right - /callback/ and /login/ have the same name!

path('callback/', views.CallbackView.as_view(), name='login'),
...
path('signup/', views.Signup.as_view(), name='login'),
Enter fullscreen mode Exit fullscreen mode

How did this happen? Well it's easily missed when reading the code. Certainly possible to miss during code review as humans are fallible. W Edwards Deming pointed out that do more manual QA will not necessarily lead to improved quality: when there are multiple people involved in QA, each can rely on the other to spot a defect, so as a factory manager adds more humans manually looking for defects, the more defects would be missed because Frank assumed Tim would catch it, and Tim was relying on Bill, and Bill knows Francis was good and Francis knew Kim never drops the ball etc. Is it their fault for missing the bugs? No, management should have provided then the right tools and processes to account for this phenomenon: do better QA rather than do more manual QA.

What's the impact? This code getting into the master branch will mean half of the time templates will be linking to the wrong place: if a template tries to link to the login view via {% url "login" %} can you be sure it's going to the right place? No.

urls.py can get unwieldy as the codebase grows. There are strategies to improve the maintainability, but sometimes that's a moot point like when joining a new team or inheriting a brownfield codebase, so it's important to know how cure as well as prevent.

2. Hard coded static assets

When Django Doctor sees static urls used in template it suggests using the static template tag:

Alt Text

If you’re unfamiliar: {% static ... returns the path the browser can use to request the file. At it's simplest, that would return a path that looks up the file on your local file system. That's fine for local dev but in prod we will likely use third-party libraries such as whitenoise or django-storages to improve performance of the production web server.

A common reason to not hard-code static URLs is because in prod the file may be served by S3 instead of the web server, but there are reasons other than "because S3" for why we use {% static ...:

File revving

Whitenoise has a storage backend STATICFILES_STORAGE that performs file revving for cache busting. As a result the file path rendered in the template is renamed: script.js may be renamed to script-23ewd.js. The STATICFILES_STORAGE generates a hash from the file’s contents and renames the file and generates a manifest file that looks like:

{ 'scripts.js': 'scripts-23ewd.js'}

This is called revving. {% static 'script.js' ... renders 'script-23ewd.js' in the template because 'script.js' is looked up in the manifest, resulting in 'script-23ewd.js' being resolved.

This is for cache busting: when the file content change the file name changes too, so a revved file can have cache headers set to cache forever in the CDN or browser. On the next deployment if the file contents changes so too does the file name being requested and thus the most up to date file is retrieved by the browser.

So {% static ... handles abstracting away cache-busting mechanisms so we can focus on templatey things in the template instead.

Websites can move home

Given enough time a website can change where it's served from:

  • From subdomain to path: abc.example.com to example.com/abc/ and xyz.example.com to example.com/xyz/. This has happened in a project I was in for SEO reasons.
  • Merging multiple repositories into one. This happened to a few projects I was involved with to make development more efficient.

At this point it would be unclear which app's files are served from /static/. Yes a developer can find a replace all of /static/ with /abc/static/ and /abc/static/. But if {% static ... was used they would not need to.

So {% static ... prevents Javascript and CSS from breaking because where the website is served from changed.

Serving from S3

Django suggest not serving static files from the web server in production. One way to do that is to serve the files from S3. If STATICFILES_STORAGE serves from S3, then the static domain will likely be different from webserver domain unless some configuration in Cloudfront ensures otherwise.

So in that case using {% static 'script.js' ... ultimately renders https://s3-example-url.com/script.js instead of /static/script.js.

So {% static ... handles changes in the domain the files are served from

3. Hard-coded URLs

Hard-coding URLs in templates makes changing URLs harder, and makes linking to the wrong page easier - so harms maintainability. The problem is compounded when hard-coding URLs is inconsistent: We manually checked a sample of the offending public projects and found a mix of hard-coding and not-hard-coding URLs in templates.

Inconsistency is even worse that consistently hard-coding URLs in templates because a well-meaning developer may change urls.py expecting their job to be complete, but they inadvertently caused 404 because the hard-coded URLs in the templates were not updated. This kind of problem would not show up in the code review either as the affected template would not be shown to the developer reviewing the PR.

Take for example:

# in urls.py
urlpatterns = [
    path('articles/', Articles.as_view()),
]
Enter fullscreen mode Exit fullscreen mode

and

# index.html
<a href="/articles/">Go to articles</a>
Enter fullscreen mode Exit fullscreen mode

One day the marketing department may decide the URL would perform better if it was renamed. So a dev finds and replaces the URL in templates - but that's a perfect time for bugs to slip in: what if one of the hyperlinks are missed? 404 will happen. If it's possible for the dev to make a mistake while changing the code it's also possible for the dev reviewing the change to miss it. Indeed reviewing HTML code for correct hyperlinks is like looking for a needle in a haystack.

For these reasons Django encourages giving URLs a name so they can be resoved later (rather than hard coding the URL). This be taken advantage of by defining a name in the url.py entry, and then using that name in templates.

So instead of above do:

# urls.py
urlpatterns = [
    path('articles/', Articles.as_view(), name='articles-list'),
]
Enter fullscreen mode Exit fullscreen mode

and

# in index.html
<a href="{% url 'articles-list' %}">Go to articles</a>
Enter fullscreen mode Exit fullscreen mode

Does your Django codebase have dodgy URLs?

Over time it's easy for tech debt to slip into your codebase. I can check that for you at django.doctor, or can review your GitHub PRs:

Alt Text

Or try out Django refactor challenge.

Top comments (0)