DEV Community

loading...
Cover image for Understand Django: Per-visitor Data With Sessions

Understand Django: Per-visitor Data With Sessions

Matt Layman
Matt Layman is a software engineer from Frederick, MD. He is an open source software maintainer and advocate for Python.
Originally published at mattlayman.com ・8 min read

In the last Understand Django article, we saw what it takes to make your Django project live on the internet. Now, we'll get back to a more narrow topic and focus on a way Django can store data for visitors to your site. This is the kind of data that doesn't often fit well into your Django models and is called session data.

What Is A Session?

As I was learning Django, I'd run into sessions occasionally and accept that I didn't really understand them. They felt like magic to me. But what is a session?

A session is a set of data that is available to users that Django can use over multiple requests.

From a development perspective, the data is different from the regular data that you would store in a database with Django models. When working with session data, you don't query the database using the ORM. Instead, you can access session content via the request.session attribute.

The request.session is a dictionary-like object. Storing data into the session is like working with any other Python dictionary.

# application/view.py

from django.http import HttpResponse

def a_session_view(request):
    request.session['data_to_keep'] = 'store this'
    return HttpReponse('')
Enter fullscreen mode Exit fullscreen mode

When Django stores the session data, the framework will keep the data in a JSON format. What is JSON? I've mentioned JSON in passing in previous articles, but now is a decent time to explain it. Knowing what JSON is will help you understand what happens to session data as the data is stored.

The "What is JSON?" Sidebar

JSON is a data format. JSON is a way of describing data so that the data can be stored or transmitted. The definition of that format is listed on the official JSON website and can be understood in probably 10 minutes or less.

That stored data can be parsed based on the definition of the format to recreate the data at a different time or on a different computer. In general, you can view JSON as a tool to take Python dictionaries or lists and store or transmit them for use elsewhere.

The Python standard library includes a module for working with JSON data. Here's an example to give you an idea of what JSON output looks like.

>>> import json
>>> data = {'hello': 'world'}
>>> json.dumps(data)
'{"hello": "world"}'
>>> json_string = json.dumps(data)
>>> parsed_data = json.loads(json_string)
>>> parsed_data
{'hello': 'world'}
>>> data == parsed_data
True
Enter fullscreen mode Exit fullscreen mode

The dumps and loads functions transform data to and from a string, respectively (the s in the those function names stands for "string").

JSON is an extremely versatile format and is used all over the internet. Getting back to sessions, JSON is a good fit because there are multiple places where Django can store session data. Let's look at those next.

Session Storage

You probably know this drill by now if you've been following this series. Like the template system, the ORM, and the authentication system, the session application is configurable with multiple different "engines" to store session data.

When you start a new Django project with the startproject command, the session engine will be set to django.contrib.sessions.backends.db. This is because the SESSION_ENGINE setting will be unset in your settings module, and Django will fall back to the default.

With this engine, Django will store session data in the database. Because startproject includes the django.contrib.sessions app in INSTALLED_APPS, you'd probably see the following stream by when you migrate your database for the first time.

(venv) $ ./manage.py migrate
  ...
Running migrations:
  ...
  Applying sessions.0001_initial... OK
Enter fullscreen mode Exit fullscreen mode

The Session model stores three things:

  • A session key that uniquely identifies the session in the storage engine
  • The actual session data, stored in JSON format, in a TextField
  • An expiration date for the session data

With these three fields, Django can handle the temporary storage needs for any of your site's visitors.

Why is the session engine configurable? Django's session storage is configurable to manage tradeoffs. The default storage of a database engine is a safe default and the easiest to understand. The answer to "Where is my app's session data?" is "In the database with all of my other application data."

If your site grows in popularity and usage, using the database to store sessions can become a bottleneck and limit your performance and application scaling. Additionally, the default engine creates an ever expanding set of database rows in the Session model's table. You can work around the second challenge by periodically running the clearsessions management command, but what if the performance is a problem for you?

This is where other storage engines might be better for your application. One method to improve performance is to switch to an engine that uses caching. If you have set up the caching system with a technology like Redis or Memcached, then a lot of session load on the database can be pushed to the cache service. Caching is a topic we will explore more in a future article, so if this doesn't make too much sense right now, I apologize for referencing concepts that I haven't introduced yet. For the time being, understand that caching can improve session performance.

Another session storage engine that can remove load from a database uses the browser's cookie system. This system will certainly remove database load because the state will stored with the browser, but this strategy comes with its own set of tradeoffs. With cookie-base storage:

  • The storage could be cleared at any time by the user.
  • The storage engine is limited to a small amount of data storage by the browser, based on the maximum allowed size of a cookie (commonly, only 4kB).

Choosing the right session storage engine for your application depends on what the app does. If you're in doubt, start with the default of database-backed storage, and you should be fine initially.

How Does The Session System Identify Visitors?

When a visitor comes to your site, Django needs to associate the session data to the visitor. To do this association, Django will store a session identifier in a cookie on the user's browser.

On the first visit, the session storage engine will look for a cookie with the name sessionid (by default). If the application doesn't find that cookie, then the session storage will generate a random ID and ensure that the random idea doesn't conflict with any other session IDs that already exist.

From there, the storage engine will store some session data via whatever mechanism that engine uses (e.g., the database engine will create a new session row in the table).

The session ID is added to the user's browser cookies for your site's domain. Cookies are stored in a secured manner so only that browser will have access to that randomly generated value. The session ID is very long (32 characters), and the session will expire after a given length of time. These characteristics make session IDs quite secure.

Since sessions are secure and can uniquely identify a browser, what kind of data can we put in there?

What Uses Sessions?

Sessions can store all kinds of data, but what are some real world use cases? You can look in Django's source code to find some immediate answers!

In my estimate, the most used part of Django that uses sessions heavily is the auth system. We explored the authentication and authorization system /understand-django/user-authentication/ At the time, I mentioned in the pre-requisites that sessions were required, but I noted that sessions were an internal detail. Now that you know what sessions are about, let's see how the auth system uses them.

If you look into the session data after you've authenticated, you'll find three pieces of information:

  • The user's ID (stored in _auth_user_id)
  • The user's hash (stored in _auth_user_hash)
  • The string name of the auth backend used (stored in _auth_user_backend)

Since we know that a session identifies a browser and does so securely, the auth system stores identity information into the session to tie that unique session to a unique user. When a user makes a request with these data elements, the auth system can determine if the request data is valid and should be considered authenticated.

The auth system will read which backend is used and load that backend if possible. The backend is used to load the specific user record from the ID found in the session. Finally, that user is used to check if the hash provided validates when compared to the user's hashed password (there is some extra hashing involved to ensure that the user's password hash is not stored directly in the session). If the comparison checks out, the user is authenticated and the request proceeds as an authenticated request.

You can see that the session is vital to this flow with the auth system. Without the ability to store state in the session, the user would be unable to prove who they were.

Another use of sessions is found with CSRF handling. The CSRF security features in Django (which I mentioned in the forms article and we will explore more in a future topic) permit CSRF tokens to be stored in the session instead of a cookie when the CSRF_USE_SESSIONS setting is enabled. Django provides a safe default for CSRF tokens in cookies, but the session is an alternative storage place if you're not happy enough with the cookie configuration.

As a final example, we can look at the messages application. The messages app can store "flash" messages. A flash messages is the kind of message that you'd expect to see on a single page view. For instance, if you have a message that you'd like to display to a user upon some action, you might use a flash message. Perhaps your application has some "Contact Us" form to receive customer feedback. After the customer submits the form, you might want the application to flash "Thank you for the feedback!"

# application/views.py

from django.contrib import messages
from django.views.generic import FormView
from django.urls import reverse_lazy

from .forms import ContactForm

class ContactView(FormView):
    form_class = ContactForm
    success_url = reverse_lazy("application:index")

    def form_valid(self, form):
        messages.add_message(
            self.request, messages.INFO, "Thank you for the feedback!")
        return super().form_valid(form)
Enter fullscreen mode Exit fullscreen mode

In the default setup, Django will attempt to store the flash message in the request's cookies, but, as we saw earlier, browsers constrain the maximum cookie size. If the flash messages will not fit in the request's cookies, then the messages app will switch to the session as a more robust alternative. Observe that this might run into problems if you are using the session's cookie storage engine!

I hope that these examples from Django's contrib package provide you with some ideas for how you might use sessions in your own projects.

Summary

In this article, we dug into Django sessions and how you use them. We saw:

  • What sessions are and the interface they expose as request.session
  • How JSON is used to manage session data
  • Different kinds of session storage that are available to your site
  • The way that Django recognizes a user's session in the browser
  • Examples within django.contrib of how sessions get used by Django's built-in apps.

In the next article, we are going to spend time focusing on settings in Django. You'll learn about:

  • Various strategies for managing your project's settings
  • Django's tools to help with settings
  • Tools in the larger Django ecosystem that can make your life easier

If you'd like to follow along with the series, please feel free to sign up for my newsletter where I announce all of my new content. If you have other questions, you can reach me online on Twitter where I am @mblayman.

Discussion (0)