DEV Community

Manuel Canga

Web Optimization with ETags: An Example with WordPress

English version of my old post "Optimización web con ETags. Ejemplo con WordPress".

It's been a while since I last wrote about optimization. Those who know me are aware of why that happened. However, I can't let so-called WPO (Web Performance Optimization) experts stop me from writing about something I enjoy. So, here's a new post for you.

I'm sure this has happened to you. You arrive at your workplace, turn on your computer, open your email, and after checking it, open a terminal and type: git pull. The terminal quickly responds: Already up-to-date.

Have you ever wondered what happens behind that git pull? I have. If I had to guess, I'd say that when you do a git pull, you're transparently sending the server the date of the last change you have. The repository checks the date of the last change you send against the date of the last change it has, so:

  • If your date is older, it sends you all the pushes/changes that have occurred since then. It will also send you, along with those changes, the date they were made. So, if you typed git pull again, you would send the date of the last of those changes, and everything would start again.
  • If your date matches the date the repository has for the last change, it will tell you that everything is up-to-date.

This process, which seemed the most logical to me, is not the real one. The real one is similar but not identical. Git identifies every commit with a token (an alphanumeric identification code derived from the commit's contents, something like ae3d9735f280381d0d97d3cdb26066eb16f765a5). When you do a git pull, the repository compares the last token you have with the list of tokens it has. If your token is an old one, it sends the changes since then with their corresponding tokens. If your token is the latest, it tells you you're up-to-date.

At this point, you might say: Manuel, wasn't this post supposed to be about optimizing websites with WordPress? Indeed, it is. Both the first case presented (the one with the date) and the second one (the one with the token) are ways of working in the HTTP protocol. Let's take a closer look.

Last-Modified

Imagine your browser sends a request to my server to download the favicon of my website. In the response from my server to your browser, there will be a string (or HTTP header): Last-Modified: Thu, 29 Dec 2016 11:55:29 GMT. This tells your browser when the favicon was last modified. So, once your browser downloads the image, it will store it in its cache with the metadata "Last-Modified" and the value Thu, 29 Dec 2016 11:55:29 GMT.

If, after a few seconds, days, or months, you decide to visit my website again, your browser will need the favicon from my site again. However, it remembers that it already has a copy of the image in its cache. How does it know if the favicon in its cache is the latest or if it needs to download it again? Simple, it performs a "git pull." That is, the browser sends a new request for the favicon to my server, indicating in the If-Modified-Since header the date of the version it already has. There are two possible responses from my server:

  • The current favicon on my website is newer, so my server will send the new image to your browser, along with the new last-modified date of this image.
  • The current favicon on my website matches the date indicated by your browser. That is, both the server's image and the browser's cached image are the same. Then, my server informs your browser that the image has not been modified (with the HTTP 304 Not Modified code). Your browser then uses the cached image, saving itself from having to download the image again (thus saving many bytes of your data plan).
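
The exchange above can be sketched in a few lines of plain PHP. This is a minimal sketch, not how any particular web server implements it: respond_with_last_modified() is a hypothetical helper invented for illustration, and it only decides the status code and the Last-Modified header instead of sending a real response.

```php
<?php
// Hypothetical helper, not part of any library: decide the response to a
// conditional request based on Last-Modified / If-Modified-Since.

/**
 * @param int         $resourceMtime   Unix timestamp of the resource's last change.
 * @param string|null $ifModifiedSince Raw If-Modified-Since request header, if any.
 * @return array{status:int, headers:array<string,string>}
 */
function respond_with_last_modified(int $resourceMtime, ?string $ifModifiedSince): array
{
    // HTTP dates are always expressed in GMT.
    $headers = ['Last-Modified' => gmdate('D, d M Y H:i:s', $resourceMtime) . ' GMT'];

    // strtotime() returns false for an unparseable date.
    $clientTime = $ifModifiedSince !== null ? strtotime($ifModifiedSince) : false;

    if ($clientTime !== false && $resourceMtime <= $clientTime) {
        // The cached copy is still current: no body is sent.
        return ['status' => 304, 'headers' => $headers];
    }

    // First visit, or the resource changed: send it in full.
    return ['status' => 200, 'headers' => $headers];
}

$mtime = gmmktime(11, 55, 29, 12, 29, 2016); // Thu, 29 Dec 2016 11:55:29 GMT

$first  = respond_with_last_modified($mtime, null);
$repeat = respond_with_last_modified($mtime, $first['headers']['Last-Modified']);
$stale  = respond_with_last_modified($mtime + 60, $first['headers']['Last-Modified']);

echo $first['status'], ' ', $repeat['status'], ' ', $stale['status'], PHP_EOL;
```

Running it prints 200 304 200: a full response on the first visit, a 304 while the favicon is unchanged, and a full response again once the file is newer than the browser's copy.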

ETags

If you remember, at the beginning of the post, I mentioned that Git uses tokens to determine when changes were made. HTTP, in addition to the last modified date, allows working with tokens called ETags (Entity Tags). An ETag is an alphanumeric code (such as 5864f9b1-47e) with no predetermined format (the HTTP standard does not specify, or barely specifies, what format the token should have). The site owner determines the format.

By default, web servers like Apache create the ETag for each file based on its modification date (and sometimes also the file size). This is redundant (the HTTP header for the last modified date is based on the same criteria) and not optimal (because it adds more information to requests that is of no use). In this case, it's advisable to configure your web server not to use ETags for files. For example, to disable file ETags (or FileETags) in Apache, add the following code to your .htaccess file: FileETag None.

You might be wondering: if the dialog between browser and server using an ETag is the same as the one we've seen for the last-modified date, and using both methods is inefficient and redundant, why use ETags at all?

The last-modified date is sufficient for HTTP requests for files, but it falls short for HTTP requests for web pages (HTML). A web page depends on many interrelated factors/elements (content, comments, HTML structure, etc.) and not just a single file. Therefore, it would be very complicated to find a unified last modified date for all these elements. I know this might be difficult to follow, so I'll try to explain it differently:

Imagine I assign the modification date of this web page (HTML) to the modification date of the text of the post. When your browser visits the page, it caches the page along with the post's last-modified date. If you visit it again a minute later, since the post has not changed (and thus, its modification date hasn't changed), your browser will use the cached version. If someone writes a comment and you visit again, you wouldn't see the comment. Since the post's text hasn't changed, the modification date hasn't either, so your browser would show you the cached version again. The same would happen if I change the HTML and add a new CSS file. The post content hasn't changed, nor has the date, so your browser would still show the cached version.

Now suppose that, instead of working with the post's last-modified date, we assign an ETag to the web page of the post with the following format: {post_content_modification_date}_{post_last_comment_date}_{WP_theme_version_number}

When your browser visits the post for the first time, it caches the web page (HTML) with its associated ETag as metadata. If any of the token criteria change (the post's modification date, the last comment date, or the current WP theme version), the ETag associated with the web page will be different. So, if you visit the post again, your browser sends its ETag in the If-None-Match request header, my server sees that it is not the latest, and it resends the entire web page along with the new ETag.

If nothing has changed, the token/ETag would be the same (in both the browser and the server), so when you visit the page with your browser, my server would send a 304 response, notifying it that nothing has changed (in WPO terms, it's still "fresh") and that it should use the cached version.
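
To make the token format concrete, here is a minimal sketch in plain PHP. build_page_etag() is a hypothetical helper and the timestamps and version numbers are made-up sample values; hashing the composite string with md5() is just one convenient way to get a short, fixed-length token, not something HTTP requires.

```php
<?php
// Hypothetical helper illustrating the token format described above.
// The three inputs are plain values here; in WordPress they would come from
// the post, its comments and the active theme.

function build_page_etag(int $postModified, int $lastCommentDate, string $themeVersion): string
{
    // Hash the composite key and wrap it in double quotes,
    // as HTTP requires for ETag values.
    return '"' . md5("{$postModified}_{$lastCommentDate}_{$themeVersion}") . '"';
}

// Sample values (assumptions for illustration only).
$original   = build_page_etag(1483012529, 1483100000, '1.0.0');
$unchanged  = build_page_etag(1483012529, 1483100000, '1.0.0');
$newComment = build_page_etag(1483012529, 1483200000, '1.0.0');
$newTheme   = build_page_etag(1483012529, 1483100000, '1.1.0');

var_dump($original === $unchanged);  // bool(true): the server can answer 304
var_dump($original === $newComment); // bool(false): the page is sent again
var_dump($original === $newTheme);   // bool(false): the page is sent again
```

As long as none of the three inputs changes, the token is identical on every request, which is exactly what lets the server answer with a 304.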

Benefits of ETags

I haven't yet mentioned the benefits of ETags. Here are a few:

  • Less data transferred between the server and the browser. This means data savings for the user (so your website is cheaper for your users to visit; see "How much does it cost to visit your website?") and also for the server (important if your hosting plan is billed by data transfer).
  • The server saves the effort of generating the HTML, with all that implies: saving memory and CPU, and relieving the database of work.
  • Much faster loading of your website, improving the user experience.

WordPress plugin

Everything we've covered is at a high level, so let's look at a small plugin that uses ETags for WordPress pages/posts.

# etags.php
<?php

namespace trasweb\webperf\ETags;

/*
 * Plugin Name:       ETags en posts
 * Plugin URI:        https://trasweb.net/webperf/optimizacion-web-con-etags
 * Description:       Use the browser cache for your posts.
 * Version:           0.0.1
 * Author:            Manuel Canga / Trasweb
 * Author URI:        https://trasweb.net
 * License:           GPL
 */

add_action('wp', function () {
    if (is_admin() || ! is_singular()) {
        return;
    }

    $etag_from_navigator = wp_unslash($_SERVER['HTTP_IF_NONE_MATCH'] ?? ''); // WordPress slashes $_SERVER values
    $current_ETag        = get_current_ETag();

    if ($etag_from_navigator === $current_ETag) {
        status_header(304); // Not Modified: the browser reuses its cached copy
        exit;
    }

    header('ETag: ' . $current_ETag);
});

function get_current_ETag()
{
    $last_modified_time_of_content = (int) get_post_modified_time('U', true); // GMT timestamp of the last edit
    $date_of_last_comment          = get_date_of_last_comment();
    $theme_version                 = wp_get_theme()['Version'] ?? '0.0.0';

    return '"' . md5("{$last_modified_time_of_content}_{$date_of_last_comment}_{$theme_version}") . '"'; // ETag values must be double-quoted
}

function get_date_of_last_comment()
{
    // Fetch only the most recent approved comment on the current post.
    $query = [
        'post_id' => get_the_ID() ?: 0,
        'orderby' => ['comment_date_gmt'],
        'status'  => 'approve',
        'order'   => 'DESC',
        'number'  => 1,
    ];

    $last_comment = get_comments($query)[0] ?? null;

    // 0 when the post has no approved comments yet.
    return $last_comment->comment_date_gmt ?? 0;
}

First of all, let me mention that this plugin is for educational purposes only. As with any web optimization technique, such as minification/combination of CSS/JS resources or using server-side caching, a site study is required first.

As you can see, it couldn't be simpler. First, the ETag from the browser is obtained, if there is one (line 20). Second, the ETag associated with the current post/page is retrieved (line 21).

If both are the same, a 304 code is sent to the browser (line 24, which is the case shown in the main image of this post), and execution ends. The browser will receive the 304 code and will know that it should use the cached version of the page.

If the ETags are different (either because the browser is visiting for the first time or because the token has changed), the ETag is sent to the browser, and WordPress is allowed to continue its process (sending the content of the current post/page).

The ETag is generated in the function get_current_ETag (lines 31 to 38) based on the last time the post/page was modified, the date of the last comment on the post, and the version of the current theme. If any of these parameters change, the token will change, forcing the browser to download the new version of the page.
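
One caveat about the strict === comparison in the plugin: HTTP allows the If-None-Match header to carry several tags separated by commas, a single "*", or tags marked as weak with a W/ prefix (which an intermediary may add, for example when it compresses the response). The following is a minimal sketch of a more tolerant comparison in plain PHP. if_none_match_matches() is a hypothetical helper, not a WordPress function, and it assumes the tags themselves contain no commas, which holds for hex hashes such as the ones md5() produces.

```php
<?php
// Hypothetical helper: does an If-None-Match header match the current ETag?

function if_none_match_matches(string $ifNoneMatch, string $currentEtag): bool
{
    $ifNoneMatch = trim($ifNoneMatch);

    if ($ifNoneMatch === '') {
        return false; // no conditional header was sent
    }
    if ($ifNoneMatch === '*') {
        return true; // "*" matches any current version of the page
    }

    // Weak comparison: ignore an optional W/ prefix on either side.
    $strip = static function (string $tag): string {
        $tag = trim($tag);
        return strncmp($tag, 'W/', 2) === 0 ? substr($tag, 2) : $tag;
    };

    $current = $strip($currentEtag);

    // Assumption: tags contain no commas, so a plain split is enough.
    foreach (explode(',', $ifNoneMatch) as $candidate) {
        if ($strip($candidate) === $current) {
            return true;
        }
    }

    return false;
}

$current = '"5864f9b1-47e"';

var_dump(if_none_match_matches('"5864f9b1-47e"', $current));        // bool(true)
var_dump(if_none_match_matches('W/"5864f9b1-47e"', $current));      // bool(true)
var_dump(if_none_match_matches('"old", "5864f9b1-47e"', $current)); // bool(true)
var_dump(if_none_match_matches('"something-else"', $current));      // bool(false)
var_dump(if_none_match_matches('', $current));                      // bool(false)
```

In the plugin, a helper like this would stand in for the === check before the 304 is sent. Whether the extra tolerance is worth it depends on what sits between your WordPress site and the browser.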

That's all. I hope you enjoyed this post and that it helps you make your website faster.


Share it, please
