The other week I finally pushed full offline access to my blog. I'd taken a lot of inspiration from the service worker on Jeremy Keith's blog.
One defining feature I wanted to support was that if you were offline and visited a page that isn't cached, I wanted to list the recent blog posts you had visited.
The effect
If you're a regular visitor to this blog, then my service worker (only deployed in the last few weeks) will collect the posts you visit in a dedicated cache. If you then try to visit a URL that hasn't been cached, say a post or a page like popular posts, you'll be presented with a page saying that the page isn't available offline, but that you can re-visit an existing post:
In the service worker this is handled by the following lines:
self.addEventListener('fetch', event => {
  /* this logic is trimmed down for brevity */
  const { request } = event;

  // only for HTML based requests
  if (request.headers.get('Accept').includes('text/html')) {
    event.respondWith(
      fetch(request) // try the network-first method
        .then(response => {
          // if we have a 200 success, cache the result
          // in a cache called "v1/pages"
          if (response.status === 200) {
            caches
              .open('v1/pages')
              .then(cache => cache.put(request, response.clone()));
          }
          return response;
        }).catch(() => {
          // the catch fires if we're offline, so first we try the
          // cache for a match, and if `response` is empty (or null)
          // return the `/offline` page instead.
          return caches
            .match(request)
            .then(response => response || caches.match('/offline'));
        }) // end fetch
    ); // end event.respondWith
    return;
  }
});
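For that caches.match('/offline') fallback to do anything, the /offline page has to be in a cache before you ever go offline, which is the job of the install step. A minimal sketch of that kind of install handler (the cache name and asset list here are illustrative, not my exact setup):

self.addEventListener('install', event => {
  event.waitUntil(
    caches
      .open('v1/assets') // illustrative cache name
      .then(cache => cache.addAll(['/offline'])) // plus any CSS/JS the page needs
  );
});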
However, the interesting part is how we retrieve the recently visited posts.
Showing the history
When I'd chatted to Jeremy about his offline/recently visited page, I realised that since the cache API only stores requests and responses, the metadata required for a history page (such as the post title) would have to be stored elsewhere. Jeremy (IIRC) stores his metadata in localStorage.
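That approach would look something like this, running in the page rather than the service worker (localStorage isn't available inside a worker). This is a minimal sketch of the idea, not Jeremy's actual code; the storage key is just for illustration:

// runs in the page, not the service worker (which can't use localStorage)
const KEY = 'visited-posts'; // illustrative storage key
const posts = JSON.parse(localStorage.getItem(KEY) || '{}');
const time = document.querySelector('time'); // assumes posts carry a <time> element
if (time) {
  posts[location.pathname] = {
    title: document.title,
    published: time.getAttribute('datetime'),
    visited: Date.now()
  };
  localStorage.setItem(KEY, JSON.stringify(posts));
}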
When I took my first stab at an implementation, I used IndexedDB (along with Jake Archibald's idb-keyval script). But then each page you visit needed to include metadata about the post, which added a little more complexity to the problem.
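For the record, that first attempt boiled down to something like this (a rough sketch of the pattern rather than my original code; the function names are just for illustration):

import { get, keys, set } from 'idb-keyval';

// on each page view: store the post's metadata, keyed by URL
function recordVisit() {
  return set(location.pathname, {
    title: document.title,
    visited: Date.now()
  });
}

// later, on the /offline page: read everything back out
async function readVisits() {
  const urls = await keys();
  return Promise.all(urls.map(url => get(url)));
}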
Until I realised I didn't need to store anything at all. HTML as an API.
Instead of capturing metadata separately, my posts themselves, in their own markup, include all the metadata about the post. So here's the logic without any additional store:
- Get all the entries stored in my v1/pages cache
- Get the URL from request.url
- Get the HTML from await cache.match(request).then(res => res.text())
- Pattern match out the <title>(.*)</title> text
- Capture the publish date - in my case it's part of the URL, in Jeremy's case it's in the <time> tag
If you're concerned that using a regex is brittle, the HTML could be put inside a DOM parser and queried out again. You can see that idea in action here (open the browser console) using code such as:
const p = new DOMParser();
// `html` is the full page source, e.g. from response.text()
const dom = p.parseFromString(html, 'text/html');
console.log(dom.querySelector('time').getAttribute('datetime'));
console.log(dom.title); // the <title> text, no regex required
For my offline listing, the actual code looks like this:
async function listPages() {
  // since my cache names are versioned, look for the one that
  // includes "/pages" (the "v1/pages" cache from the fetch handler)
  const cacheNames = await caches.keys();
  // results is recently visited blog posts
  const results = [];
  for (const name of cacheNames) {
    if (name.includes('/pages')) {
      const cache = await caches.open(name);
      // get a list of all the entries (keys are requests)
      for (const request of await cache.keys()) {
        const url = request.url;
        // this regex gets the publish date of the post, and
        // also ensures the URL is a blog post
        const match = url.match(/\/(\d{4})\/(\d{2})\/(\d{2})\//);
        if (match) {
          const response = await cache.match(request);
          // capture the plain text HTML
          const body = await response.text();
          // regex for the title of the post
          const title = body.match(/<title>(.*)<\/title>/)[1];
          results.push({
            url,
            response,
            title,
            // published date is from the URL
            published: new Date(match.slice(1).join('-')),
            // last visited is the `date` prop in the response header
            visited: new Date(response.headers.get('date'))
          });
        }
      }
    }
  }
  // now display the results
  if (results.length) {
    // sort the results, map each result to an <li> tag and put
    // in the `ul#offline-posts` element
    document.querySelector('ul#offline-posts').innerHTML = results
      .sort((a, b) => (a.published.toJSON() < b.published.toJSON() ? 1 : -1))
      .map(res => {
        // results in:
        // <li><a href="…">[Title] <small>[pubDate] (visited X days ago)</small></a></li>
        let html = `<li><a href="${res.url}">${
          res.title
        }</a> <small class="date">${formatDate(
          res.published
        )} <span title="${res.visited.toString()}">(visited ${daysAgo(
          res.visited
        )})</span></small></li>`;
        return html;
      })
      .join('\n');
  }
}
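The only pieces not shown there are formatDate and daysAgo. They're small utility functions along these lines (a minimal sketch; the exact formatting is a matter of taste):

function formatDate(date) {
  // e.g. "21 May 2019"
  return date.toLocaleDateString('en-GB', {
    day: 'numeric',
    month: 'short',
    year: 'numeric'
  });
}

function daysAgo(date) {
  // whole days between then and now (86,400,000ms per day)
  const days = Math.floor((Date.now() - date.getTime()) / 86400000);
  return days === 0 ? 'today' : `${days} day${days === 1 ? '' : 's'} ago`;
}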
The /offline page is going to do a bit of JavaScript, scraping text out of cached pages to show you recently browsed results. At first I felt like this might be a lot of work for the browser to be doing, but since it only happens in exceptional circumstances, and in reality it takes a handful of milliseconds, the improved user experience is worth this (relatively) small hit.
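For completeness: listPages runs in the page, not in the service worker, so the /offline page just needs the empty list element and a call to the function once the DOM is ready. Something like this (assuming listPages is included in the page's script):

// in a script on the /offline page itself, whose markup contains:
// <ul id="offline-posts"></ul>
document.addEventListener('DOMContentLoaded', () => listPages());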
Oh, and as it happens, this page is now in your recently visited list :)
Originally published on Remy Sharp's b:log