Creating a sitemap.xml file was something that always nagged at me when working with headless content management systems. "What do you mean Contentful doesn't do sitemaps?!" my SEO colleagues would say, not understanding what headless fundamentally means. This was one thing that the old monolithic systems like WordPress seemed to have in the bag.
My Early Approaches
A year ago, I worked out an initial solution that used a cron job to regenerate the file regularly. Sadly, most cloud hosting providers (Heroku and now.sh) don't allow adding files after the build is deployed, so you have to save the file to external storage like S3.
I later tried triggering the sitemap build from a webhook on every publish event inside of Contentful. The problem with this is that you have to make sure you always save to the same URL inside S3, and you still have the added S3 dependency.
You could do a full rebuild on every webhook event to save the file, which is something many static site evangelists are comfortable with. However, as your site gets larger (and maybe handles lots of money), having builds happen at the drop of a hat just makes me uneasy. It's just more moving parts to worry about. There had to be a better way. I wanted to keep my site dynamic with a good cache and ensure builds only happen for code changes, not content changes. I also wanted to ditch the extra S3 dependency.
The New Method
Thankfully, Next.js can do this inside its getInitialProps hook and serve up the XML file easily. You can set up the sitemap page, have it build on the server, set it and forget it.
First create the sitemap.js file inside of the pages directory.
touch ./pages/sitemap.js
Install the xmlbuilder package:
npm install xmlbuilder
or
yarn add xmlbuilder
whichever you prefer.
Then configure the following to your liking based upon your Contentful models. I use a pages and articles model here as examples, but you may have many more.
import { createClient } from '../services/contentful';
import * as builder from 'xmlbuilder';

const rootUrl = 'https://yourhomepage.com';

// Build one <url> entry; lastmod keeps only the date part of the ISO timestamp.
const buildUrlObject = (path, updatedAt) => {
  return {
    'loc': { '#text': `${rootUrl}${path}` },
    'lastmod': { '#text': updatedAt.split('T')[0] },
    'changefreq': { '#text': 'daily' },
    'priority': { '#text': '1.0' }
  }
}

// The page renders nothing; the XML is written straight to the response.
const Sitemap = () => ( null );

Sitemap.getInitialProps = async ({ res }) => {
  try {
    const client = createClient();
    const pages = await client.getEntries({
      content_type: 'page',
      limit: 1000,
      include: 1
    });
    const articles = await client.getEntries({
      content_type: 'article',
      limit: 1000,
      include: 1
    });
    // Assumption: the /the-salon posts use a content type with the ID 'post'.
    // Change this to match your own model, or drop it along with the loop below.
    const posts = await client.getEntries({
      content_type: 'post',
      limit: 1000,
      include: 1
    });

    let feedObject = {
      'urlset': {
        '@xmlns': 'http://www.sitemaps.org/schemas/sitemap/0.9',
        '@xmlns:image': 'http://www.google.com/schemas/sitemap-image/1.1',
        'url': []
      }
    }

    // The 'index' slug is the homepage, so it maps to the bare root path.
    for (const item of pages.items) {
      if (typeof item.fields.slug !== 'undefined') {
        feedObject.urlset.url.push(
          buildUrlObject(`/${item.fields.slug === 'index' ? '' : item.fields.slug}`, item.sys.updatedAt)
        );
      }
    }

    for (const item of articles.items) {
      if (typeof item.fields.slug !== 'undefined') {
        feedObject.urlset.url.push(
          buildUrlObject(`/blog/${item.fields.slug}`, item.sys.updatedAt)
        );
      }
    }

    // Check the slug itself so entries without one are skipped.
    for (const item of posts.items) {
      if (typeof item.fields.slug !== 'undefined') {
        feedObject.urlset.url.push(
          buildUrlObject(`/the-salon/${item.fields.slug === 'index' ? '' : item.fields.slug}`, item.sys.updatedAt)
        );
      }
    }

    const sitemap = builder.create(feedObject, { encoding: 'utf-8' });

    if (res) {
      res.setHeader('Cache-Control', 's-maxage=5, stale-while-revalidate');
      res.setHeader('Content-Type', 'application/xml');
      res.statusCode = 200;
      res.end(sitemap.end({ pretty: true }));
    }

    // Return an object so Next.js does not complain when there is no response to end.
    return {};
  } catch(error) {
    return { error: 404 };
  }
};

export default Sitemap;
Notes: I like to extract my Contentful service into a services directory, but you can put the contentful package or whatever headless CMS client you want to use in here instead. I also use the slug index for the homepage in Contentful, so I have that ternary check in here to leave the slug out of the URL. Again, configure as needed. I've also limited this to 1000 articles and pages, but if you have more you may want to do some pagination magic there as well.
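For the pagination case, here is a minimal sketch of what that "magic" might look like. It assumes the client's getEntries resolves to an object with items and total, as Contentful's JavaScript SDK does, and that skip and limit are accepted as query parameters. The getAllEntries helper name is my own invention for illustration, not part of the SDK.

```javascript
// Hypothetical helper (not part of the Contentful SDK): fetches every entry
// of one content type by requesting page after page until `total` is reached.
const getAllEntries = async (client, contentType, pageSize = 1000) => {
  const items = [];
  let skip = 0;
  let total = Infinity; // unknown until the first response arrives

  while (skip < total) {
    const response = await client.getEntries({
      content_type: contentType,
      limit: pageSize,
      skip,
      include: 1
    });

    items.push(...response.items);
    total = response.total;

    // Guard against looping forever if a page unexpectedly comes back empty.
    if (response.items.length === 0) break;
    skip += response.items.length;
  }

  return items;
};
```

With something like this in place, a call such as getAllEntries(client, 'article') could stand in for a single getEntries call, and you would loop over the returned array directly instead of over .items.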
Deployment
To configure this for deployment on now.sh, you just need to head on over to your now.json file and set it up accordingly. Also make sure you add the route for your robots.txt file here. This can be stored in static, but you will want it accessible off of the root.
{
  "version": 2,
  "alias": "my-sitemap-sample",
  "name": "my-sitemap-sample",
  "builds": [{ "src": "next.config.js", "use": "@now/next" }],
  "routes": [
    { "src": "^/robots.txt", "dest": "/static/robots.txt" },
    { "src": "/sitemap.xml", "dest": "/sitemap" }
  ]
}
Scaling
As your site grows, it may take some time to build and serve up this file. I like to use a service like Cloudflare and its caching to mitigate this. So far I haven't hit any speed traps, but know that on a super large sitemap it might be a good idea to break this into multiple sitemaps on different routes at a certain point.
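If you do get to that point, the sitemap protocol caps a single file at 50,000 URLs and lets a sitemap index file point at the child sitemaps. Below is a rough sketch of the splitting step, not something I run in production. The chunk and buildSitemapIndex helpers and the /sitemap-N.xml route naming are hypothetical, and it uses plain string templates rather than xmlbuilder to stay self-contained.

```javascript
// The sitemap protocol allows at most 50,000 URLs per sitemap file.
const MAX_URLS_PER_SITEMAP = 50000;

// Split a flat list into consecutive chunks of at most `size` items.
const chunk = (list, size) => {
  const chunks = [];
  for (let i = 0; i < list.length; i += size) {
    chunks.push(list.slice(i, i + size));
  }
  return chunks;
};

// Build the index document that points crawlers at each child sitemap.
// Assumption: child sitemaps are served at /sitemap-1.xml, /sitemap-2.xml, ...
const buildSitemapIndex = (rootUrl, chunkCount) => {
  const entries = [];
  for (let i = 1; i <= chunkCount; i++) {
    entries.push(`  <sitemap><loc>${rootUrl}/sitemap-${i}.xml</loc></sitemap>`);
  }
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    ...entries,
    '</sitemapindex>'
  ].join('\n');
};
```

The idea would be to chunk the full list of URL objects with chunk(urls, MAX_URLS_PER_SITEMAP), serve each chunk from its own page, and serve the output of buildSitemapIndex from the main sitemap route. Each child route also needs its own entry in now.json.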
Hope this helps others out as it's helped me.
Top comments (5)
Hi! Great article, well done. Just a quick note: I think you're missing a piece, as posts.items is flagged as not defined.
Maybe you want to add something like:
Hi Asher,
So for this section of code, posts.items will come back as undefined if you don't have a content_type of posts inside your Contentful account. You would probably want to wrap that in an if statement in production in case your content_type returns no items. But you will likely have multiple types you want to add here, for example posts, pages, categories, etc. Anything that represents a page that you want to expose to Google.
Great post Mike! Curious how you feel about the circular dependency issues in xmlbuilder? I need to implement a similar solution in a monorepo using Rollup and am running into circular issues.
Hi Aimee,
So I haven't run into circular issues with xmlbuilder with this implementation. However, I have run into this many times with Contentful. The most common foot gun when working in Contentful is having a content type that also contains a reference to itself. This typically happens if you want to, say, have a blog post with a list of recommended blog posts that also contain their own blog post recommendations. What I think you have to do for the sitemap is set the include level to one so it doesn't fetch deeply nested references. In general I try to avoid doing that type of referencing with Contentful now, and instead use an additional query for a use case like recommended articles, recommended products, etc.
I hope that helps a bit, but I know circular references can be very frustrating.
Thanks for the reply Mike!
I found that the xmlbuilder library has some circular dependency issues, but they have been fixed in xmlbuilder2! We've handled our Contentful-related issues with some safe stringifying, and definitely learned that include query param lesson the hard way :)
Cheers!