To get the most out of SEO, we should use a sitemap for our sites. Search engines like Google can use sitemaps to crawl our sites more efficiently.
Create a sitemap
We could simply create a sitemap like the following and put it into the public directory of our app:
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://sdorra.dev/</loc>
  </url>
  <url>
    <loc>https://sdorra.dev/posts/2022-11-15-sitemaps-with-appdir</loc>
  </url>
</urlset>
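If we want to give crawlers an additional hint, the sitemap protocol also defines optional elements such as <lastmod>; an entry could then look like this:

<url>
  <loc>https://sdorra.dev/posts/2022-11-15-sitemaps-with-appdir</loc>
  <lastmod>2022-11-15</lastmod>
</url>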
But maintaining such a file by hand is very error prone. Every time we create a new blog post or add another page to the site, we have to modify the sitemap manually.
I think we can do better.
Generate sitemap
We could generate our sitemap at build time by traversing the app directory and then writing the result to the public directory:
import { writeFile } from "fs/promises";
import { globby } from "globby";
const PAGE = "https://sdorra.dev";
const createPath = (p) => {
  // strip the page.tsx suffix and prefix with a slash
  const path = "/" + p.replace("page.tsx", "");
  // remove the trailing slash, except for the root path
  if (path.endsWith("/") && path.length > 1) {
    return path.substring(0, path.length - 1);
  }
  return path;
};

const collectPaths = async () => {
  // find every page.tsx below the app directory
  const paths = await globby("./**/page.tsx", {
    cwd: "app",
  });
  return paths.map(createPath);
};
const createSitemap = (routes) => {
  // the XML declaration has to be the very first character of the file,
  // so the template literal must not start with a newline
  return `<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
${routes
  .map((route) => {
    return `  <url>
    <loc>${PAGE}${route}</loc>
  </url>`;
  })
  .join("\n")}
</urlset>
`;
};
(async () => {
  const paths = await collectPaths();
  const sitemap = createSitemap(paths);
  await writeFile("./public/sitemap.xml", sitemap, { encoding: "utf-8" });
})();
The snippet above uses globby to find each page.tsx in the app directory (collectPaths). After the file paths are collected, each path is transformed into a URL path (createPath). The last step is to create the sitemap (createSitemap) and write it to the public directory.
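For example, running node scripts/sitemap.mjs against a hypothetical app directory would map the collected files like this (the concrete pages are just examples):

page.tsx         ->  /
posts/page.tsx   ->  /posts
imprint/page.tsx ->  /imprint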
This works well, until we introduce dynamic routes to our app.
Dynamic routes
A dynamic route can be identified by the square brackets around a variable, e.g. app/posts/[slug]/page.tsx. Handling those routes in the sitemap generator can become very tricky. I will use a simple example of how it could be implemented if the site uses Contentlayer:
import { writeFile } from "fs/promises";
import { globby } from "globby";
import { allPosts } from "../.contentlayer/generated/index.mjs";
const PAGE = "https://sdorra.dev";
const expandPath = (p) => {
  if (p === "/posts/[slug]") {
    // replace the dynamic route with one entry per post
    return allPosts.map((post) => `/posts/${post._raw.flattenedPath}`);
  }
  return [p];
};
const createPath = (p) => {
  const path = "/" + p.replace("page.tsx", "");
  if (path.endsWith("/") && path.length > 1) {
    return path.substring(0, path.length - 1);
  }
  return path;
};

const collectPaths = async () => {
  const paths = await globby("./**/page.tsx", {
    cwd: "app",
  });
  return paths.map(createPath).flatMap(expandPath);
};
// rest of scripts/sitemap.mjs
The first interesting line in this snippet is the import of the posts generated by Contentlayer (allPosts). I had to specify the whole path, including the extension, otherwise TypeScript threw errors at me. The expandPath function checks the path, and if it is our dynamic route, replaces it with the slugs of all posts. This function is finally used by collectPaths to create our flat array of paths.
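Assuming the post files live at the root of the content directory, so that flattenedPath is just the slug, the expansion looks like this (the slug is just an example):

/posts/[slug]  ->  /posts/2022-11-15-sitemaps-with-appdir
               ->  ... (one entry per post)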
Warning: This method covers only the basics. If you use something like route groups or the old pages directory in addition to the app directory, you have to go a few extra miles.
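For route groups, a small helper that strips the group segments from the collected paths could already be enough. The following is a minimal sketch, assuming group folders like app/(marketing)/about/page.tsx; the removeGroups helper is hypothetical and not part of the original script:

// route groups like "(marketing)" do not appear in the URL,
// so we drop every path segment that starts with a parenthesis
const removeGroups = (p) => {
  const cleaned = p
    .split("/")
    .filter((segment) => !segment.startsWith("("))
    .join("/");
  return cleaned === "" ? "/" : cleaned;
};

// usage in collectPaths:
// paths.map(createPath).map(removeGroups).flatMap(expandPath)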
Build
Now it is time to add our generator to the build:
{
  "scripts": {
    "build": "next build && node scripts/sitemap.mjs"
  }
}
Whenever our page is built using the build script, our sitemap is generated afterwards. The order is important, because Contentlayer has to run first in order to generate our content types.
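Alternatively, we could rely on npm's lifecycle scripts: a script named postbuild runs automatically after build. Keep in mind that not every package manager honors this convention (modern Yarn, for example, does not run pre and post scripts):

{
  "scripts": {
    "build": "next build",
    "postbuild": "node scripts/sitemap.mjs"
  }
}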
robots.txt
We can specify the URL of our sitemap in the robots.txt, which makes it easier for search engines to find it. If we use the default path /sitemap.xml, most search engines should find the sitemap even without the entry in the robots.txt. However, the entry could look like the following:
User-Agent: *
Allow: /
Sitemap: https://sdorra.dev/sitemap.xml
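Since the generator script already knows the base URL, it could even write the robots.txt as well. A minimal sketch, not part of the original script:

// hypothetical addition to scripts/sitemap.mjs:
// keep the sitemap URL in a single place by generating robots.txt too
const robots = `User-Agent: *
Allow: /

Sitemap: ${PAGE}/sitemap.xml
`;

await writeFile("./public/robots.txt", robots, { encoding: "utf-8" });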
Git
Since our sitemap is generated, we should not include it in our repository, so we add it to our .gitignore:
/node_modules
/.next/
.contentlayer
public/sitemap.xml
If we did not do that, we would see changes to sitemap.xml every time the structure of the page changes.