XML Sitemaps with Lucky

Jeremy Woertink (jwoertink)・6 min read

When it comes to building a website, SEO is pretty important. You want to make sure that when people search for things, your website is relevant enough to be displayed to the user. SEO really is pretty magical, but there are a few things we can add to help out.

I'm going to talk about generating sitemaps for your Lucky app, though most of this could apply to other Crystal frameworks too.

Sitemaps

Sitemaps are XML documents that live in the public root of your website. They allow search engines to map and understand the entire structure of your website. You could let the search engines just use crawlers to go page by page, but that could take a lot of time and potentially miss some pages. With an XML sitemap, the search engine knows the URLs of all the pages, and even some other relevant metadata about those pages. It can go directly to a page, save it, and reference it later.

There's actually a standard format used for these documents. You can check out Sitemaps.org for more information on how they are structured and what other options are available. The one catch is that the format supports a lot more than what that site shows. Much of the rest can be found on Google Webmaster, including how to use videos and images in your sitemaps.
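To make the format concrete, here is a minimal sketch that builds a tiny sitemap by hand with Crystal's standard library `XML.build`. The page list and dates are made up for illustration, and this is not how Sitemapper works internally; it only shows the shape of the document the protocol describes.

```crystal
require "xml"

# Hypothetical pages, for illustration only; a real app would query its own data.
pages = [
  {loc: "https://mycoolapp.io/", lastmod: "2019-01-01", priority: "1.0"},
  {loc: "https://mycoolapp.io/about", lastmod: "2019-01-01", priority: "0.5"},
]

sitemap_xml = XML.build(encoding: "UTF-8", indent: "  ") do |xml|
  # The urlset element and its namespace come from the sitemaps.org protocol.
  xml.element("urlset", xmlns: "http://www.sitemaps.org/schemas/sitemap/0.9") do
    pages.each do |page|
      xml.element("url") do
        xml.element("loc") { xml.text page[:loc] }           # required, absolute URL
        xml.element("lastmod") { xml.text page[:lastmod] }   # optional
        xml.element("priority") { xml.text page[:priority] } # optional, 0.0 to 1.0
      end
    end
  end
end

puts sitemap_xml
```

Each page becomes one `url` entry, and only `loc` is required; everything else is extra metadata for the search engine.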

The first thing we need to do is add the Sitemapper shard to our Lucky project's shard.yml.

dependencies:
  lucky:
    github: luckyframework/lucky
    branch: master
  sitemapper:
    github: jwoertink/sitemapper
    version: ~> 0.2

Once that's in and you run shards install, you'll be ready!

Generating sitemaps is a task that happens outside the normal operation of your site. It doesn't require user interaction; it's an administrative task. You'll also have pages added and removed all the time, so maintaining the sitemaps by hand isn't really going to be feasible for most of you. This is a good fit for Lucky Tasks.

Let's create a new task called generate_sitemaps.cr. This file will live in your_app_root/tasks/generate_sitemaps.cr.

Put the following in that file. This is the bare minimum for a task, plus the require for Sitemapper:

require "lucky_cli"
require "sitemapper"

class GenerateSitemaps < LuckyCli::Task
  banner "Generates the sitemaps"

  def call
  end
end

Before getting crazy, make sure you can run this and everything compiles. From our app directory, if we just run lucky generate_sitemaps, it will do nothing! Nothing is exactly what we want at this point. If anything else happens, like a compile error, we get to catch it now, before there's a ton of actual code in here.

The next step is to start writing your code. The nice thing about Sitemapper is that if you need to generate a bunch of different sitemaps for different sites, like say a multi-tenant type site, then you have that ability. I'll start with a single domain, then show a quick example of how you could do multiple.

Sitemap setup

We will be adding our sitemap code to the call method in our task.

def call
  sitemaps = Sitemapper.build(host: "https://mycoolapp.io", max_urls: 500, use_index: true) do
    add("/", lastmod: Time.now, priority: 1.0)
  end
end

Ok, since there's quite a bit going on here, let me break this down.

  1. host - this option is to specify your domain. In the sitemaps, all of the URLs are absolute which means your root path will look like https://mycoolapp.io/
  2. max_urls - According to the official spec, your sitemap "must have no more than 50,000 URLs and must be no larger than 50MB". By default, Sitemapper sets you to 500 per sitemap, but you can make this higher.
  3. use_index - This is false by default, but if you know you have more than max_urls pages, then you need to set this to true.

About the index

It's pretty common for a site to have a ton of pages when you consider one page for each user you have, or whatever your site is about. Assuming you have user profiles, and you have 20,000 users, then you're going to have over 20,000 pages including your home page, privacy policy and terms pages, an about page maybe? Since this information could easily get huge in size, we have the ability to generate multiple sitemaps for the same domain. They would look like mycoolapp.io/sitemap1.xml, mycoolapp.io/sitemap2.xml, and so on. This lets the search engines see that you have a lot of pages, all while keeping the size of each file low. However, the search engines don't know what these sitemaps are called. The only one that they look for is /sitemap.xml. So what we're doing in this case is using our /sitemap.xml file as an index of mini sitemaps that points to where the other files are located. You can read more on sitemap indexes here.
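As a rough sketch of that arithmetic, the snippet below works out how many files 20,004 URLs need at 500 per file, then builds an index pointing at each one using the standard library `XML.build`. The four extra pages and the file naming are assumptions taken from the example above; Sitemapper handles all of this for you when use_index is true.

```crystal
require "xml"

host = "https://mycoolapp.io"
total_urls = 20_000 + 4 # user profiles plus home, privacy, terms, and about pages
max_urls = 500

# Round up so a final, partially filled sitemap still gets its own file.
sitemap_count = (total_urls / max_urls.to_f).ceil.to_i # 41 files for 20,004 URLs

index_xml = XML.build(encoding: "UTF-8", indent: "  ") do |xml|
  # sitemapindex is the root element the protocol defines for index files.
  xml.element("sitemapindex", xmlns: "http://www.sitemaps.org/schemas/sitemap/0.9") do
    (1..sitemap_count).each do |n|
      xml.element("sitemap") do
        # Assumed naming, following the sitemap1.xml, sitemap2.xml pattern.
        xml.element("loc") { xml.text "#{host}/sitemap#{n}.xml" }
      end
    end
  end
end

puts index_xml
```

The index itself stays small no matter how many URLs you have, since it lists one entry per file rather than one per page.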

Back to the code

Next we had this line add("/", lastmod: Time.now, priority: 1.0). This is adding in the root path of our site (our home page). Then we tell the sitemap when we last modified the page using lastmod. This is a Time instance. Lastly is priority. This is a float between 0.0 and 1.0 that shows the priority relative to the other URLs in the sitemap. This add method takes any option that's available in the sitemap protocol, but all of the required options are built in for you.

Let's take a look at our user profiles:

def call
  # Select all users with a public profile, and order by their ID in an ASC order
  users = User::BaseQuery.new.profile_public(true).id.asc_order
  sitemaps = Sitemapper.build(host: "https://mycoolapp.io", max_urls: 500, use_index: true) do
    add("/", lastmod: Time.now, priority: 1.0)

    users.each do |user|
      # we could use a messy "/profiles/#{user.username}", but this is nicer
      path = Users::Show.path(slug: user.username)
      add(path, lastmod: user.updated_at, priority: 0.6)
    end
  end

  # finalize the sitemaps here
end

At this point you should have enough info to start adding some more pages, but I'd like to take a look at one more set. Let's assume that our users can upload some videos to their profile page, and these videos are paginated. This adds some complexity, but nothing horrible.

users.each do |user|
  user_path = Users::Show.path(slug: user.username)
  add(user_path, lastmod: user.updated_at, priority: 0.6)

  # get the videos for this user with most recent upload first
  videos = Video::BaseQuery.new.by_user(user.id).uploaded_at.desc_order

  # Add pagination for /profiles/jeremy?page=XX
  total_rows = videos.count
  limit = 12 # 12 videos per page
  total_pages = (total_rows / limit.to_f).ceil.to_i
  # skip page one because we added that already
  (2..total_pages).each do |page|
    add("#{user_path}?page=#{page}", lastmod: user.updated_at, priority: 0.4)
  end

  # Now we add the individual video pages
  videos.each do |video|
    # There's a ton of options here
    map = Sitemapper::VideoMap.new(thumbnail_loc: video.thumbnail_url, title: video.title, description: video.description, content_loc: video.source_url, publication_date: video.released_at)
    video_path = Videos::Show.path(user_slug: user.username, video_slug: video.cached_slug)
    # pass the video map to add so its metadata is included with the URL
    add(video_path, video: map, lastmod: video.updated_at, priority: 0.7)
  end
end

NOTE: If you see quotes around the options in the code above, that's an error in Dev.to

You can read up on more video options by looking at the Sitemapper source, or in Google's documentation.
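If the pagination math in the loop above feels easy to get wrong, it can be checked on its own. This is a small standalone sketch; the extra_pages helper is a hypothetical name for illustration and is not part of Lucky or Sitemapper.

```crystal
# Returns the page numbers beyond the first for a paginated list.
# Page one is left out because it is the profile URL itself.
def extra_pages(total_rows : Int32, per_page : Int32) : Array(Int32)
  total_pages = (total_rows / per_page.to_f).ceil.to_i
  (2..total_pages).to_a
end

p extra_pages(30, 12) # => [2, 3]
p extra_pages(12, 12) # => []
p extra_pages(0, 12)  # => []
```

A user with 12 or fewer videos gets no extra pagination URLs, which is what we want since their only page was already added.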

That's about it. The rest will really depend on your site. The only thing left I wanted to show is a small example of a multi-tenant site.

def call
  sites = Site::BaseQuery.new.active(true)

  sites.each do |site|
    sitemaps = Sitemapper.build(host: "https://#{site.host}", max_urls: 500, use_index: true) do
      # add site specific routes here
    end

    # finalize the sitemaps here
  end
end

Finalizing the sitemaps

Sitemapper currently shoves all of the generated XML into giant strings in memory. This might not be "optimal", but my app generates over 400 sitemaps, and it's been fine so far...

The reason for this is that not everyone can just write an XML file to their public folder and be done with it. If you're hosted on Heroku, or you're using Docker, you may want to use something like S3 or whatever to store the actual sitemaps. If you can just write the XML locally, then you can use the built-in function to do that.

# this is your_app_root_path/public/sitemaps
# it will generate the folder for you if it needs to
Sitemapper.store(sitemaps, "public/sitemaps/")

Eventually there should be some additional options for storing to S3 or whatever, but if you need that, you'll need to take the sitemaps variable, and send that data wherever you need.
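Here is a rough sketch of what sending the data somewhere yourself could look like. It assumes each generated sitemap gives you a file name and an XML string; the named tuples below are a stand-in for that, so check the Sitemapper source for the real type. The upload method is hypothetical and just writes to a temp folder, where a real one might call an S3 client.

```crystal
# Stand-in for the data Sitemapper.build returns. The name/data pairing is an
# assumption for this sketch, not the shard's confirmed return type.
sitemaps = [
  {name: "sitemap.xml", data: "<?xml version=\"1.0\" encoding=\"UTF-8\"?><sitemapindex/>"},
  {name: "sitemap1.xml", data: "<?xml version=\"1.0\" encoding=\"UTF-8\"?><urlset/>"},
]

# Hypothetical uploader: writes to a local folder here, but the body could
# just as well hand the name and data to any other storage API.
def upload(name : String, data : String, dir : String) : String
  Dir.mkdir_p(dir)
  path = File.join(dir, name)
  File.write(path, data)
  path
end

target = File.join(Dir.tempdir, "sitemaps-example")
written = sitemaps.map { |sitemap| upload(sitemap[:name], sitemap[:data], target) }
puts written
```

Keeping the storage step in one small method means swapping the destination later only touches that method.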
