Vincent A. Cicirello

generate-sitemap 1.9.2 Released

TL;DR

I just released generate-sitemap 1.9.2, a GitHub Action for generating XML sitemaps for static websites. The generate-sitemap GitHub Action is implemented in Python, and generates an XML sitemap by crawling the GitHub repository containing the html of the site, using commit dates to generate <lastmod> tags in the sitemap.

This release, generate-sitemap 1.9.2, is primarily a fix for a minor bug in the regular expression used to detect whether a page has a meta robots noindex directive in the page head, and should thus be excluded from the sitemap. The bug was revealed by a warning that Python 3.12 generates about the regular expression in question, and that Python 3.11 and earlier do not generate.
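The regular expression itself is not shown in this post. As a minimal sketch of this kind of check, the example below uses a hypothetical pattern and a hypothetical helper named `has_noindex`; neither is taken from generate-sitemap. One frequent source of new warnings under Python 3.12 is an invalid escape sequence such as `\s` in a non-raw string literal, which 3.12 reports as a `SyntaxWarning`. Whether that was the cause here is an assumption, but writing patterns as raw strings avoids that class of problem.

```python
import re

# Hypothetical pattern, not the one used by generate-sitemap: matches a
# <meta> tag whose name is "robots" and whose content includes "noindex".
# The r"" prefix keeps \s and \b as regex escapes instead of string escapes.
NOINDEX = re.compile(
    r"<meta\s+name\s*=\s*[\"']robots[\"']\s+content\s*=\s*[\"'][^\"']*\bnoindex\b",
    re.IGNORECASE,
)


def has_noindex(html: str) -> bool:
    """Return True if the head section contains a meta robots noindex tag."""
    # Only inspect the text before </head>; if there is no closing head tag,
    # fall back to scanning the whole document.
    head_end = re.search(r"</head\s*>", html, re.IGNORECASE)
    head = html[: head_end.start()] if head_end else html
    return NOINDEX.search(head) is not None
```

This sketch assumes the `name` attribute appears before `content`; a more thorough check would use an HTML parser rather than a regular expression.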

Changelog 1.9.2 - 2023-10-05

Fixed

  • Fix minor bug in regex used to detect if a page has a meta robots noindex directive in head.

CI/CD

  • Bump Python to 3.12 in CI/CD workflows when running unit tests.

Dependencies

  • Bump cicirello/pyaction from 4.14.1 to 4.25.0, including upgrading Python within the Docker container to 3.12.

More Information

Please consider starring generate-sitemap's GitHub repository:

cicirello / generate-sitemap

Generate an XML sitemap for a GitHub Pages site using GitHub Actions

generate-sitemap

cicirello/generate-sitemap - Generate XML sitemaps for static websites in GitHub Actions

Check out all of our GitHub Actions: https://actions.cicirello.org/

About

The generate-sitemap GitHub action generates a sitemap for a website hosted on GitHub Pages, and has the following features:

  • Support for both xml and txt sitemaps (you choose using one of the action's inputs).
  • When generating an xml sitemap, it uses the last commit date of each file to generate the <lastmod> tag in the sitemap entry. If the file was created during that workflow run, but not yet committed, then it instead uses the current date (however, we recommend committing newly created files first if possible).
  • Supports URLs for html and pdf files in the sitemap, and has inputs to control the included file types (defaults include both html and pdf files in the sitemap).
  • Now also supports including URLs for a user-specified list of additional file extensions in the sitemap.
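The commit-date behavior in the list above can be sketched roughly as follows. This is an illustration under stated assumptions, not the action's actual code: the function names `last_commit_date` and `sitemap_entry` are hypothetical, and the sketch assumes `git` is on the path and that the script runs inside the repository. It asks `git log` for the committer date of the last commit touching a file and falls back to the current time when git reports nothing.

```python
import subprocess
from datetime import datetime, timezone
from xml.sax.saxutils import escape


def last_commit_date(path: str) -> str:
    """Return the ISO 8601 date of the last commit touching path,
    or the current UTC time if git reports no commit for it."""
    try:
        result = subprocess.run(
            ["git", "log", "-1", "--format=%cI", "--", path],
            capture_output=True, text=True, check=False,
        )
        stamp = result.stdout.strip()
    except OSError:
        # git is not installed or could not be started.
        stamp = ""
    # Empty output means git found no commit for this file.
    return stamp or datetime.now(timezone.utc).isoformat(timespec="seconds")


def sitemap_entry(url: str, lastmod: str) -> str:
    """Format one <url> element of an XML sitemap."""
    return (
        "<url>\n"
        f"<loc>{escape(url)}</loc>\n"
        f"<lastmod>{lastmod}</lastmod>\n"
        "</url>"
    )
```

For example, `sitemap_entry("https://example.com/index.html", last_commit_date("index.html"))` would produce one entry, where the URL and file name are placeholders.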

For more information, see my earlier post about generate-sitemap here on DEV, as well as its webpage.

generate-sitemap - Generate an XML sitemap for a GitHub pages site using GitHub Actions

The generate-sitemap GitHub action generates a sitemap for a website hosted on GitHub Pages. Supports both xml and txt sitemaps. Uses the last commit date of each file to generate the lastmod tags in XML sitemaps. Parses robots.txt and scans html files for noindex directives, excluding URLs if noindex directives or disallows are found.

actions.cicirello.org

Where You Can Find Me

Follow me here on DEV and on GitHub:

Or visit my website:

Vincent A. Cicirello - Professor of Computer Science

Vincent A. Cicirello - Professor of Computer Science at Stockton University - is a researcher in artificial intelligence, evolutionary computation, swarm intelligence, and computational intelligence, with a Ph.D. in Robotics from Carnegie Mellon University. He is an ACM Senior Member, IEEE Senior Member, AAAI Life Member, EAI Distinguished Member, and SIAM Member.

cicirello.org