
Niels Swimburger.NET 🍔

Originally published at swimburger.net

PowerShell Snippet: Crawling a sitemap

Here's a PowerShell function that you can use to validate that every page in your sitemap returns an HTTP 200 status code.

You can also use it to warm up your website, for example to make sure its cache is populated again after a cold start.

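Function CrawlSitemap
{
    Param(
        [parameter(Mandatory=$true)]
        [string] $SiteMapUrl
    );

    # Download the sitemap and parse the response body as XML.
    $SiteMapXml = Invoke-WebRequest -Uri $SiteMapUrl -UseBasicParsing -TimeoutSec 180;
    # Select only <url> elements so comments inside <urlset> are skipped.
    $Urls = ([xml]$SiteMapXml.Content).urlset.url;
    ForEach ($Url in $Urls){
        $Loc = $Url.loc;
        try{
            # A successful request is logged together with its status code.
            $result = Invoke-WebRequest -Uri $Loc -UseBasicParsing -TimeoutSec 180;
            Write-Host "$($result.StatusCode) - $Loc";
        }catch{
            # Error status codes throw. The exception type differs between Windows
            # PowerShell (WebException) and PowerShell 6+ (HttpResponseException),
            # so catch everything and read the status code only when a response exists.
            $Response = $_.Exception.Response;
            if ($Response){
                Write-Warning "$([int]$Response.StatusCode) - $Loc";
            }else{
                Write-Warning "$($_.Exception.Message) - $Loc";
            }
        }
    }
}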
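
To see what the `[xml]` cast in the function gives you, here is a minimal sketch that parses an inline sitemap instead of downloading one. The sample XML and the `example.com` URLs are made up for illustration, and selecting the `url` elements by name is a choice of this sketch so that comments are left out.

```powershell
# Sample sitemap, made up for illustration (example.com is a placeholder domain).
$SampleXml = @'
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <!-- comments are skipped because only url elements are selected -->
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/about</loc></url>
</urlset>
'@;

# The cast produces an XmlDocument; child elements can be read like properties.
$Locs = @(([xml]$SampleXml).urlset.url | ForEach-Object { $_.loc });

# Writes each page URL found in the sitemap to the output.
$Locs;
```

Each `loc` value comes back as a plain string, which is why the function can pass it straight to `Invoke-WebRequest -Uri`.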

You can use the script as follows:

CrawlSitemap -SiteMapUrl 'https://www.swimburger.net/sitemap.xml';

I personally use it as part of my Continuous Delivery pipeline to warm up my site and Cloudflare's cache.
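
If you want a pipeline step to fail when a page is broken, warnings alone may not be enough. The following is only a sketch of one way to do that, not the setup I described above: `Get-PageStatus` and `Assert-AllPagesOk` are hypothetical helper names, and reporting a missing response as status 0 is an assumption of this example.

```powershell
# Hypothetical helper (not part of the original snippet): requests one URL and
# returns an object instead of writing to the host.
Function Get-PageStatus
{
    Param(
        [parameter(Mandatory=$true)]
        [string] $Url
    );

    try{
        $Result = Invoke-WebRequest -Uri $Url -UseBasicParsing -TimeoutSec 180;
        $Status = [int]$Result.StatusCode;
    }catch{
        # Assumption: no response at all (DNS failure, timeout) is reported as 0.
        $Response = $_.Exception.Response;
        $Status = if ($Response) { [int]$Response.StatusCode } else { 0 };
    }
    [pscustomobject]@{ Url = $Url; StatusCode = $Status };
}

# Hypothetical helper: warns about every result that is not a 200 and then
# throws, so that a script calling it ends with an error.
Function Assert-AllPagesOk
{
    Param(
        [parameter(Mandatory=$true)]
        [AllowEmptyCollection()]
        [object[]] $Results
    );

    $Failed = @($Results | Where-Object { $_.StatusCode -ne 200 });
    ForEach ($Item in $Failed){
        Write-Warning "$($Item.StatusCode) - $($Item.Url)";
    }
    if ($Failed.Count -gt 0){
        throw "$($Failed.Count) of $($Results.Count) pages did not return 200.";
    }
}
```

You would collect one result per sitemap URL with `Get-PageStatus -Url $Loc` and pass the collection to `Assert-AllPagesOk -Results $Results`. Whether the thrown error actually fails the step depends on how your pipeline invokes PowerShell.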

Hope it's useful!
