While I was adding Capybara tests for my Jekyll website, I also stumbled upon HTMLProofer.
HTMLProofer lets you test your HTML output. It can check that images have alt attributes, that links are working, and a few more things.
I decided to give it a try on my website. Initially I was surprised by the amount of output and decided that it was too much to fix all at once. So at first I only had it check for broken links. About 30 of the roughly 200 outgoing links turned out to be broken.
Most of them were part of my Hackership series, where I sometimes linked to local startups that apparently didn’t survive until today. Unfortunately, link rot is a thing.
Before I dealt with all the broken links, I started to integrate HTMLProofer into the test suite by adding a custom Rake task:
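require 'html-proofer'

task :html_proofer do
  build_dir = File.join(File.dirname(__FILE__), '_site')
  # Build the site only if the build directory does not exist yet
  unless File.directory?(build_dir)
    `jekyll build -d #{build_dir} -V`
  end
  opts = {
    url_ignore: [/localhost/],   # skip links pointing at localhost
    empty_alt_ignore: true,      # tolerate images with an empty alt attribute
    file_ignore: [/slides/],     # skip files whose path matches "slides"
    typhoeus: {
      ssl_verifyhost: 0,         # do not fail on certificate problems
      ssl_verifypeer: false,
      timeout: 30                # seconds per request
    }
  }
  HTMLProofer.check_directory(build_dir, opts).run
end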
Running rake html_proofer builds the site and runs HTMLProofer with the given options on the Jekyll output. You can check the Travis CI integration in script/ci.rb and .travis.yml.
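The CI script itself is not shown in this post, so the following is only a rough sketch of what such a runner could look like, not the actual content of script/ci.rb. The method name run_steps and the command list are assumptions made for illustration. The idea is to run each step in order and stop at the first failure so that the build is marked as broken.

```ruby
# Hypothetical CI runner; run_steps and the commands passed to it are
# illustrative assumptions, not the real script/ci.rb.
def run_steps(commands)
  commands.each do |command|
    puts "==> #{command}"
    # Kernel#system returns true only if the command exits with status 0
    # (false on a non-zero exit, nil if the command could not be run).
    return false unless system(command)
  end
  true
end

# Usage in a CI script:
#   exit(run_steps(['bundle exec rake html_proofer']) ? 0 : 1)
```

Exiting with a non-zero status is what tells the CI service that the link check failed.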
If you run into SSL issues with HTMLProofer, you may have to install libcurl4-openssl-dev on Travis.
The last thing I did was fix the links, as it was the least appealing part of the task. There are many reasons why a link may be broken, and almost every cause calls for a different fix.
A missing article may be caused by a new URL structure and forgotten redirects, so I looked around on these sites and tried to find the correct link if possible.
Broken domains are a lost cause most of the time, although some startups had renamed themselves or were acquired, so using the new domain makes sense there.
Some domains are only temporarily unreachable. In that case I don’t want to remove the link, so I whitelist it instead.
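Whitelisting a domain can be done through the url_ignore option already used in the Rake task. The following is a minimal sketch, assuming that HTMLProofer matches each url_ignore pattern against the full URL. The helper ignore_patterns_for and the domains listed are made up for illustration and do not come from my actual configuration.

```ruby
# Hypothetical helper: turns bare domain names into patterns for url_ignore.
def ignore_patterns_for(domains)
  domains.map do |domain|
    # Match the domain itself and any subdomain, over http or https,
    # but not look-alikes such as "notexample.net".
    %r{\Ahttps?://(?:[^/]+\.)?#{Regexp.escape(domain)}(?:[/:?#]|\z)}
  end
end

# Placeholder domains, not taken from the real site.
TEMPORARILY_DOWN = %w[flaky.example.org example.net].freeze

# The resulting hash would be merged into the options passed to
# HTMLProofer.check_directory in the Rake task.
opts = {
  url_ignore: [/localhost/] + ignore_patterns_for(TEMPORARILY_DOWN)
}
```

Keeping these domains in a separate list makes it easy to revisit them later and drop an entry once the site is reachable again.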
This all took some time, but it paid off, and now I can be certain that there are no broken links in any of the HTML on this website.
As mentioned in the beginning, HTMLProofer has a few more nice features, but I haven’t gotten around to trying them yet. In the next post I will probably have a look at them.