The problem
I have some legacy projects running on ancient versions of CMSs (Drupal, WordPress) that are well past their reasonable lifetime, and I no longer have the time or energy to maintain them.
Over the years I found that using a CMS for these simple presentation websites is not a good choice for me because:
- Popular CMSs are a frequent target for hackers
- During upgrades all sorts of things can and will go wrong and waste my time
- Those sites consume unnecessary resources at the MySQL, PHP and Apache levels when all they do is show some static information
The solution
I simply wanted to make a mirror of what I have and display simple static HTML files instead of those backed by the CMS.
There are some solutions around, but I very much recommend giving website-scraper a try. It does require a little bit of coding (in JavaScript), but I'd say it takes about as much effort as learning to work with a dedicated tool (looking at you, HTTrack), and it is free.
Example
Here is the code that worked for one of my projects and might give you a quick start:
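// require() works with website-scraper 4.x and older; as far as I know,
// version 5 and later are ES modules and need `import` instead.
const scrape = require('website-scraper');

const options = {
  urls: ['http://www.example.com'],
  directory: './out',
  // Which tags and attributes to download assets from
  sources: [
    {selector: 'img', attr: 'src'},
    {selector: 'link[rel="stylesheet"]', attr: 'href'},
    {selector: 'script', attr: 'src'}
  ],
  // Follow links found in the downloaded pages, up to 10 levels deep
  recursive: true,
  maxRecursiveDepth: 10,
  // Sort downloaded assets into subdirectories by file extension
  subdirectories: [
    {directory: 'img', extensions: ['.jpg', '.png', '.svg']},
    {directory: 'js', extensions: ['.js']},
    {directory: 'css', extensions: ['.css']}
  ],
  // One request at a time, to go easy on the old server
  requestConcurrency: 1,
  // Only download URLs that belong to the site itself
  urlFilter: function(url) {
    return url.includes('www.example.com');
  },
};

// scrape() returns a promise, so report success or failure
scrape(options)
  .then(() => console.log('Done, static files are in ' + options.directory))
  .catch((err) => console.error('Scraping failed:', err));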
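One thing to watch in the example: the urlFilter accepts any URL that merely contains the string www.example.com, so a link such as http://other.site/?ref=www.example.com would pass too. Below is a minimal sketch of a stricter filter that compares the parsed hostname instead. The helper name makeHostFilter is my own invention, not part of website-scraper, and I'm assuming the filter receives absolute URLs.

```javascript
// Hypothetical helper (not part of website-scraper): builds a urlFilter
// that only accepts URLs whose hostname matches exactly.
function makeHostFilter(allowedHost) {
  return function (url) {
    try {
      // The global URL class is built into Node.js
      return new URL(url).hostname === allowedHost;
    } catch (err) {
      // Not an absolute URL we can parse, so skip it
      return false;
    }
  };
}

// Drop-in replacement for the urlFilter option
const urlFilter = makeHostFilter('www.example.com');
```

You would then pass it as urlFilter: urlFilter in the options object. If your site also answers on the bare domain (example.com without www), you would need to allow both hostnames.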
Conclusion
If you have a project that rarely ever gets any content update and you constantly have to fight with CMS / hosting issues, then:
- Write a script using website-scraper that downloads your CMS-backed website as a series of static files.
- Deploy the heap of static files on your hosting.
- Profit! No upgrade pain, a much smaller attack surface, and hardly any server load.
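Before deploying, it can be worth clicking through the mirror locally to check that pages and assets link up. Here is a rough sketch of a preview server using only Node.js built-in modules. The names resolveStaticPath and startPreview, the ./out directory, and port 8080 are assumptions for illustration; it is meant for local checking only, not as a production server.

```javascript
const http = require('http');
const fs = require('fs');
const path = require('path');

// Content types for the file kinds the scrape above produces
const MIME = {
  '.html': 'text/html', '.css': 'text/css', '.js': 'text/javascript',
  '.png': 'image/png', '.jpg': 'image/jpeg', '.svg': 'image/svg+xml'
};

// Map a request URL to a file inside rootDir, or null if it would escape it
function resolveStaticPath(rootDir, requestUrl) {
  const root = path.resolve(rootDir);
  let pathname;
  try {
    // Drop the query string and decode percent-escapes
    pathname = decodeURIComponent(requestUrl.split('?')[0]);
  } catch (err) {
    return null; // malformed escape sequence
  }
  // Directory-style URLs fall back to index.html
  if (pathname.endsWith('/')) pathname += 'index.html';
  const filePath = path.join(root, pathname);
  // Refuse anything that resolves outside the root directory
  if (filePath !== root && !filePath.startsWith(root + path.sep)) return null;
  return filePath;
}

// Start a local preview server for the scraped files
function startPreview(rootDir, port) {
  const server = http.createServer((req, res) => {
    const filePath = resolveStaticPath(rootDir, req.url);
    if (!filePath) {
      res.writeHead(403);
      res.end('Forbidden');
      return;
    }
    fs.readFile(filePath, (err, data) => {
      if (err) {
        res.writeHead(404);
        res.end('Not found');
        return;
      }
      const type = MIME[path.extname(filePath)] || 'application/octet-stream';
      res.writeHead(200, {'Content-Type': type});
      res.end(data);
    });
  });
  return server.listen(port);
}
```

Calling startPreview('./out', 8080) and opening http://localhost:8080/ in a browser should show the mirrored front page, assuming the scraper saved it as index.html.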
Photo by Ryan Yeaman on Unsplash