Wes Bos posted a really useful video explaining how to scrape data from the web with NodeJS. In his second video, he explained how to set up a schedule for this task. I'd never done that in Node before, so I thought it might come in useful in the future and decided to write a quick blog post about it.
Whereas Wes grabs data from his own social media pages in his video, I'm going to create a small app that runs on a schedule and downloads a random image every day at 6PM. I know, right? Who doesn't want a random image popping up on their disk every day?!
A few things we need to install first:
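# create a directory, move into it, create a package.json and install the packages
# (node-fetch is pinned to v2: v3 is ESM-only, and the code below was written against v2)
mkdir image-downloader && cd image-downloader
npm init -y
npm i node-cron node-fetch@2 esm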
A quick breakdown of what you've just installed:
- node-cron: the task scheduler. It lets you set up schedules that run a task automatically, usually by calling a function.
- node-fetch: brings the Fetch API to Node. fetch is a native browser API, but there's no browser when we run Node. You could use another package here instead; Axios is a very popular one. Either way, it lets you download the content behind a URL, which is what you typically do when connecting to APIs or scraping the web.
- esm: I hadn't used this one before, but it's super useful. It lets you write your code the way you would in client-side JavaScript, such as in Vue or React, which means you can use import and export statements. To enable esm, install it and then add it to your run script. In my package.json file, I added this line as the 'start' script:
"scripts": {
  "start": "node -r esm index.js"
},
You can then run the script with npm run start.
Create downloader
Now that we have the necessary packages installed, it's time to create the first file, fetch.js, in which we'll fetch a single image:
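// fetch.js
import fetch from "node-fetch";
import fs from "fs";
import { pipeline } from "stream";
import { promisify } from "util";

// promisified stream.pipeline, so we can await until the file is fully written
const streamPipeline = promisify(pipeline);

// download a random 200x200 image from Picsum and save it to disk
const fetchingData = async () => {
  const res = await fetch("https://picsum.photos/200?random");
  if (!res.ok) {
    throw new Error(`Unexpected response: ${res.status} ${res.statusText}`);
  }
  // the timestamp keeps each file name unique, so older images aren't overwritten
  const date = Date.now();
  // Picsum serves JPEG images by default, so save with a .jpg extension
  const dest = fs.createWriteStream(`./image-${date}.jpg`);
  await streamPipeline(res.body, dest);
};

// export the function so it can be used in the index.js file
export default fetchingData;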
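To get a different picture each time the function runs, I use Picsum. This website serves a random image at the size you ask for: you put the dimensions in the URL, so /200 returns a 200x200 square and /200/300 returns an image 200 pixels wide and 300 pixels tall. I also store the current timestamp (Date.now()) in a variable. It's appended to the file name, which prevents earlier files from being overwritten. Because fetch returns a promise, I'm using async/await.
If you want to test this file on its own, keep in mind that it only defines and exports fetchingData, so running node -r esm fetch.js as-is won't download anything. Temporarily add a call such as fetchingData(); at the bottom of the file, run node -r esm fetch.js, and remove the call again afterwards.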
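To make the URL and file-name logic above concrete, here is a small sketch. picsumUrl and imageFileName are hypothetical helper names, not part of node-fetch or Picsum; the URL patterns follow Picsum's size format (one number for a square image, two for width and height), and the .jpg extension assumes Picsum's default JPEG output.

```javascript
// Hypothetical helper: build a Picsum URL for a random image.
// One number gives a square image, two numbers give width x height.
function picsumUrl(width, height) {
  if (!Number.isInteger(width) || width <= 0) {
    throw new RangeError("width must be a positive integer");
  }
  if (height === undefined) {
    return `https://picsum.photos/${width}`;
  }
  if (!Number.isInteger(height) || height <= 0) {
    throw new RangeError("height must be a positive integer");
  }
  return `https://picsum.photos/${width}/${height}`;
}

// Hypothetical helper: turn a timestamp into a file name, so successive
// downloads get different names and don't overwrite each other.
function imageFileName(timestamp = Date.now()) {
  return `./image-${timestamp}.jpg`;
}

console.log(picsumUrl(200)); // https://picsum.photos/200
console.log(picsumUrl(200, 300)); // https://picsum.photos/200/300
console.log(imageFileName(1700000000000)); // ./image-1700000000000.jpg
```

Passing the timestamp in as a parameter (with Date.now() as the default) keeps the helper easy to test with a fixed value.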
Set up a schedule
Next, create an index.js file. This is the app's main entry file, and it contains the node-cron schedule:
import cron from "node-cron";

cron.schedule("* * * * *", () => {
  console.log(`this message logs every minute`);
});
Run this tiny app and it logs a message to the console once every minute. Cool, but not very useful. Let's import our image fetcher and call it instead. The index.js file then looks like this:
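import cron from "node-cron";
import fetchingData from "./fetch.js";

cron.schedule("* * * * *", () => {
  console.log(`one minute passed, downloading an image`);
  // fetchingData is async, so catch errors here instead of leaving them unhandled
  fetchingData().catch((err) => console.error("Download failed:", err));
});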
However, this runs the image downloader every minute. To change that, we change the first argument passed to cron.schedule. The five asterisks mean the function runs every minute; each position controls one unit of time, as this diagram (taken from the node-cron documentation) shows:
# ┌────────────── second (optional)
# │ ┌──────────── minute
# │ │ ┌────────── hour
# │ │ │ ┌──────── day of month
# │ │ │ │ ┌────── month
# │ │ │ │ │ ┌──── day of week
# │ │ │ │ │ │
# │ │ │ │ │ │
# * * * * * *
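The diagram above can be turned into a small sketch that labels each part of an expression. describeCronFields is a hypothetical helper, not part of node-cron: it only splits the expression on whitespace and names the fields, treating a six-field expression as starting with seconds, as in the diagram. It does not validate the values.

```javascript
// Field names in the order shown in the diagram (seconds are optional).
const FIVE_FIELDS = ["minute", "hour", "dayOfMonth", "month", "dayOfWeek"];

// Hypothetical helper: map each part of a cron expression to its field name.
function describeCronFields(expression) {
  const parts = expression.trim().split(/\s+/);
  let names;
  if (parts.length === 5) {
    names = FIVE_FIELDS;
  } else if (parts.length === 6) {
    names = ["second", ...FIVE_FIELDS];
  } else {
    throw new Error(`Expected 5 or 6 fields, got ${parts.length}`);
  }
  const fields = {};
  names.forEach((name, i) => {
    fields[name] = parts[i];
  });
  return fields;
}

const daily = describeCronFields("0 18 * * *");
console.log(daily.minute, daily.hour); // 0 18

const withSeconds = describeCronFields("30 0 18 * * *");
console.log(withSeconds.second); // 30
```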
At first, I didn't really understand what this meant. After a bit of Googling, I found the crontab guru website, which is really useful as a cheat sheet.
This means you can set up a schedule for almost any recurring time: once a year, say, or every Tuesday at 8am in January and July. I set it up to download an image every day at 6PM with this expression: 0 18 * * * (minute 0 of hour 18, on every day of every month, on any day of the week).
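To see how an expression such as 0 18 * * * picks out a moment in time, here is a simplified matcher. matchesCron is a hypothetical helper and not how node-cron is implemented: it supports only *, single numbers and comma-separated lists (no ranges or steps), accepts 7 as Sunday, and simply requires every field to match, ignoring classic cron's special rule for when both day of month and day of week are restricted.

```javascript
// Hypothetical, simplified matcher for 5-field cron expressions (local time).
// Supports "*", single numbers and comma lists such as "1,7"; no ranges or steps.
function fieldMatches(field, value) {
  if (field === "*") return true;
  return field.split(",").some((part) => Number(part) === value);
}

function matchesCron(expression, date) {
  const parts = expression.trim().split(/\s+/);
  if (parts.length !== 5) {
    throw new Error("This sketch only handles 5-field expressions");
  }
  const [minute, hour, dayOfMonth, month, dayOfWeek] = parts;
  const weekday = date.getDay(); // 0 = Sunday ... 6 = Saturday
  // Every field must match; 7 is also accepted as Sunday
  return (
    fieldMatches(minute, date.getMinutes()) &&
    fieldMatches(hour, date.getHours()) &&
    fieldMatches(dayOfMonth, date.getDate()) &&
    fieldMatches(month, date.getMonth() + 1) && // JS months are 0-based
    (fieldMatches(dayOfWeek, weekday) ||
      (weekday === 0 && fieldMatches(dayOfWeek, 7)))
  );
}

// Every day at 6PM
console.log(matchesCron("0 18 * * *", new Date(2024, 0, 16, 18, 0))); // true
console.log(matchesCron("0 18 * * *", new Date(2024, 0, 16, 18, 1))); // false
// Every Tuesday at 8am in January and July (16 January 2024 is a Tuesday)
console.log(matchesCron("0 8 * 1,7 2", new Date(2024, 0, 16, 8, 0))); // true
// Once a year: midnight on 1 January
console.log(matchesCron("0 0 1 1 *", new Date(2025, 0, 1, 0, 0))); // true
```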
The complete and final index.js file is then:
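import cron from "node-cron";
import fetchingData from "./fetch.js";

// runs at minute 0 of hour 18, i.e. every day at 6PM
cron.schedule("0 18 * * *", () => {
  console.log(`it's 6PM, downloading an image`);
  fetchingData().catch((err) => console.error("Download failed:", err));
});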
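node-cron takes care of working out when 0 18 * * * should fire next. As a rough illustration of that idea (not node-cron's actual implementation), here is a sketch that computes how many milliseconds remain until the next 18:00 on the machine's local clock. msUntilNextRun is a hypothetical helper; its result could be passed to setTimeout for a one-off run, but for a recurring job node-cron remains the simpler choice.

```javascript
// Hypothetical helper: milliseconds from `now` until the next hour:minute
// on the machine's local clock (e.g. 18:00 for a 6PM job).
function msUntilNextRun(hour, minute, now = new Date()) {
  const next = new Date(now.getTime());
  next.setHours(hour, minute, 0, 0);
  // If today's slot is now or already past, move to the same time tomorrow
  if (next.getTime() <= now.getTime()) {
    next.setDate(next.getDate() + 1);
  }
  return next.getTime() - now.getTime();
}

// 5PM on 16 January 2024: one hour until 6PM
console.log(msUntilNextRun(18, 0, new Date(2024, 0, 16, 17, 0))); // 3600000
// 6:30PM: the next run is 6PM tomorrow, 23.5 hours away
console.log(msUntilNextRun(18, 0, new Date(2024, 0, 16, 18, 30))); // 84600000

// One-off use: setTimeout(fetchingData, msUntilNextRun(18, 0));
```

Because the calculation relies on setHours and setDate, it follows the local clock; on days with a daylight saving change the gap can be an hour shorter or longer than these examples.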
Want to take a look at the full app or clone it? Head over to my GitHub here!
First post on Dev.to, so be gentle. Originally published at andredevries.dev
Top comments (8)
Great post. It is a nice example of getting a scheduled app up and running. I especially like the link to the crontab guru resource. That will be very helpful in understanding those statements for this and other cron jobs. I also like the use of esm. I have been meaning to bring that into some of my projects to be able to use import and export statements.
Very nice article! 👏 I've learned about esm. I'd never used it before, wondering why? 🤔 If I set the scheduler to run every minute but the task takes more than a minute, how do I prevent overlapping jobs?
Question: the script has to be running all the time for this to work, right? So if I turn off my PC, it's gone. Or is it written to the OS crontab?
You can use a process manager like pm2 to keep it running in the background.
Good question! But yes, if you turn your machine off, the cron job will stop. I'll have a look to see if I can find an npm package that actually writes to the crontab.
Great post, bro! I love it: simple and easy to understand.
Thx!!!
Hi! Can you suggest a solution for distributed scheduling for scalable microservices?