DEV Community

Rahul
How to schedule jobs at scale

Let's discuss how to schedule and process millions of jobs.

Use cases:

  • Build a social media scheduler where users schedule their posts (for example, every day at 7:30 PM) and the posts are published to their social media accounts automatically.
  • Build a product where users can trigger their ETL jobs or some code every 10 min.
  • Build a status page product that checks your product's health, speed, etc. every 1 min.

Now let's dive into some solutions and their cons.

CloudWatch/EventBridge rules

  • Add an EventBridge rule for each customer job; the rule takes care of triggering that particular job.
  • Easy to set up, but EventBridge limits the number of rules (300 per event bus by default), so one rule per customer job does not scale.
  • Although this solution doesn't scale for your customers' jobs, it can be used for your own product's jobs.
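
To make the "one rule per job" idea concrete, here is a minimal sketch. It assumes `events` is a boto3 EventBridge client (`boto3.client("events")`); the helper names and the `job-<id>` naming scheme are made up for illustration. EventBridge cron expressions are evaluated in UTC.

```python
def daily_cron(hour, minute):
    """EventBridge cron expression for every day at hour:minute (UTC)."""
    # Field order: minutes hours day-of-month month day-of-week year
    return f"cron({minute} {hour} * * ? *)"


def create_rule_for_job(events, job_id, hour, minute):
    """Create one schedule rule for one job (hypothetical naming scheme)."""
    return events.put_rule(
        Name=f"job-{job_id}",
        ScheduleExpression=daily_cron(hour, minute),
        State="ENABLED",
    )
```

A rule still needs a target (for example a Lambda function, attached with `put_targets`) before it does anything. Since every scheduled job consumes one rule, the rule quota is reached quickly once customers start creating jobs.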

Code it

  • Take all the jobs that are due to run in the next 10 min.
  • Add them to a "to-be-processed" table with their exact execution times.
  • Continuously run an executor that fetches all jobs with execution time <= current time that are not yet processed, and executes them.

Example:

  1. Let's say the scheduler cron runs every 10 min, the current time is 7:28 PM, and job A is scheduled for 7:32 PM.
  2. Get all the jobs that need to run in the next 10 min, that is from 7:28 PM to 7:37 PM, which includes job A.
  3. Push them to the "to-be-processed" table with their exact execution times.
  4. The executor keeps fetching all jobs with execution time <= current time that are not yet processed. At 7:32 PM this includes job A, so it gets executed.

Issues:

  1. Not scalable with a single executor.
  2. With multiple executor threads, you need to handle concurrency so that the same job is not executed more than once.
  3. Failed jobs can be re-run, but you have to write the logic to stop after a certain number of failures.

SQS

  • Use the delay feature in SQS, which lets us send a message now and have it become available to consumers only after a delay of up to 15 min.
  • Take all the jobs that are due to run in the next 10 min.
  • Add each one to the SQS queue with its specific delay.
  • Attach a Lambda consumer/executor to the queue, which consumes messages as they become available and executes the jobs.

Example:

  1. Same example. Let's say the scheduler cron runs every 10 min, the current time is 7:28 PM, and job A is scheduled for 7:32 PM.
  2. Get all the jobs that need to run in the next 10 min, that is from 7:28 PM to 7:37 PM, which includes job A.
  3. Push them to SQS, each with its specific delay. Job A needs to run at 7:32 PM, which is 4 min after the current time of 7:28 PM, so push job A with a 4 min delay.
  4. The Lambda consumer receives messages as they become available. Job A becomes available after 4 min, that is at 7:32 PM, and the consumer executes it.
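
Here is a minimal sketch of the scheduler side of this example. It assumes `sqs` is a boto3 SQS client (`boto3.client("sqs")`) and that the queue is a standard queue, since per-message delays are not available on FIFO queues. The function names and the use of the job id as the message body are choices made for illustration.

```python
from datetime import datetime

MAX_SQS_DELAY_SECONDS = 900  # SQS allows at most 15 minutes of delay per message


def delay_seconds(now, run_at):
    """Seconds SQS should hold the message so that it surfaces at run_at."""
    delay = int((run_at - now).total_seconds())
    if delay > MAX_SQS_DELAY_SECONDS:
        raise ValueError("job is too far ahead; leave it for a later scheduler run")
    return max(delay, 0)  # overdue jobs are sent without delay


def enqueue(sqs, queue_url, job_id, now, run_at):
    """Send one job to the queue with its own delay."""
    return sqs.send_message(
        QueueUrl=queue_url,
        MessageBody=job_id,
        DelaySeconds=delay_seconds(now, run_at),
    )
```

For job A, `delay_seconds(datetime(2024, 1, 1, 19, 28), datetime(2024, 1, 1, 19, 32))` gives 240 seconds. Because the scheduler only looks 10 min ahead, every delay it computes stays under the 15 min SQS maximum.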

Does it solve the previous solution's issues?

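  1. Since SQS and Lambda are serverless and highly scalable, this solution is scalable too.
  2. While one consumer is processing a message, SQS hides it from other consumers (the visibility timeout), so consumers do not pick up the same job concurrently. Standard queues deliver at least once, so an occasional duplicate is still possible and jobs should tolerate being run twice.
  3. If a job fails and we don't delete its message from SQS, the message becomes visible again after the visibility timeout (which we configure) and the job is re-run automatically.
  4. Jobs that fail repeatedly can be removed from further processing automatically by moving them to a dead-letter queue after a configured number of receives.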
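
The consumer side could look roughly like the sketch below. It assumes the Lambda function is attached to the queue with partial batch responses (`ReportBatchItemFailures`) enabled, so that only the failed messages stay on the queue for retry. `run_job` is a hypothetical placeholder for your real job logic, and `redrive_policy` builds the value of the queue's `RedrivePolicy` attribute for a dead-letter queue ARN that you supply.

```python
import json


def run_job(job_id):
    """Hypothetical placeholder: here, ids starting with 'bad' simulate a failure."""
    if job_id.startswith("bad"):
        raise RuntimeError(f"job {job_id} failed")


def handler(event, context):
    """Lambda entry point for an SQS event source."""
    failures = []
    for record in event["Records"]:
        try:
            run_job(record["body"])
        except Exception:
            # Reporting the messageId leaves only this message on the queue;
            # it becomes visible again after the visibility timeout.
            failures.append({"itemIdentifier": record["messageId"]})
    return {"batchItemFailures": failures}


def redrive_policy(dead_letter_queue_arn, max_receive_count):
    """RedrivePolicy attribute value: move a message to the DLQ after N receives."""
    return json.dumps(
        {
            "deadLetterTargetArn": dead_letter_queue_arn,
            "maxReceiveCount": str(max_receive_count),
        }
    )
```

Messages that are processed successfully are deleted from the queue by Lambda, so the retry and dead-letter behaviour needs no extra code beyond the queue configuration.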

In Conclusion…

With this architecture, you can schedule a very large number of jobs. Let me know if you have a better solution in mind.


That’s it! Please let me know about your views and comment below for any clarifications.

If you found value in reading this, please consider sharing it with your friends and also on social media 🙏
