Comment by Arpit Mohan on "How do you monitor various cronjobs in your app and make sure they don't fail?"

In my previous job, we were running about 80 crons (hourly, daily and weekly) across a cluster of 100 machines. We ended up using an external system, Jenkins, to schedule and execute the scripts. The code for each script is a normal function or API call. On a given schedule, Jenkins either SSHes into a particular machine and executes the cron script or simply hits an API (with authentication, obviously). If the cron fails, Jenkins sends out an e-mail with the failure details.
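
To make the "hits an API" variant concrete, here is a minimal sketch of a script a Jenkins job could run on its schedule. It is not the code we actually used: the function name `trigger_cron`, the bearer-token header, and the `CRON_URL` / `CRON_TOKEN` environment variables are all assumptions for illustration. The one part that matters is the exit code, since Jenkins marks a build step as failed when the process exits non-zero, and that failure is what triggers the e-mail.

```python
import os
import sys
import urllib.error
import urllib.request


def trigger_cron(url, token, timeout=300, opener=urllib.request.urlopen):
    """Call a (hypothetical) cron endpoint; return 0 on a 2xx response, 1 otherwise."""
    request = urllib.request.Request(
        url,
        method="POST",
        # Assumed auth scheme: a bearer token. Use whatever your API expects.
        headers={"Authorization": f"Bearer {token}"},
    )
    try:
        with opener(request, timeout=timeout) as response:
            status = response.status
    except urllib.error.HTTPError as exc:
        # urlopen raises HTTPError for 4xx/5xx responses.
        print(f"cron endpoint returned HTTP {exc.code}", file=sys.stderr)
        return 1
    except (urllib.error.URLError, TimeoutError) as exc:
        print(f"cron endpoint unreachable: {exc}", file=sys.stderr)
        return 1
    if 200 <= status < 300:
        return 0
    print(f"unexpected HTTP status {status}", file=sys.stderr)
    return 1


# Only runs when the (assumed) environment variables are provided by the job.
if __name__ == "__main__" and "CRON_URL" in os.environ:
    sys.exit(trigger_cron(os.environ["CRON_URL"], os.environ["CRON_TOKEN"]))
```

Anything the script prints to stderr ends up in the build's console log, so the failure details are there when the e-mail goes out.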

I prefer this approach over messing around with a crontab because:

The notification system is generalized. I configure it once on Jenkins and it remains the same for any cron that we write in the future. I hate Postfix and Sendmail with a vengeance. :D
If you have a distributed cluster, running a cron requires maintaining a distributed cron system. While Dkron exists, I'm not familiar with it and we already use Jenkins as a build system. No harm doubling it as a cron scheduler.
A ready-made web interface to look at the logs, history and statistics of past cron jobs. How long did it take to run? Is it taking more time today than a month back? Do we need to optimize our code?
A ready-made web interface to manage the cron system. Don't need a cron? Simply disable the build job. Need it on a different schedule? Just change the schedule on the web. Takes the management pain away from me.
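
Jenkins shows run durations in its UI, so no code is needed for the "is it slower than a month back?" question. If you wanted to check it automatically, a rough sketch could look like the following. The function name, the 1.5x threshold, and the idea of feeding it a list of past durations in seconds are all assumptions for illustration, not something we had in place.

```python
from statistics import median


def is_slower_than_baseline(latest_seconds, past_seconds, factor=1.5):
    """Return True if the latest run took more than `factor` times the median past run."""
    if not past_seconds:
        # No history yet, so there is nothing to compare against.
        return False
    return latest_seconds > factor * median(past_seconds)
```

The median is used instead of the mean so that one unusually slow run in the history does not skew the baseline.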

Hope this helps you design a better cron system.
