Using Cron

Cron is a very common job scheduler for linux. It allows users to run the same command or shell script periodically. Typically it is used to automate tasks, for example, every Monday run my script to plot last week's data. There are many guides to using  cron and crontab (the command for loading the cron job table). 

Cron on JASMIN

Generally cron is disabled on the JASMIN general access machines such as sci1.jasmin.ac.uk. This is to avoid people killing the machine by setting up lots of processing jobs when better alternatives, e.g. Lotus, are available. However, there are times when it is appropriate to use cron and so a generic cron service machine is provided. cron.jasmin.ac.uk is configured like sci*.jasmin.ac.uk, except cron is enabled. Anyone who can log into sci* should also be able to login to cron.jasmin.ac.uk. (the actual hostname of the machine is  cron1.ceda.ac.uk but please refer to it as  cron.jasmin.ac.uk wherever possible)

There are a few rules of the road to using this service:  

  1. Avoid process pile up: If a job has not finished before the cron starts the next instance of the same job then competition for resources probably means that job will also not finish. Eventually a mass of unfinished jobs will overwhelm the whole machine and it will crash. To avoid this jobs should test to see if the previous job is still running by using a lock file, or making sure the jobs timeout. The crontamer wrapper script, documented below, is available to help you implement this. 
  2. Expect it to break occasionally: Regardless of any measure introduced by users to stop process pile up, you can expect it to go rouge at some point. We will reboot the machine when this happens, probably without warning. We may remove offending jobs from the cron table but persistent offenders may be barred from using the service. 
  3. Don't do heavy processing on the cron machine itself. You can submit jobs to lotus (sbatch is on cron1.ceda.ac.uk) to offload the processing resource.

Common Cron Gotchas 

The bash shell environment when cron launches scripts is not identical to the one when working interactively. Annoyingly the path to common tools, like sbatch, may not have been setup so that you get an error message from cron when it works perfectly well interactively. A way to get round this is to source the .bash_profile in the crontab file so that the interactive environment is used by cron. 

24 * * * * . $HOME/.bash_profile; sbatch -W 12:0 mycmd.sh

Crontamer

Crontamer is a wrapper script to implement lock file and time out checking for cron jobs to avoid issues of ‘’’runaway’’’ jobs causing problems for everyone (e.g. where new jobs start before old ones finish, which can cause a snowball effect).

You can download the crontamer script from  https://github.com/cedadev/crontamer. It is already installed on cron1.ceda.ac.uk

Use cron as normal, but include the crontamer command before the users own script and arguments.
bash> crontab -l
#e.g. cron file entry to run job every day at 4am:
0 4 * * * crontamer -t 2h '/home/users/jblogs/bin/my_repeat_script.sh -opt1 arg1 arg2'
Note: Crontamer has a number of options which can be conflated with the options for the wrapped script. Use single quotes to make sure the script runs with the right options.
The flow of the crontamer script is like this: 
  • Check for existing lock file to indicate if the script is already running. If the lockfile is there and it can see the matching process still running then it exits silently. 
  • If the lock file is not there or it can't see the matching process on the system then it starts the wrapped script. 
  • Periodically check the wrapped script is running. If the script fails, with a non-zero exit return code, then it can email you.
  • If the script has been running longer than specified timeout (default 12hr) then it will be killed. 
Unless a named lock file is given the lock files are created in the /tmp directory as 
/tmp/crontamer.{unique_id}
These files should be cleaned up automatically, but users may need occasional checks or find them helpful for identifying problems running their scripts. The {unique_id} is based on a combination of the username, passed script and arguments, enabling multiple calls of a script with different arguments to be handled separately.