How to submit a job to SLURM

This article explains how to submit a batch job to the new scheduler SLURM. It covers:

  • What is a batch job?
  • Job submission methods
  • Important job submission options
  • Job array submission
  • Job dependencies submission
  • Recursive job submission

What is a batch job?

A batch job is controlled by a script written by the user, who submits the job to the batch system SLURM. The batch system then selects the resources for the job and decides when to run it. Note: the term "job" is used throughout this documentation to mean a "batch job".

There are two ways of submitting a job to SLURM:

  1. Submit via a SLURM job script - create a bash script that includes directives to the SLURM scheduler
  2. Submit via command-line options - provide directives to SLURM via command-line arguments

Both options are described below.

Which servers can you submit jobs from?

Jobs can be submitted to SLURM from the following Sci machines:

Method 1: Submit via a SLURM job script

The SLURM job submission command is:

$ sbatch myjobscript

The job script is a Bash script that runs the user's application. It includes a list of SLURM directives, each prefixed with `#SBATCH`, as shown in this example:

#!/bin/bash
# %j in the file names is replaced by the job ID; --time=5 requests 5 minutes
#SBATCH --partition=short-serial
#SBATCH -o %j.out
#SBATCH -e %j.err
#SBATCH --time=5

# executable
sleep 5m

For details of how to specify job resources, please refer to Table 2 of the help article "LSF to SLURM quick reference".
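As a rough illustration of Method 1, the sketch below writes a job script to disk and checks its syntax before it is submitted. The temporary directory is an assumption made so that the sketch does not overwrite existing files; the file name "myjobscript" and the directives are taken from the example above. The final sbatch command is only printed, not run.

```shell
#!/bin/bash
# Sketch: generate a job script and check it before submitting.
# The temporary working directory is an assumption for illustration.
workdir=$(mktemp -d)
job_script="$workdir/myjobscript"

# Quoted 'EOF' stops the shell expanding anything inside the job script.
cat > "$job_script" <<'EOF'
#!/bin/bash
#SBATCH --partition=short-serial
#SBATCH -o %j.out
#SBATCH -e %j.err
#SBATCH --time=5

# executable
sleep 5m
EOF

# bash -n parses the script without running it, catching syntax errors early.
bash -n "$job_script" && echo "syntax OK: $job_script"

# Print the submission command rather than running it.
echo "to submit: sbatch $job_script"
```

Note that the `#SBATCH` lines sit directly below the interpreter line and before the first executable command, which is where sbatch looks for them.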

Method 2: Submit via command-line options

If you have an existing script, written in any language, that you wish to submit to LOTUS then you can do so by providing SLURM directives as command-line arguments. For example, if you have a script "" that takes a single argument "-f <filepath>", you can submit it using "sbatch" as follows:

sbatch -p short-serial -t 03:00 -o job01.out -e job01.err -f myfile.txt

This approach allows you to submit jobs without writing additional job scripts to wrap your existing code.
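The ordering of arguments in Method 2 can be sketched as follows. The script name "my_script.sh" is hypothetical and stands in for your own script; the sbatch options and the "-f myfile.txt" argument follow the example above. The sketch only builds and prints the command, so it can be tried without access to the scheduler.

```shell
#!/bin/bash
# Options placed before the script name are read by sbatch.
sbatch_opts=(-p short-serial -t 03:00 -o job01.out -e job01.err)

# Hypothetical script name; everything after it is passed to the script itself.
script_cmd=(./my_script.sh -f myfile.txt)

# Assemble the full command as an array so that arguments keep their boundaries.
cmd=(sbatch "${sbatch_opts[@]}" "${script_cmd[@]}")

# Print the command instead of running it.
printf '%s\n' "${cmd[*]}"
```

On a system with SLURM available, replacing the final printf line with "${cmd[@]}" would run the command.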

Job array submission

Job arrays are groups of jobs with the same executable and resource requirements, but different input files. Job arrays can be submitted, controlled, and monitored as a single unit or as individual jobs or groups of jobs. Each job submitted from a job array shares the same job ID as the job array and is uniquely referenced using an array index. This approach is useful for "high throughput" tasks, for example where you want to run your simulation with different driving data or run the same processing task on multiple data files.

Important note: SLURM is configured with a maximum job array size of MaxArraySize = 10000. If a job array larger than 10000 is submitted, SLURM will reject the submission with the following error message: "Job array index too large. Job not submitted."
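A pre-submission check along these lines might help avoid a rejected submission. This is a minimal sketch: the function name array_index_ok is hypothetical, and the value 10000 is copied from the note above. It assumes, following the SLURM configuration documentation, that the largest usable array index is one less than MaxArraySize, so it tests the highest index rather than the number of tasks.

```shell
#!/bin/bash
# Value quoted in the text above. On a live cluster it could be read with:
#   scontrol show config | grep MaxArraySize
max_array_size=10000

# Succeed if the highest array index is usable under the limit.
# Assumption: the largest permitted index is max_array_size - 1.
array_index_ok() {
    local last_index=$1
    [ "$last_index" -ge 0 ] && [ "$last_index" -lt "$max_array_size" ]
}

for last in 10 9999 10000; do
    if array_index_ok "$last"; then
        echo "--array=1-$last: within the limit"
    else
        echo "--array=1-$last: too large, split into several submissions"
    fi
done
```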

Taking a simple R submission script as an example:

#!/bin/bash
#SBATCH --partition=short-serial
#SBATCH --job-name=myRtest
#SBATCH -o %j.out
#SBATCH -e %j.err
#SBATCH --time=30:00

module add jaspy
Rscript TestRFile.R dataset1.csv

If you want to run the same script TestRFile.R with input files dataset2.csv through to dataset10.csv, you could create and submit a separate job script for each dataset. However, by setting up an array job, you only need to create and submit a single job script.

The corresponding job array script to process 10 input files in a single job submission would look something like:

#!/bin/bash
#SBATCH --partition=short-serial
#SBATCH --job-name=myRarray
#SBATCH -o %A_%a.out
#SBATCH -e %A_%a.err
#SBATCH --time=30:00
#SBATCH --array=1-10

module add jaspy
# Each array task processes the dataset matching its own index
Rscript TestRFile.R dataset${SLURM_ARRAY_TASK_ID}.csv

Here the important differences are:

  • The array is created by the SLURM directive --array=1-10, which defines elements numbered 1 to 10 to represent our 10 variations.
  • The output and error file names include the job ID ("%A") and the array index ("%a").
  • The environment variable $SLURM_ARRAY_TASK_ID in the Rscript command is expanded to give the array index of each task.
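When input files are not numbered consecutively, a common variation is to use $SLURM_ARRAY_TASK_ID as a line number into a list of file names. The sketch below illustrates the idea; the function name input_for_task and the file names in the list are hypothetical, and the default index of 2 is an assumption so that the sketch also runs outside the scheduler.

```shell
#!/bin/bash
# Print the line of list_file whose number matches task_id (1-based).
input_for_task() {
    local task_id=$1 list_file=$2
    sed -n "${task_id}p" "$list_file"
}

# Demonstration with a temporary file list (hypothetical file names).
list_file=$(mktemp)
printf '%s\n' london.csv paris.csv rome.csv > "$list_file"

# SLURM sets SLURM_ARRAY_TASK_ID inside an array task; fall back to 2 here
# so the sketch can be tried without submitting a job.
task_id=${SLURM_ARRAY_TASK_ID:-2}
input_file=$(input_for_task "$task_id" "$list_file")
echo "task $task_id would process $input_file"
```

In a real job array script, the list file would sit alongside the job script and the --array range would match the number of lines in it.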

When the job is submitted, SLURM will create 10 tasks under the single job ID. The job array script is submitted in the usual way:

$ sbatch myRarray.sbatch

If you use the squeue -u <username> command to list your active jobs, you will see 10 tasks with the same job ID. The tasks can be distinguished by their array index, shown in the form jobID_index. Note that individual tasks may be allocated to a range of different hosts on LOTUS.
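The jobID_index form can be split back into its two parts with plain shell parameter expansion, which may be useful when post-processing a list of tasks. In this sketch the identifier 1234567_4 is a made-up value used purely for illustration.

```shell
#!/bin/bash
# Hypothetical task identifier in the jobID_index form.
task_ref="1234567_4"

# Remove everything from the first underscore onwards to get the job ID.
job_id=${task_ref%%_*}

# Remove everything up to and including the first underscore to get the index.
index=${task_ref#*_}

echo "job ID: $job_id, array index: $index"
```

The same jobID_index form is accepted by scancel, so a single task of an array can be cancelled without affecting the others.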
