SLURM queues
This article introduces the SLURM scheduler queues/partitions for batch job submissions to LOTUS. It covers:
- Queue name
- Queue details
- How to view the state of SLURM queues
- How to choose a queue
Queue name
The LOTUS cluster is managed by SLURM, and the SLURM queues are:
- test
- short-serial
- long-serial
- par-single
- par-multi
- high-mem
- short-serial-4hr (see Note 3)
Each queue has attributes such as run-length limits (e.g. short, long) and associated resources. A full breakdown of each queue and its associated resources is shown below in Table 1.
Queue details
Queues represent a set of pending jobs, lined up in a defined order and waiting for their opportunity to use resources. The queue is specified in the job script file using the SLURM scheduler directive #SBATCH -p <queue_name>, where <queue_name> is the name of the queue/partition (Table 1, column 1).
Table 1 summarises important specifications for each queue, such as run time limits and CPU core limits. If the queue is not specified, SLURM schedules the job to the default queue short-serial.
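As a minimal sketch (the job name, output file, and command are illustrative), a job script that selects a queue explicitly might look like this:

#!/bin/bash
#SBATCH --partition=short-serial   # queue/partition to submit to (-p is the short form)
#SBATCH --job-name=myjob           # illustrative job name
#SBATCH --output=myjob-%j.out      # %j expands to the job ID
#SBATCH --time=00:30:00            # requested wall time (hh:mm:ss)

echo "Running on $(hostname)"      # illustrative payload; replace with your own command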
Table 1. LOTUS/SLURM queues and their specifications
| Queue name | Max run time | Default run time | Max CPU cores per job | MaxCpuPerUserLimit | Priority |
|---|---|---|---|---|---|
| test | 4 hrs | 1 hr | 8 | 8 | 30 |
| short-serial | 24 hrs | 1 hr | 1 | 2000 | 30 |
| par-single | 48 hrs | 1 hr | 16 | 300 | 25 |
| par-multi | 48 hrs | 1 hr | 256 | 300 | 20 |
| long-serial | 168 hrs | 1 hr | 1 | 300 | 10 |
| high-mem | 48 hrs | 1 hr | 1 | 75 | 30 |
| short-serial-4hr (Note 3) | 4 hrs | 1 hr | 1 | 1000 | 30 |
Note 1: Resources that the job requests must be within the resource allocation limits of the selected queue.
Note 2: The default value of --time=[hh:mm:ss] (predicted maximum wall time) is 1 hour for all of the queues above. If your job exceeds the run time you requested (or the default, if you did not specify this option), it will be terminated by the SLURM scheduler.
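For example, to request a 6-hour wall-time limit (the value is illustrative), the option can be given on the command line:

$ sbatch --time=06:00:00 --partition=short-serial myjobscript

or, equivalently, as a directive in the job script:

#SBATCH --time=06:00:00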
Note 3: A user must specify the SLURM job account --account=short4hr when submitting a batch job to the provisional SLURM partition short-serial-4hr.
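For example, a submission to this queue might look like the following (myjobscript is a placeholder for your own job script):

$ sbatch --partition=short-serial-4hr --account=short4hr myjobscript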
How to view the state of SLURM queues
The SLURM command sinfo reports the state of queues/partitions and nodes managed by SLURM. It has a wide variety of filtering, sorting, and formatting options.
$ sinfo
PARTITION     AVAIL  TIMELIMIT   NODES  STATE  NODELIST
test          up     4:00:00     48     idle   host[146-193]
short-serial* up     1-00:00:00  48     idle   host[146-193]
long-serial   up     7-00:00:00  48     idle   host[146-193]
par-single    up     2-00:00:00  48     idle   host[146-193]
par-multi     up     2-00:00:00  48     idle   host[146-193]
high-mem      up     2-00:00:00  48     idle   host[146-193]
lotus_gpu     up     7-00:00:00  48     idle   host[146-193]
copy          up     7-00:00:00  48     idle   host[146-193]
cpom-comet    up     7-00:00:00  48     idle   host[146-193]
...
Note: Queues other than the standard queues (test, short-serial, long-serial, par-single, par-multi and high-mem) are dedicated to specific projects or services and are not intended for general use.
'sinfo' output field descriptions:
By default, the SLURM command 'sinfo' displays the following information:
- PARTITION: Partition name, followed by "*" for the default queue/partition
- AVAIL: State/availability of the queue/partition: up or down
- TIMELIMIT: The maximum run time limit per job in the queue/partition, in days-hours:minutes:seconds; e.g. 2-00:00:00 is a two-day maximum run time limit
- NODES: Count of nodes with this particular configuration, e.g. 48 nodes
- STATE: State of the nodes. Possible states include: allocated, down, drained, and idle. For example, the state "idle" means that the node is not allocated to any jobs and is available for use
- NODELIST: List of node names associated with this queue/partition
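The same fields can also be requested explicitly with the --format option of sinfo; for example (the column widths are illustrative):

$ sinfo --format="%9P %5a %11l %6D %6t %N"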
The 'sinfo' example below reports more complete information about the partition/queue short-serial:
$ sinfo --long --partition=short-serial
Tue May 12 18:04:54 2020
PARTITION     AVAIL  TIMELIMIT   JOB_SIZE    ROOT  OVERSUBS  GROUPS  NODES  STATE  NODELIST
short-serial* up     1-00:00:00  1-infinite  no    NO        all     48     idle   host[146-193]
How to choose a SLURM queue/partition
Test queue
The test queue can be used to test new workflows and also to help new users familiarise themselves with the SLURM batch system. Both serial and parallel code can be tested on the test queue. The maximum run time is 4 hrs and the maximum number of jobs per user is 8 job slots. The maximum number of cores for a parallel job (e.g. MPI, OpenMP, or multi-threaded) is limited to 8 cores. The test queue should be used when you are unsure of a job's resource requirements and behaviour at runtime, because it has a confined set of LOTUS nodes (Intel node type) not shared with the other standard LOTUS queues.
Serial queues
Serial and array jobs with a single CPU core should be submitted to one of the following serial queues, depending on the job duration and the memory requirement. The default queue is short-serial.
short-serial
Serial and/or array jobs, each using a single CPU core and with a run time of less than 24 hrs, should be submitted to the short-serial queue.
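For example, a sketch of a job array of ten single-core tasks on this queue (the processing command and input naming are illustrative):

#!/bin/bash
#SBATCH --partition=short-serial
#SBATCH --time=12:00:00            # must stay within the 24 hr limit
#SBATCH --array=1-10               # ten independent single-core tasks
#SBATCH --output=job-%A_%a.out     # %A: array job ID, %a: array task index

./process_file input.$SLURM_ARRAY_TASK_ID   # illustrative command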
long-serial
Serial or array jobs with a single CPU core and a run time greater than 24 hrs and less than 168 hrs (7 days) should be submitted to the long-serial queue.
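For example, a sketch of a long-running single-core job (the wall time and executable are illustrative):

#!/bin/bash
#SBATCH --partition=long-serial
#SBATCH --time=5-00:00:00          # e.g. 5 days, within the 168 hr limit
#SBATCH --output=long-%j.out

./long_running_analysis            # illustrative executable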
high-mem
Serial or array jobs with a single CPU core and a high memory requirement (> 64 GB) should be submitted to the high-mem queue, with the required memory requested via the SLURM directive --mem=XXX (where XXX is in MB). The job should not exceed the maximum run time limit of 48 hrs. This queue is not configured to accept exclusive jobs.
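For example, a sketch of a high-memory job requesting 128 GB (the memory value and executable are illustrative):

#!/bin/bash
#SBATCH --partition=high-mem
#SBATCH --time=24:00:00            # within the 48 hr limit
#SBATCH --mem=128000               # memory in MB (here 128 GB)
#SBATCH --output=highmem-%j.out

./memory_intensive_task            # illustrative executable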
Parallel queues
Jobs requiring more than one CPU core should be submitted to one of the following parallel queues, depending on the type of parallelism: shared-memory or distributed-memory jobs.
par-single
Shared-memory, multi-threaded jobs with up to 16 threads should be submitted to the par-single queue. The number of CPU cores should be specified via the sbatch option -n <number of CPU cores> on the command line, or with the directive #SBATCH -n <number of CPU cores> in the job script file. An example is shown below:
$ sbatch --ntasks=4 --partition=par-single < myjobscript
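The same request can also be expressed as directives in a job script; as a sketch, a 4-thread shared-memory job might look like this (the executable name is illustrative):

#!/bin/bash
#SBATCH --partition=par-single
#SBATCH -n 4                       # number of CPU cores (max 16 on this queue)
#SBATCH --time=06:00:00
#SBATCH --output=par-%j.out

export OMP_NUM_THREADS=$SLURM_NTASKS   # match threads to the allocated cores
./my_threaded_app                      # illustrative executable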
Note: Jobs submitted with more than 16 CPU cores will be terminated (killed) by the SLURM scheduler, and an error message will be reported in the job output file.
par-multi
Distributed-memory jobs with inter-node communication using the MPI library should be submitted to the par-multi queue. The number of CPU cores should be specified via the sbatch option -n <number of CPU cores> on the command line, or with the directive #SBATCH -n <number of CPU cores> in the job script file. An example is shown below:
$ sbatch --ntasks=4 --partition=par-multi < myjobscript
Note 1: The number of CPU cores is passed from the SLURM submission option -n to mpirun, so there is no need to pass the -np option to mpirun explicitly.
Note 2: SLURM will reject a job that requires a number of CPU cores greater than the limit of 256.
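As a sketch, a job script for a small MPI job on this queue might look like the following (the executable and module names are illustrative; as per Note 1, mpirun takes the core count from the SLURM allocation):

#!/bin/bash
#SBATCH --partition=par-multi
#SBATCH -n 32                      # total MPI tasks (max 256 on this queue)
#SBATCH --time=12:00:00
#SBATCH --output=mpi-%j.out

module load openmpi                # illustrative: load an MPI environment appropriate to your site

mpirun ./my_mpi_app                # no -np needed; taken from the SLURM allocation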