SLURM queues

This article introduces the SLURM queues for batch job submission to LOTUS. It covers:

  • Queue name
  • Queue details
  • How to view the state of SLURM queues

Queue name 

The CentOS7 sub-cluster LOTUS is managed by SLURM and the SLURM queues are:

  • test
  • short-serial
  • long-serial
  • par-single
  • par-multi
  • high-mem

Each queue is characterised by its run-length limits (e.g. short, long) and the resources available to it. A full breakdown of each queue and its associated resources is shown below in Table 1.

Queue details

Queues represent sets of pending jobs, lined up in a defined order and waiting for their opportunity to use resources. The queue is specified in the job script file using the SLURM scheduler directive #SBATCH -p <queue_name> (or equivalently #SBATCH --partition=<queue_name>), where <queue_name> is the name of the queue/partition (Table 1, column 1).
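
For example, a minimal job script selecting the short-serial queue might look like the following sketch (the job name, output file names, script name and the program my_program are purely illustrative):

#!/bin/bash
#SBATCH --partition=short-serial   # queue/partition name from Table 1 (equivalent to -p short-serial)
#SBATCH --job-name=example-job     # illustrative job name
#SBATCH -o %j.out                  # standard output file; %j expands to the job ID
#SBATCH -e %j.err                  # standard error file
#SBATCH --time=01:00:00            # requested wall time, hh:mm:ss (see Note 2 below)

./my_program                       # the command(s) to run

The script is then submitted with, e.g., $ sbatch my_job.sbatch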

Table 1 summarises the important specifications of each queue, such as its run time limits and maximum number of CPU cores. If no queue is specified, SLURM schedules the job to the short-serial queue by default.

Table 1. LOTUS/SLURM queues and their specifications 

Queue name     Max run time   Default run time   Max cores per job   Max cores per user   Priority
test           4 hrs          1 hr               -                   -                    30
short-serial   24 hrs         1 hr               1                   2000                 30
par-single     48 hrs         1 hr               16                  256                  25
par-multi      48 hrs         1 hr               256                 256                  20
long-serial    168 hrs        1 hr               256                 -                    10
high-mem       48 hrs         1 hr               48                  -                    30

Note 1: Resources that the job requests must be within the resource allocation limits of the selected queue. 

Note 2: The default value of --time=[hh:mm:ss] (the predicted maximum wall time) is 1 hour for all six SLURM queues. If you do not specify this option, the 1-hour default applies, and a job that runs beyond its requested (or default) wall time, or beyond the queue's maximum run time limit, will be terminated by the SLURM scheduler.
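
For example (an illustrative command only; the script name is hypothetical), the wall-time limit and queue can also be set, or overridden, on the sbatch command line:

$ sbatch --time=02:00:00 --partition=short-serial my_job.sbatch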

How to view the state of SLURM queues

The SLURM command sinfo reports the state of queues/partitions and nodes managed by SLURM. It has a wide variety of filtering, sorting, and formatting options.

$ sinfo   
PARTITION     AVAIL  TIMELIMIT  NODES  STATE NODELIST
test             up    4:00:00     48   idle host[146-193]
short-serial*    up 1-00:00:00     48   idle host[146-193]
long-serial      up 7-00:00:00     48   idle host[146-193]
par-single       up 2-00:00:00     48   idle host[146-193]
par-multi        up 2-00:00:00     48   idle host[146-193]
high-mem         up 2-00:00:00     48   idle host[146-193]
lotus_gpu        up 7-00:00:00     48   idle host[146-193]
copy             up 7-00:00:00     48   idle host[146-193]
cpom-comet       up 7-00:00:00     48   idle host[146-193]
...

Note: Queues other than the standard queues test, short-serial, long-serial, par-single, par-multi and high-mem should be ignored as they implement different job scheduling and control policies.
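
Since sinfo accepts a comma-separated list of partitions, the standard queues can be listed on their own, for example (a usage sketch; the output has the same format as above):

$ sinfo --partition=test,short-serial,long-serial,par-single,par-multi,high-mem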

'sinfo' output field descriptions:

By default, the SLURM command 'sinfo' displays the following information:

  • PARTITION:  Partition name followed by "*" for the default queue/partition
  • AVAIL: Availability/state of the queue/partition: up or down.
  • TIMELIMIT: The maximum run time limit per job in each queue/partition, shown in days-hours:minutes:seconds, e.g. 2-00:00:00 is a two-day maximum run time limit
  • NODES: Count of nodes with this particular configuration e.g. 48 nodes
  • STATE: State of the nodes. Possible states include: allocated, down, drained, and idle. For example: the state "idle" means that the node is not allocated to any jobs and is available for use.
  • NODELIST: List of node names associated with this queue/partition

Example: Report more complete information about the partition/queue short-serial

$ sinfo --long --partition=short-serial
Tue May 12 18:04:54 2020
PARTITION      AVAIL  TIMELIMIT   JOB_SIZE    ROOT  OVERSUBS  GROUPS  NODES  STATE  NODELIST
short-serial*  up     1-00:00:00  1-infinite  no    NO        all     48     idle   host[146-193]
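
The output columns can also be customised with the --format (-o) option using the standard sinfo format specifiers, e.g. %P (partition), %a (availability), %l (time limit), %D (node count) and %t (node state). A usage sketch:

$ sinfo --format="%12P %.5a %.11l %.6D %.6t"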