
How to monitor Slurm jobs

 


Getting information about your Slurm jobs

Job information  

Information on all running and pending batch jobs managed by Slurm can be obtained with the squeue command. Note that squeue retains information on completed jobs for only a limited period; information on jobs that ran in the past is available via sacct. A (simplified) example of squeue output is shown below.

squeue -u fred
JOBID PARTITION       QOS   NAME   USER  ST   TIME  NODES NODELIST(REASON)
18957  standard  standard   mean   fred   R   0:01      1 host147
18967     debug  standard   wrap   fred   R  14:25      1 host146

The ST field is the job state and TIME is the time used by the job so far. You may also see TIME_LEFT, CPUS (the number of CPUs for the job), PRIORITY, and NODELIST(REASON), which shows the hosts the job is running on or, for a job that is not running, the reason for its current state.

The -u fred argument restricts the squeue output to jobs belonging to user fred. Alternatively, use squeue --me, which means "my own jobs".
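As a rough illustration of how the ST column can be used, the sketch below tallies jobs by state from squeue's default tabular output. The function name count_by_state is hypothetical, and the sketch assumes that no column value (for example a job name) contains spaces, since it splits each row on whitespace.

```shell
# Hypothetical helper: count jobs per state code.
# Reads squeue's default output (header line included) on stdin.
count_by_state() {
    awk '
        # Locate the ST column from the header rather than hard-coding its position.
        NR == 1 { for (i = 1; i <= NF; i++) if ($i == "ST") col = i; next }
        # Tally each non-empty row by its state code.
        col && NF { count[$col]++ }
        END { for (s in count) print s, count[s] }
    ' | sort
}

# Example call (requires a Slurm cluster):
#   squeue --me | count_by_state
```

If only the state is needed, asking squeue for that single column, for example with squeue --me --noheader --format="%t" piped through sort and uniq -c, avoids the header parsing altogether.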

Official documentation for the squeue command is available on the Slurm website.

A batch job passes through several states in the course of its execution. The typical job states are defined below:

Symbol  Job state   Description
PD      Pending     The job is waiting in a queue for resources to be allocated
R       Running     The job is currently allocated to a node and is running
CG      Completing  The job is finishing but some processes are still active
CD      Completed   The job has completed successfully
F       Failed      The job failed with a non-zero exit value
TO      Timeout     The job was terminated by Slurm after reaching its runtime limit
S       Suspended   A running job has been stopped with its resources released to other jobs
ST      Stopped     A running job has been stopped with its resources retained

Slurm commands for monitoring jobs  

The most commonly used commands for monitoring batch jobs, with their options, are listed below:

Slurm Command            Description
squeue                   Displays information for all jobs running and pending on the cluster
squeue --user=username   Displays running and pending jobs for the given user
squeue --me              Displays running and pending jobs for the current user
squeue --states=PD       Displays information for pending jobs (PD state) and the reasons they are pending
squeue --states=all      Displays jobs in all states, including recently finished jobs
scontrol show job JOBID  Shows detailed information about your job (JOBID = job number)
sacct -b                 Shows a brief listing of past jobs
sacct -l -j JOBID        Shows detailed historical information for the past job with ID JOBID
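The commands above can be combined into a simple polling loop. The following is a minimal sketch, not a recommended tool: the function name wait_for_job and the 60-second default interval are assumptions, and it only handles a single, non-array job. It treats the job as active while squeue reports one of the states PD, CF (configuring), R, CG, S or ST, and stops once the state is anything else or the job is no longer listed.

```shell
# Hypothetical helper: block until a job is no longer in an active state.
# Usage: wait_for_job JOBID [POLL_SECONDS]
wait_for_job() {
    job_id=$1
    interval=${2:-60}   # assumed default; avoid polling the scheduler too often
    while :; do
        # --noheader drops the header line; %t prints the compact state code.
        # If squeue no longer knows the job, state ends up empty.
        state=$(squeue --noheader --jobs="$job_id" --format="%t" 2>/dev/null)
        case $state in
            PD|CF|R|CG|S|ST) sleep "$interval" ;;   # still active: keep waiting
            *) break ;;                             # finished or no longer listed
        esac
    done
    echo "job $job_id is no longer active (last state seen: ${state:-none})"
}

# Example call (requires a Slurm cluster):
#   wait_for_job 18973 120
```

Once the loop returns, sacct -j JOBID can be used to check how the job ended.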

Inspection of a job record  

The following example shows the job record for a simple job submitted to Slurm:

sbatch -A mygws -q debug -p debug --wrap="sleep 2m"
Submitted batch job 18973

The job ID reported by sbatch can then be passed to scontrol:

scontrol show job 18973
JobId=18973 JobName=wrap
   UserId=fred(26458) GroupId=users(26030) MCS_label=N/A
   Priority=1 Nice=0 Account=jasmin QOS=standard
   JobState=RUNNING Reason=None Dependency=(null)
   Requeue=1 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0
   RunTime=00:00:08 TimeLimit=01:00:00 TimeMin=N/A
   SubmitTime=2020-05-20T14:10:28 EligibleTime=2020-05-20T14:10:28
   AccrueTime=2020-05-20T14:10:28
   StartTime=2020-05-20T14:10:32 EndTime=2020-05-20T15:10:32 Deadline=N/A
   SuspendTime=None SecsPreSuspend=0 LastSchedEval=2020-05-20T14:10:32
   Partition=test AllocNode:Sid=sci2-test:18286
   ReqNodeList=(null) ExcNodeList=(null)
   NodeList=host147
   BatchHost=host147
   NumNodes=1 NumCPUs=1 NumTasks=1 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
   TRES=cpu=1,mem=128890M,node=1,billing=1
   Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=*
   MinCPUsNode=1 MinMemoryNode=128890M MinTmpDiskNode=0
   Features=(null) DelayBoot=00:00:00
   OverSubscribe=OK Contiguous=0 Licenses=(null) Network=(null)
   Command=(null)
   WorkDir=/home/users/fred
   StdErr=/home/users/fred/slurm-18973.out
   StdIn=/dev/null
   StdOut=/home/users/fred/slurm-18973.out
   Power=
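
Because the job record is a series of Key=Value pairs, a single field can be pulled out with standard text tools. The sketch below assumes that values contain no spaces, which holds for the record shown above but not necessarily for every field (a Command with arguments, for instance). The function name job_field is hypothetical.

```shell
# Hypothetical helper: print the value of one field from a job record.
# Usage: scontrol show job JOBID | job_field FIELDNAME
job_field() {
    # Put each Key=Value pair on its own line, then print everything after
    # the first "=" for the pair whose key matches exactly.
    tr -s ' ' '\n' |
        awk -F= -v key="$1" '$1 == key { print substr($0, length(key) + 2); exit }'
}

# Example calls (require a Slurm cluster):
#   scontrol show job 18973 | job_field JobState
#   scontrol show job 18973 | job_field RunTime
```

Only the text before the first "=" is treated as the key, so a composite value such as the one in TRES is returned whole.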

History of jobs  

sacct
        JobID    JobName  Partition    Account  AllocCPUS      State ExitCode
------------ ---------- ---------- ---------- ---------- ---------- --------
18963              wrap par-single     jasmin          1  COMPLETED      0:0
18964              wrap short-ser+     jasmin          1  COMPLETED      0:0
18965              wrap par-single     jasmin          1  COMPLETED      0:0
18966              wrap short-ser+     jasmin          1  COMPLETED      0:0
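
To pick out past jobs that did not finish cleanly, the accounting output can be filtered on its State column. The sketch below assumes pipe-delimited input with the columns JobID, State and ExitCode in that order, as produced by sacct with the --parsable2 and --noheader options. The function name not_completed_jobs is hypothetical.

```shell
# Hypothetical helper: list accounting rows whose state is not COMPLETED.
# Expects "JobID|State|ExitCode" rows on stdin, one job per line.
not_completed_jobs() {
    awk -F'|' 'NF && $2 != "COMPLETED" { print $1, $2, $3 }'
}

# Example call (requires a Slurm cluster); -X limits output to one row per job:
#   sacct -X --noheader --parsable2 --format=JobID,State,ExitCode | not_completed_jobs
```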
Last updated on 2025-08-19 as part of:  prototype of v1.10.0 (c86b5ad18)