Cluster-as-a-Service - Slurm

 

This article describes how to deploy and use a Slurm cluster using JASMIN Cluster-as-a-Service (CaaS).

 
CaaS Slurm clusters are currently disabled because of a security problem with the images that were being used. We are working on a new system that will provide Slurm clusters.

Introduction  

The Slurm Workload Manager is a popular open-source job scheduler. It provides facilities for executing and monitoring workloads across a set of nodes, and it manages contention for those nodes by maintaining a queue of pending jobs.

Slurm is a powerful scheduling system, and a full discussion of the available commands and options is beyond the scope of this article - please consult the Slurm documentation. This article focuses on the specifics of how to deploy and access a Slurm cluster in CaaS.

In CaaS, a Slurm cluster consists of a single login node and several worker nodes. The Linux users and groups on the cluster are managed by the Identity Manager for the tenancy, meaning that SSH access to the nodes can be controlled using FreeIPA groups. User home directories are mounted on all nodes using a shared storage cluster. Slurm is configured with a single queue, to which all the compute hosts are added.

The login node can optionally be assigned an external IP. However, external IPs are a scarce resource in the JASMIN Cloud: if you want to preserve your external IPs for other clusters, you can use the Identity Manager gateway host as a jump host.
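
As a rough illustration of the jump-host approach, the sketch below wraps OpenSSH's `-J` (ProxyJump) option, which connects through the gateway in a single command. It assumes an OpenSSH client of version 7.3 or later on your local machine and that your SSH key is available locally. The function name `ssh_via_gateway` and the placeholder addresses are illustrative only and are not part of CaaS.

```shell
#!/usr/bin/env bash
# Sketch: reach the Slurm login node through the gateway in one step.
# Both addresses are placeholders; substitute your own gateway and login node.

ssh_via_gateway() {
    local gateway="$1" login_node="$2"
    shift 2
    # -J (ProxyJump, OpenSSH 7.3+) tunnels the connection through the gateway.
    ssh -J "$gateway" "$login_node" "$@"
}

# Example usage (uncomment with real addresses):
# ssh_via_gateway jbloggs@<gateway-external-ip> jbloggs@<login-node-internal-ip>
```

With `-J`, authentication to both hosts happens from your local machine, so agent forwarding is not needed on the gateway.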

Cluster configuration  

The following variables are available to configure a Slurm cluster:

| Variable | Description | Required? | Can be updated? |
|---|---|---|---|
| Identity manager | The CaaS Identity Manager that is used to control access to the cluster. | Yes | No |
| Shared storage | The shared storage cluster to use for user home directories. | Yes | No |
| Worker nodes | The number of worker nodes in the cluster. This can be scaled up or down after deployment. When scaling down, there is currently no effort made to drain the hosts in order to remove them gracefully: jobs executing on the removed hosts will fail. This may change in the future. | Yes | Yes |
| Login node size | The size to use for the login node. | Yes | No |
| Compute node size | The size to use for the compute nodes. | Yes | No |
| External IP | The external IP to attach to the login node. This is optional - if not given, the cluster can still be accessed by using the Identity Manager’s gateway host as a jump host for SSH. | No | No |
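
Because scaling down does not drain hosts, one possible precaution is to drain a worker by hand before reducing the worker count. The sketch below uses the standard Slurm commands `scontrol` and `squeue`. It assumes you have Slurm administrator rights on the cluster. The node name is a guess, since this article does not state which node CaaS removes when scaling down, and the helper names `drain_node` and `node_is_empty` are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: stop new jobs landing on a worker, then check whether it is empty.
# Assumes Slurm admin rights; the node name used below is hypothetical.

drain_node() {
    local node="$1"
    # Mark the node as draining: running jobs continue, no new jobs start.
    scontrol update NodeName="$node" State=DRAIN Reason="scale down"
}

node_is_empty() {
    local node="$1"
    # squeue -h suppresses the header, so empty output means no jobs remain.
    [ -z "$(squeue -h -w "$node")" ]
}

# Example usage (uncomment on a real cluster):
# drain_node slurm-compute-2
# until node_is_empty slurm-compute-2; do sleep 30; done
```

Once the node reports no jobs, reducing the worker count should not interrupt running work, provided the drained node is the one that gets removed.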

Accessing the cluster  

The Slurm hosts are configured to use the users and groups from FreeIPA via SSSD. They are also configured to use SSH keys from FreeIPA for SSH authentication (password-based SSH is disabled).

For every Slurm cluster that is deployed, CaaS automatically creates a group in FreeIPA called <clustername>_users. Members of this group, and of the admins group, are permitted SSH access to the hosts in the cluster. To give a user SSH access to a Slurm cluster, add them to one of these groups (which one depends on whether you also want them to be an admin on other clusters).

Once a user has been added to one of these groups, they can access the Slurm cluster via SSH. The following is an example of accessing a Slurm cluster without an external IP using the Identity Manager’s gateway as a jump host:
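
If you manage FreeIPA from the command line rather than through its web interface, the group membership change might look like the sketch below. It assumes the FreeIPA `ipa` client is installed where you run it and that you already hold an admin Kerberos ticket (for example via `kinit`); this article does not say whether that is how your tenancy is set up. The function name, cluster name and user name are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: grant a user SSH access by adding them to the cluster's FreeIPA group.
# Assumes the ipa CLI is installed and you hold an admin Kerberos ticket (kinit).

add_cluster_user() {
    local cluster="$1" user="$2"
    # CaaS names the access group <clustername>_users.
    ipa group-add-member "${cluster}_users" --users="$user"
}

# Example usage (hypothetical cluster and user names):
# add_cluster_user myslurm jbloggs
```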

# Add SSH key to the session
ssh-add /path/to/ssh/key

# SSH to the identity manager gateway with agent forwarding enabled
ssh -A jbloggs@192.171.139.83
# SSH to the Slurm login node
ssh 192.168.3.16
# Check that we are in our home directory
pwd
/home/users/jbloggs

# Check the Slurm status
sinfo
PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
compute*     up 1-00:00:00      3   idle slurm-compute-[0-2]

# Run a simple job
srun -N3 -l /bin/hostname
0: slurm-compute-0.novalocal
1: slurm-compute-1.novalocal
2: slurm-compute-2.novalocal
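
For work that should not hold an interactive session open, a batch job is the usual alternative to `srun`. The sketch below writes a minimal batch script and submits it with `sbatch`. It assumes the partition is named `compute`, as in the `sinfo` output above, and requests a time well within the one-day limit shown there. The file name `hello.sbatch` and the job name are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: write a minimal batch script and (on a real cluster) submit it.
# The file name hello.sbatch and the job name are hypothetical.

cat > hello.sbatch <<'EOF'
#!/bin/bash
#SBATCH --job-name=hello
#SBATCH --partition=compute
#SBATCH --nodes=1
#SBATCH --time=00:10:00
#SBATCH --output=hello-%j.out

# Print the name of the node the job ran on.
srun /bin/hostname
EOF

# Submit only where sbatch is available, i.e. on the Slurm login node.
if command -v sbatch >/dev/null 2>&1; then
    sbatch hello.sbatch
fi
```

Because home directories are shared across all nodes, the output file (with `%j` replaced by the job ID) should appear in the directory you submitted from.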

A more in-depth discussion of the capabilities of Slurm is beyond the scope of this document - please refer to the Slurm documentation.

Last updated on 2024-09-05 as part of:  replacing refs using old syntax & tidied some other links (f03769a9c)
Copyright © 2025 Science and Technology Facilities Council.