Orchid GPU cluster
This article provides details on JASMIN’s GPU cluster, named ORCHID.
The JASMIN GPU cluster is composed of 16 GPU nodes.
Before using ORCHID on JASMIN, you will need:
1. The jasmin-login access role: Apply here
2. Once the jasmin-login role has been approved and completed, the orchid access role: Apply here
The jasmin-login access role ensures that your account is set up with access to the LOTUS batch processing cluster, while the orchid role grants access to the special LOTUS partition used by ORCHID. Holding the orchid role also gives access to the GPU interactive node.
Note: In the supporting info on the orchid request form, please provide details on the software and the workflow that you will use/run on ORCHID.
Testing a job on the JASMIN ORCHID GPU cluster can be carried out interactively by launching a pseudo-shell terminal Slurm job from a JASMIN scientific server, e.g. sci-vm-01:
srun --gres=gpu:1 --partition=orchid --account=orchid --pty /bin/bash
srun: job 24096593 queued and waiting for resources
srun: job 24096593 has been allocated resources
# you are now on gpuhost016
The GPU node gpuhost016 is allocated for this interactive session on LOTUS.
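Once the interactive session starts, a quick way to confirm that a GPU has been allocated to it is to run nvidia-smi on the node. This is a standard NVIDIA utility rather than anything ORCHID-specific, so treat it as an illustrative check:
nvidia-smi
# lists the GPU(s) visible to this job, together with the driver and CUDA versions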
Note that for batch mode, a GPU job is submitted using the Slurm command ‘sbatch’:
sbatch --gres=gpu:1 --partition=orchid --account=orchid gpujobscript.sbatch
or by adding the following preamble to the job script file:
#SBATCH --partition=orchid
#SBATCH --account=orchid
#SBATCH --gres=gpu:1
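Putting these directives together, a minimal batch script might look like the sketch below. The job name, runtime, output files and the final command are placeholders for your own workload rather than ORCHID requirements:
#!/bin/bash
#SBATCH --job-name=gpu-test        # illustrative job name
#SBATCH --partition=orchid         # ORCHID GPU partition
#SBATCH --account=orchid           # ORCHID account
#SBATCH --gres=gpu:1               # request one GPU
#SBATCH --time=01:00:00            # within the 24-hour limit; the default is 1 hour
#SBATCH -o %j.out                  # standard output file
#SBATCH -e %j.err                  # standard error file

# replace the line below with your own GPU application
nvidia-smi
Submit it with:
sbatch gpujobscript.sbatch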
Note 1: gpuhost015 and gpuhost016 are the two largest nodes, with 64 CPUs and 8 GPUs.
Note 2: CUDA Version: 11.6
Note 3: The Slurm batch partition/queue orchid has a maximum runtime of 24 hours; the default runtime is 1 hour. The maximum number of CPU cores per user is limited to 8. If this limit is exceeded, the job will be held in a pending state with the reason QOSGrpCpuLimit.
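You can check why a job is pending with the standard Slurm squeue command; in its default output the reason appears in the NODELIST(REASON) column:
squeue -u $USER
# jobs held by the per-user CPU-core limit show the reason QOSGrpCpuLimit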
There is an interactive GPU node, gpuhost001.jc.rl.ac.uk, with the same spec as the other ORCHID nodes, which you can access via a login server to prototype and test your GPU code prior to running it as a batch job.
ssh -A gpuhost001.jc.rl.ac.uk
# you are now on gpuhost001