Orchid GPU cluster
This article provides details on JASMIN’s GPU cluster, named ORCHID.
The JASMIN GPU cluster is composed of 16 GPU nodes.
Before using ORCHID on JASMIN, you will need:
1. The jasmin-login access role: Apply here
2. Once the jasmin-login role has been approved and completed, the orchid access role: Apply here
The jasmin-login access role ensures that your account is set up with access to the LOTUS batch processing cluster, while the orchid role grants access to the special LOTUS partition used by ORCHID. Holding the orchid role also gives access to the GPU interactive node.
Note: In the supporting info on the orchid request form, please provide details on the software and the workflow that you will use/run on ORCHID.
Testing a job on the JASMIN ORCHID GPU cluster can be carried out interactively by launching a pseudo-shell terminal Slurm job from a JASMIN scientific server, e.g. sci-vm-01:
srun --gres=gpu:1 --partition=orchid --account=orchid --pty /bin/bash
srun: job 24096593 queued and waiting for resources
srun: job 24096593 has been allocated resources
# you are now on gpuhost016
The GPU node gpuhost016 is allocated for this interactive session on LOTUS.
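Once the interactive session starts, a quick way to confirm that a GPU has been allocated to it is to run nvidia-smi on the node. This is a standard NVIDIA utility rather than anything ORCHID-specific, so treat it as an illustrative check:
nvidia-smi
# lists the GPU(s) visible to this job, together with the driver and CUDA versions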
Note that for batch mode, a GPU job is submitted using the Slurm command ‘sbatch’:
sbatch --gres=gpu:1 --partition=orchid --account=orchid gpujobscript.sbatch
or by adding the following preamble to the job script file:
#SBATCH --partition=orchid
#SBATCH --account=orchid
#SBATCH --gres=gpu:1
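Putting these directives together, a minimal batch script might look like the sketch below. The job name, runtime, output files and the final command are placeholders for your own workload rather than ORCHID requirements:
#!/bin/bash
#SBATCH --job-name=gpu-test        # illustrative job name
#SBATCH --partition=orchid         # ORCHID GPU partition
#SBATCH --account=orchid           # ORCHID account
#SBATCH --gres=gpu:1               # request one GPU
#SBATCH --time=01:00:00            # within the 24-hour limit; the default is 1 hour
#SBATCH -o %j.out                  # standard output file
#SBATCH -e %j.err                  # standard error file

# replace the line below with your own GPU application
nvidia-smi
Submit it with:
sbatch gpujobscript.sbatch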
Note 1: gpuhost015 and gpuhost016 are the two largest nodes, with 64 CPUs and 8 GPUs.
Note 2: CUDA Version: 11.6
Note 3: The Slurm batch partition/queue orchid has a maximum runtime of 24 hours; the default runtime is 1 hour. The maximum number of CPU cores per user is limited to 8. If this limit is exceeded, the job will be held in a pending state with the reason QOSGrpCpuLimit.
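You can check why a job is pending with the standard Slurm squeue command; in its default output the reason appears in the NODELIST(REASON) column:
squeue -u $USER
# jobs held by the per-user CPU-core limit show the reason QOSGrpCpuLimit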
There is an interactive GPU node, gpuhost001.jc.rl.ac.uk, with the same spec as the other ORCHID nodes, which you can access via a login server to prototype and test your GPU code prior to running it as a batch job.
ssh -A gpuhost001.jc.rl.ac.uk
# you are now on gpuhost001