JASMIN Notebooks Service with GPUs enabled
The JASMIN Notebook Service has recently been updated to include a GPU-enabled node. This means that JASMIN users can now run Machine Learning (ML) workflows in Notebooks. This page outlines how to access the service, check that GPUs are available, and install the ML packages you need.
Who can access the GPU-enabled Notebooks Service?
The service is available to all JASMIN users who have been granted access to the ORCHID (GPU) cluster. Existing JASMIN users can apply here.
Starting a Notebook Server with GPUs
To start a Notebook Server with GPUs enabled, go to the initial start page and click the “Launch Server” button:

Then select the “GPU” option and click “Start”:

Which packages are available by default?
Check the top-right corner of a Notebook session to see which kernel is being used.
If you don’t need any specialist Machine Learning (ML) libraries, you would typically choose Python 3 + Jaspy, as this has many of the common open-source packages used within environmental science:

You can click on the name of the kernel to select a different one.
If you want to work with GPUs, you are likely to want to install other packages that are common in ML, such as PyTorch and TensorFlow. This topic is discussed below.
GPU availability
To check that your notebook is running on a server with GPUs, you can use the built-in NVIDIA commands, such as:
!nvidia-smi
If GPUs are enabled, the output should look like this:

Understanding the nvidia-smi command output
1. The Header section
The first section includes:
- CUDA Version: 12.7: The version of the CUDA toolkit that the NVIDIA driver supports.
- GPU 0 / GPU 1: There are two physical NVIDIA A100 GPUs in the system.
- Name: The model is NVIDIA A100-SXM4-40GB. Each GPU has 40GB of on-board memory.
- Memory-Usage: Shows N/A because these GPUs are in MIG mode (Multi-Instance GPU), so memory usage is not reported here in the usual way. Memory usage for MIG slices is shown in the dedicated MIG section (below).
- GPU-Util: Also N/A for the same reason (MIG is active, so usage must be looked at per MIG instance).
2. The MIG section
The second section introduces MIG (Multi-Instance GPU). In MIG mode, each GPU can be partitioned into multiple instances, each acting as a smaller independent, or virtual, GPU. Because MIG is turned on, you see “N/A” in the normal memory usage fields. Instead, you have a dedicated table for each MIG device:
- GPU: This repeats the GPU ID (0 or 1).
- GI ID (GPU Instance) and CI ID (Compute Instance): Each MIG slice is defined by a GPU instance and a compute instance.
- MIG Dev: The MIG device index.
- Memory-Usage (13MiB / 9984MiB): Each MIG slice here is allocated around 10GB (9984MiB) of GPU memory. Currently, only 13MiB is being used, likely overhead.
- BAR1-Usage: This is the amount of memory mapped via the BAR1 aperture (used for buffer transfers).
- CE / ENC / DEC / OFA / JPG: These columns refer to hardware encoder/decoder and other specialized engines available to each MIG slice.
3. The Processes section
The third section, processes, indicates what is running on the GPU/MIG instances:
- No running processes found: There were no active workloads on the GPUs or MIG instances at the time this command was run.
In short: There are two physical A100 GPUs. Each is in MIG mode and is presenting one virtual GPU instance with 10GB of memory. Currently, neither GPU has any running processes, so they’re essentially idle. The top-level memory usage fields are “N/A” because MIG splits the GPU resources, and the usage is shown in the MIG devices table below.
Getting the GPU and MIG device IDs
The following command will give you the exact IDs of the available GPUs and MIG instances:
!nvidia-smi -L
The output will be something like:
GPU 0: NVIDIA A100-SXM4-40GB (UUID: GPU-2927d07e-3fe9-7904-9e08-b08b82d9a37d)
MIG 1g.10gb Device 0: (UUID: MIG-6e95ef19-5145-571b-b040-7e731f1c1af3)
GPU 1: NVIDIA A100-SXM4-40GB (UUID: GPU-e109d8d9-923e-7235-0429-96b7fdbcbd30)
MIG 1g.10gb Device 0: (UUID: MIG-b4bcd4f3-6f69-516d-9404-b5ada80d760b)
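If you need to pin your code to one specific MIG instance, CUDA respects the CUDA_VISIBLE_DEVICES environment variable, which (on recent NVIDIA drivers) accepts MIG UUIDs. A minimal sketch, assuming PyTorch is installed (see below) and substituting a UUID from your own nvidia-smi -L output:

```python
import os

# Restrict CUDA to a single MIG instance; use a MIG UUID reported by
# `nvidia-smi -L` on your server (the one below is from the example above).
# This must be set before the first import of torch (or tensorflow),
# otherwise it has no effect on the session.
os.environ["CUDA_VISIBLE_DEVICES"] = "MIG-6e95ef19-5145-571b-b040-7e731f1c1af3"

import torch

print(torch.cuda.device_count())  # should report 1 visible device
```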
Resource allocation
The current allocation of GPUs to the JASMIN Notebook Service is as follows:
- 1 GPU node serves 4 physical GPUs (NVIDIA A100-SXM4-40GB).
- Each GPU is partitioned, using MIG, into 4 virtual GPU instances.
- Each user is allocated 2 virtual GPU instances for their own notebook instance.
- Each virtual GPU instance has 10GiB of memory.
Software environments and Machine Learning packages
In the current release of the Notebook Service, users are required to install their own ML packages for use with GPUs. We recommend this approach:
- Create a virtual environment ("venv"), for example ml-venv. Use our guide to help you.
- Install the packages you require into that venv. For example, if you needed pytorch and torchvision, you would run pip install torch torchvision (including specific versions if needed). NOTE: Many ML packages are very big - this can take several minutes.
- Be sure to follow the instructions for installing ipykernel into your venv and running the relevant command to install the kernel so that JupyterHub can locate it and list it as one of the available kernels. Use the name of your venv as the name of the kernel. (A sketch of these steps is shown after this list.)
- Once you have installed your kernel, it should appear as an option in the Launcher, as outlined in green in the diagram below. The Launcher is accessible from the File menu.
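As a guide, the commands below sketch this workflow from a terminal session. They assume a bash shell with python3 on your path; the venv name ml-venv and the packages installed are illustrative, and the guides linked above remain the authoritative reference.

```bash
# Create and activate a virtual environment in your HOME directory
python3 -m venv ${HOME}/ml-venv
source ${HOME}/ml-venv/bin/activate

# Install the ML packages you need (these are large - this may take several minutes)
pip install torch torchvision

# Install ipykernel and register the venv as a Jupyter kernel,
# using the venv name as the kernel name so JupyterHub can list it
pip install ipykernel
python -m ipykernel install --user --name ml-venv --display-name ml-venv
```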
Specific advice on installing TensorFlow for use with GPU Notebooks
The general-purpose Python 3 + Jaspy kernel already includes a version of TensorFlow, but it was built for CPU-only use.
To use TensorFlow with GPUs in a Notebook, there are two approaches you can take:
- Create your own virtual environment (as mentioned above) and install TensorFlow.
- Simply install TensorFlow within your Notebook using pip.
Option 1 is more complicated, but it allows you to manage multiple separate software environments that you can switch between.
For Option 2, you will be installing the package into your $HOME directory, in a location such as ${HOME}/.local/lib/python3.11/site-packages/ (note that the exact Python version may vary). You can install TensorFlow from a Notebook by typing the following into a cell and executing it:
!pip install tensorflow[and-cuda] keras
NOTE: Make sure you previously selected the GPU option when you launched your Notebook server (as instructed above).
You will need to restart your Notebook kernel before the newly installed version of TensorFlow can be imported. See below for instructions on importing and checking that the GPUs are visible to your Notebook session.
Specific advice on installing PyTorch for use with GPU Notebooks
Unlike TensorFlow, PyTorch is not installed within the Python 3 + Jaspy kernel, so you will need to install it yourself.
To use PyTorch with GPUs in a Notebook, there are two approaches you can take:
- Create your own virtual environment (as mentioned above) and install PyTorch.
- Simply install PyTorch within your Notebook using pip.
Option 1 is more complicated, but it allows you to manage multiple separate software environments that you can switch between.
For Option 2, you will be installing the package into your $HOME directory, in a location such as ${HOME}/.local/lib/python3.11/site-packages/ (note that the exact Python version may vary). You can install PyTorch from a Notebook by typing the following into a cell and executing it:
!pip install torch
NOTE: Make sure you previously selected the GPU option when you launched your Notebook server (as instructed above).
You will need to restart your Notebook kernel before the newly installed version of PyTorch can be imported. See below for instructions on importing and checking that the GPUs are visible to your Notebook session.
Handling multiple/conflicting versions of software packages
It is common to find that different workflows will require different versions of software packages. In the fast-moving world of ML, the libraries and their dependencies often change and this can cause problems when trying to work within a single software environment.
If you encounter this kind of problem, we recommend that you create multiple
virtual environments and their associated kernels. You can then select the
appropriate kernel for each notebook. It may also be worth investing the time
in capturing exact versions of the relevant packages so that you can reproduce
your environment if necessary. Python packages often use a requirements file (typically named requirements.txt) to capture the dependencies. For example:
scikit-learn==1.5.1
torch==2.5.1+cu124
torchvision==0.20.1+cu124
All packages listed in a requirements file can be installed with a single command:
$ pip install -r requirements.txt
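To capture the exact versions currently installed in your active venv, the standard pip command (general pip usage, not JASMIN-specific) is:
$ pip freeze > requirements.txt
This writes one pinned line per installed package, which pip install -r can later reproduce.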
Importing PyTorch or TensorFlow and testing that they work with CUDA
CUDA is the system that connects Python libraries to the GPU (on NVIDIA hardware). When you install PyTorch, or many other ML packages, it should automatically detect CUDA if it is available. Assuming that you have followed the instructions to create a venv and install PyTorch, you can check for CUDA with:
>>> import torch
>>> print("Is CUDA available? ", torch.cuda.is_available())
Is CUDA available? True
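As a further illustrative check, you can move a tensor onto the GPU and compute with it. This sketch assumes PyTorch was installed as described above:

```python
import torch

# Use the GPU (a MIG instance on this service) if available, else fall back to CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Create a random matrix on the device and multiply it by itself
x = torch.rand(1000, 1000, device=device)
y = x @ x
print(y.device)  # e.g. cuda:0
```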
The same check is possible with TensorFlow:
>>> import tensorflow as tf
>>> print(tf.config.list_physical_devices('GPU'))
[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU'), PhysicalDevice(name='/physical_device:GPU:1', device_type='GPU')]
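Similarly, an illustrative TensorFlow check that runs a small computation explicitly on the first visible GPU:

```python
import tensorflow as tf

# Place a small matrix multiplication on the first GPU
with tf.device("/GPU:0"):
    a = tf.random.uniform((1000, 1000))
    b = tf.matmul(a, a)

print(b.device)  # e.g. /job:localhost/replica:0/task:0/device:GPU:0
```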
Warning about large ML packages and HOME directory disk quota
Please be aware that installing these packages into your $HOME directory will require multiple gigabytes of free space. If you are near your quota (100GB), then the installation may fail. It is important to note that an installation failure may not report a violation of disk quota even if that is the underlying problem.
See the HOME directory documentation for details on checking your current disk usage.
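As a quick check from a notebook cell, the standard du command reports how much space your user-level package installs occupy (pip installs run from a notebook go under ${HOME}/.local):
!du -sh ${HOME}/.local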
Guidelines and Best Practices
Efficient GPU usage
Please make use of GPUs efficiently in your code. If you only need CPUs, then please use the standard Notebook Service. One way to ensure that the resource is used efficiently is to stop your notebook server via the Hub Control Panel (see the File menu) when it is not actively needed. Be sure to save your notebook before stopping the server.
Memory and Resource Limits
The per-user memory limit for a given notebook is shown in the bar below (typically 16GB). On the GPU architecture, there is 10GiB of memory per virtual GPU.
Scaling up your workflows
Experienced JASMIN users will be familiar with the resource limitations of the Notebook Service. Whilst it is great for prototyping, scientific notebooks and code-sharing, it does not suit large multi-process and long-running workflows. The LOTUS cluster is provided for larger workflows, and it includes the ORCHID partition for GPU usage.
We recommend that you use the GPU-enabled Notebook Service to develop and prototype your ML workflows, and migrate them to ORCHID if they require significantly more compute power. Please contact the JASMIN Helpdesk if you would like advice on how to migrate your workflows.
Getting Advice on Machine Learning
For advice on machine learning packages and suitability for different applications, you could make use of the NERC Earth Observation Data Analysis and AI service (NEODAAS). See the NEODAAS website for details.
A Notebook to get started
An introductory notebook, which includes most of the information provided on this page, is available on GitHub. It may provide a useful starting point for your journey.