Dask Gateway
Dask Gateway is a service which manages dask clusters for users. On JASMIN, it automatically creates a dask cluster for you in LOTUS, our batch computing cluster, scheduling Slurm jobs to create dask schedulers and workers as appropriate.
Before using Dask Gateway on JASMIN, you will need:
1. The jasmin-login access role.
2. Once jasmin-login has been approved and completed, the dask access role.
The jasmin-login access role ensures that your account is set up with access to the LOTUS batch processing cluster, while the dask role grants access to the special LOTUS partition used by the Dask Gateway service.
In the JASMIN notebooks service, authentication to dask-gateway happens automatically. You can use the snippet below to create a cluster and get a dask client to work with:
import dask_gateway
# Create a connection to dask-gateway.
gw = dask_gateway.Gateway("https://dask-gateway.jasmin.ac.uk", auth="jupyterhub")
# Inspect and change the options if required before creating your cluster.
options = gw.cluster_options()
options.worker_cores = 2
# Create a dask cluster, or, if one already exists, connect to it.
# This stage creates the scheduler job in Slurm, so it may take some time
# while your job queues.
clusters = gw.list_clusters()
if not clusters:
    cluster = gw.new_cluster(options, shutdown_on_close=False)
else:
    cluster = gw.connect(clusters[0].name)
# Create at least one worker, and allow your cluster to scale to three.
cluster.adapt(minimum=1, maximum=3)
# Get a dask client.
client = cluster.get_client()
#########################
### DO DASK WORK HERE ###
#########################
# When you are done and wish to release your cluster:
cluster.shutdown()
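The connect-or-create step in the snippet above can be wrapped in a small helper, so that re-running a notebook cell reuses an existing cluster rather than queueing a second scheduler job. This is a minimal sketch: the name get_or_create_cluster is hypothetical, and it assumes only the list_clusters, connect and new_cluster calls already used above.

```python
def get_or_create_cluster(gw, options=None):
    """Return the user's first existing cluster, or create a new one.

    `gw` is expected to behave like a dask_gateway.Gateway object. Only the
    list_clusters(), connect() and new_cluster() methods used earlier on
    this page are called. The function name is illustrative.
    """
    clusters = gw.list_clusters()
    if clusters:
        # Reuse the first cluster that is already running or queued.
        return gw.connect(clusters[0].name)
    # No cluster yet: this submits the scheduler job, so the call may
    # block while the job queues.
    return gw.new_cluster(options, shutdown_on_close=False)
```

With the gw and options objects from the snippet above, this would be called as cluster = get_or_create_cluster(gw, options).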
Outside the notebooks service (e.g. on the sci machines), it is currently still necessary to use the notebooks service to generate an API token which allows you to connect to the gateway server.
mkdir -p ~/.config/dask
touch ~/.config/dask/gateway.yaml
chmod 600 ~/.config/dask/gateway.yaml
Head to https://notebooks.jasmin.ac.uk/hub/token, put a note in the box to remind yourself what this token is for, press the big orange button, then copy the token.
Paste the following snippet into ~/.config/dask/gateway.yaml, replacing replaceWithYourSecretAPIToken with the API token you just copied.
gateway:
  address: https://dask-gateway.jasmin.ac.uk
  auth:
    type: jupyterhub
    kwargs:
      api_token: replaceWithYourSecretAPIToken
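If you prefer to script this step, the directory, file permissions and file contents described above can be produced in one go. The following is a minimal sketch using only the Python standard library. The function name write_gateway_config and its config_dir parameter are assumptions made for illustration; the settings written are the same ones shown above.

```python
import os
from pathlib import Path

# Same settings as the gateway.yaml snippet on this page.
CONFIG_TEMPLATE = """\
gateway:
  address: https://dask-gateway.jasmin.ac.uk
  auth:
    type: jupyterhub
    kwargs:
      api_token: {token}
"""


def write_gateway_config(token, config_dir=None):
    """Write gateway.yaml containing `token`, readable only by its owner."""
    if config_dir is None:
        config_dir = Path.home() / ".config" / "dask"
    config_dir = Path(config_dir)
    config_dir.mkdir(parents=True, exist_ok=True)
    path = config_dir / "gateway.yaml"
    # Create the file with mode 0600 before the secret is written to it.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as handle:
        handle.write(CONFIG_TEMPLATE.format(token=token))
    # Tighten permissions in case the file already existed with a looser mode.
    os.chmod(path, 0o600)
    return path
```

Once the file is in place, dask_gateway.Gateway() called with no arguments should pick up the address and authentication settings from this configuration, so the token never needs to appear in your code.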
Currently the dask dashboard is not accessible from a browser outside the JASMIN firewall. If your browser fails to load the dashboard link returned, please use our graphical desktop service to run a Firefox browser inside the firewall to view your dashboard.
By default the JASMIN notebooks service and dask gateway use the latest version of the jaspy software environment. However, users often want to use their own software environments.
When dask gateway creates a dask cluster for a user, it runs a setup command to activate a conda environment or Python venv.
To have dask use your packages, you need to create a custom environment which you can pass to dask gateway to activate.
However, for technical reasons, it is not currently possible to use the same virtual environment in both the notebook service and on JASMIN. You will therefore need to make two environments: one for your notebook to use and one for dask to use.
It is VERY important that these environments have the same packages installed in them, and that the packages are exactly the same version in both environments.
If you do not keep packages and versions in-sync you can expect many confusing errors.
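One way to check that the two environments match is to run pip freeze in each and compare the results. The sketch below is illustrative only: the function names are hypothetical, and it assumes the input is the plain name==version lines which pip freeze normally prints. Lines in any other form, such as editable or URL installs, are ignored.

```python
def parse_freeze(text):
    """Map package name -> version from `pip freeze` style output."""
    versions = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blanks, comments, and anything that is not a name==version pin.
        if not line or line.startswith("#") or "==" not in line:
            continue
        name, version = line.split("==", 1)
        # Normalise names: pip treats case, '-', '_' and '.' as equivalent.
        key = name.strip().lower().replace("_", "-").replace(".", "-")
        versions[key] = version.strip()
    return versions


def compare_environments(notebook_freeze, dask_freeze):
    """Return {package: (notebook_version, dask_version)} for each mismatch.

    A version of None means the package is missing from that environment.
    """
    notebook = parse_freeze(notebook_freeze)
    dask_env = parse_freeze(dask_freeze)
    return {
        name: (notebook.get(name), dask_env.get(name))
        for name in sorted(set(notebook) | set(dask_env))
        if notebook.get(name) != dask_env.get(name)
    }
```

An empty result means every pinned package has the same version in both environments. Any entry in the result names a package to reinstall in one environment or the other.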
If you use a self-contained conda environment this is not a problem, and you can use it as a kernel in the notebooks service and on the sci machines. You can skip to "Putting it all together" below.
module load jaspy
python -m venv name-of-environment
source name-of-environment/bin/activate
pip install dask-gateway dask lz4
Set options.worker_setup to a command which will activate your dask virtual environment. For example:
options.worker_setup = "source /home/users/example/name-of-environment/bin/activate"
Examples of code and notebooks which can be used to test the JASMIN dask gateway service are available on GitHub.