Software and operating system changes - migration to Rocky Linux 9 (Summer 2024)
Introduction
As with a previous migration completed in 2020, the change of operating system version is needed to make sure that the version in use is current and fully supported, i.e. that package updates are available and important security updates can be obtained and applied to keep the platform secure.
The current operating system, CentOS7, is officially end-of-life as of the end of June 2024. We will be moving from CentOS7 to Rocky Linux 9, which is supported until May 2032. Rocky 9 should provide a very similar user experience to that provided by CentOS7, but with more recent software packages. Some software may have been removed or replaced during this transition.
This change affects JASMIN and CEDA services in several ways, including but not limited to the following:
- Components of all CEDA Archive and JASMIN web-based services need to be redeployed
- User-facing service hosts (e.g. `login`/`sci`/`xfer` servers and LOTUS nodes) all need to be redeployed
- All of these hosts need appropriate versions of drivers for various hardware and infrastructure components (e.g. storage, network, …) to be configured
- The Slurm scheduler used for the LOTUS and ORCHID clusters needs to be adapted to work under Rocky 9, in terms of its own management functions and the worker nodes which it controls. A separate announcement will cover the expansion of LOTUS with new processing nodes: these will be introduced as a new cluster under Slurm, with existing nodes moved from old to new as part of the transition. There will be a limited window in which the two clusters co-exist, during which time the old cluster will shrink in size: the current estimate for this is between July and September 2024, but we will provide updates as the new hardware is installed and timescales become clearer. We will endeavour to provide sufficient overlap and temporary arrangements to help users migrate their workflows.
- Software made available centrally via the `module` system and under `/apps` needs to be made available in versions compatible with Rocky 9. Some software may need to be recompiled.
- Other software (e.g. run by users or groups, without being centrally managed) may need to be tested and in some cases recompiled in order to work correctly under Rocky 9.
- Management and monitoring systems need to be updated to operate in the new environment
- For tenants of the JASMIN Cloud, you should already be aware of our plans to move to use the STFC Cloud as the base platform for the JASMIN Cloud Service. Images are currently in preparation so that new (empty) tenancies will soon be available for tenants to manage the migration of their own virtual machines over to new instances using Rocky 9 images. It is anticipated at this stage that managed tenancies (with tenancy sci machines) will be discontinued as part of this move, so users of those VMs will be advised to use the new Rocky 9 general-use sci servers instead.
Much of this work is already underway by teams in CEDA and STFC’s Scientific Computing Department. As a result of extensive work by these teams in recent years to improve the way services are deployed and managed, we are now in a much better position to undertake this kind of migration with as little disruption to users as possible. Some disruption and adaptation by users will be inevitable, however.
Some services have already been migrated and are already running under Rocky 9, but there is still much work to be done over the coming weeks so please watch this space as we do our best to keep you informed of the progress we’re making, and of any actions you may need to take to minimise disruption to your work on JASMIN.
Details of the new Rocky Linux 9 environment
General
The move to Rocky Linux 9 (abbreviated to “Rocky 9” or “R9” from here on) involves many changes at lower levels transparent to users, so we will focus here on those most relevant to how services on JASMIN are accessed and used. The reasons for the choice of Rocky 9 itself, and for some of the associated changes to software, machines and services provided, will not be covered in detail, but have been influenced by a number of factors including:
- organisational security and maintenance policies
- availability of packages and dependencies for the chosen operating system
- user feedback
Login nodes
The list of new login nodes is as follows:
name | status |
---|---|
`login-01.jasmin.ac.uk` | ready to use |
`login-02.jasmin.ac.uk` | ready to use |
`login-03.jasmin.ac.uk` | ready to use |
`login-04.jasmin.ac.uk` | ready to use |
Notes:
- There is no longer any requirement for forward/reverse DNS lookup or any restriction by institutional domain. You no longer need to register non-`*.ac.uk` domains with the JASMIN team (exception: `hpxfer`).
- This means all users can access all login servers (previously some users could only use `login2`).
- As before, no filesystems other than the home directory are mounted.
- Use only as a "hop" to reach other servers within JASMIN (see the example after this list).
- Make sure your SSH client is up to date. Check the version with `ssh -V`. If it's significantly older than `OpenSSH_8.7p1, OpenSSL 3.0.7`, speak to your local admin team as it may need to be updated before you can connect securely to JASMIN.
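As a quick sketch (the username `fred`, the key path and the choice of servers are placeholders to adapt to your own setup), checking your client version and hopping through a login node to a sci server could look like this:

```bash
# Check your local SSH client version: it should not be significantly older
# than OpenSSH_8.7p1 / OpenSSL 3.0.7
ssh -V

# Load your JASMIN SSH key into your local agent (key path is a placeholder),
# then jump through a login node to a sci server with agent forwarding
ssh-add ~/.ssh/id_rsa_jasmin
ssh -A -J fred@login-01.jasmin.ac.uk fred@sci-vm-01.jasmin.ac.uk
```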
NX login nodes
name | status |
---|---|
`nx1.jasmin.ac.uk` | Ready for use, update your SSH key |
`nx2.jasmin.ac.uk` | Ready for use, update your SSH key |
`nx3.jasmin.ac.uk` | Ready for use, update your SSH key |
`nx4.jasmin.ac.uk` | Ready for use, update your SSH key |
Notes:
- Connection advice has been updated: you will need to update your SSH key.
- New nodes have identical configuration so are accessible from all network locations (no further need for some users to use only certain nodes).
- By keeping the host names as short as possible, we mitigate the issue that some users (with long usernames created before the 8-character rule) had with agent forwarding: all nodes should behave the same as the old `nx4` in this respect.
- As before, no filesystems other than the home directory are mounted.
- Use only with the NoMachine Enterprise Client to get a graphical Linux desktop, from where you can:
  - use the Firefox browser on the Linux desktop to access web resources only accessible within JASMIN
  - make onward connections to a `sci` server for using graphics-intensive applications
- Make sure you are using the most up-to-date version of the NoMachine Enterprise Client.
sci servers
We have introduced a new naming convention which helps to identify virtual and physical/high-memory `sci` servers.
The new list is as follows:
name | status | specs | slurm cluster |
---|---|---|---|
**Virtual servers** | | | |
`sci-vm-01.jasmin.ac.uk` | Ready to use | 24 CPU / 64 GB RAM / 80 GB (virtual disk) | new |
`sci-vm-02.jasmin.ac.uk` | Ready to use | 24 CPU / 64 GB RAM / 80 GB (virtual disk) | new |
`sci-vm-03.jasmin.ac.uk` | Ready to use | 24 CPU / 64 GB RAM / 80 GB (virtual disk) | new |
`sci-vm-04.jasmin.ac.uk` | Ready to use | 24 CPU / 64 GB RAM / 80 GB (virtual disk) | new |
`sci-vm-05.jasmin.ac.uk` | Ready to use | 24 CPU / 64 GB RAM / 80 GB (virtual disk) | new |
**Physical servers** | | | |
`sci-ph-01.jasmin.ac.uk` | Ready to use | 48 CPU AMD EPYC 74F3 / 2 TB RAM / 2 x 446 GB SATA SSD | new |
`sci-ph-02.jasmin.ac.uk` | Ready to use | 48 CPU AMD EPYC 74F3 / 2 TB RAM / 2 x 446 GB SATA SSD | new |
`sci-ph-03.jasmin.ac.uk` | Ready to use | 192 CPU AMD EPYC 9654 / 1.5 TB RAM / 480 GB SATA SSD + 800 GB NVMe SSD | new |
Notes:
- For users within the STFC network, there is no longer any reverse DNS restriction.
- Replacements for common tools:
  - `lxterminal` has been replaced with `xfce-terminal`
  - for a more richly-featured editor or Integrated Development Environment (IDE), consider using the remote editing features of VSCode or PyCharm, since these can be installed and customised locally by the user to their taste rather than needing central installation and management on JASMIN. See access from VSCode (and the example SSH configuration after this list).
- See the jaspy, jasr and jasmin-sci sections below for further information on software.
- For graphical applications, use the NoMachine NX service rather than sending X11 graphics over the network back to your laptop/desktop, to ensure performance.
  - X11 graphics functionality is still to be added to these machines (coming shortly); currently it will fail with an error like: `xterm: Xt error: Can't open display: xterm: DISPLAY is not set`
- As before, physical servers are actually re-configured nodes within the LOTUS cluster and as such have a different network configuration from the virtual `sci` servers, with limited outward connectivity.
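For example, here is a minimal `~/.ssh/config` sketch for reaching a sci server through a login node, which also works with VSCode's Remote-SSH extension; the username, key path and chosen hosts are placeholders to adjust to your own setup:

```
# ~/.ssh/config -- illustrative only: substitute your own username, key and hosts
Host jasmin-login
    HostName login-01.jasmin.ac.uk
    User fred
    IdentityFile ~/.ssh/id_rsa_jasmin
    ForwardAgent yes

Host jasmin-sci
    HostName sci-vm-01.jasmin.ac.uk
    User fred
    ProxyJump jasmin-login
```

With this in place, `ssh jasmin-sci` (or selecting `jasmin-sci` in VSCode Remote-SSH) connects via the login node in one step.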
xfer servers
name | status | notes |
---|---|---|
`xfer-vm-01.jasmin.ac.uk` | ready to use | Virtual server |
`xfer-vm-02.jasmin.ac.uk` | ready to use | Virtual server |
`xfer-vm-03.jasmin.ac.uk` | ready to use | Virtual server, has `cron` |
Notes:
- Similar configuration on all three (no domain or reverse DNS restrictions now)
- The same advice about SSH client versions applies: see login nodes above
- If using cron on `xfer-vm-03`, you must use crontamer (a sketch is shown after this list)
- Throttle any automated transfers to avoid making many SSH connections in quick succession, otherwise you may get blocked
- Consider using Globus for the best performance and reliability for transfers in or out of JASMIN
- A new software collection, `jasmin-xfer`, has now been added to these servers, providing these tools: `emacs-nox`, `ftp`, `lftp`, `parallel`, `python3-requests`, `python3.11`, `python3.11-requests`, `rclone`, `rsync`, `s3cmd`, `screen`, `xterm`
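As an illustrative sketch only (the schedule and script path are hypothetical, and you should check the crontamer documentation for the options appropriate to your job), a crontab entry on `xfer-vm-03` might look like:

```bash
# Edit your crontab on xfer-vm-03
crontab -e

# Hypothetical entry: run a transfer script nightly at 02:30, wrapped in
# crontamer so that long-running or overlapping runs are managed
# 30 2 * * * crontamer '/home/users/fred/scripts/nightly_transfer.sh'
```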
hpxfer servers
name | status | notes |
---|---|---|
`hpxfer3.jasmin.ac.uk` | ready to use | Physical server |
`hpxfer4.jasmin.ac.uk` | ready to use | Physical server |
Notes:
- Tested with `sshftp` (GridFTP over SSH) from ARCHER2 (see the example after this list)
- The same advice about SSH client versions applies: see login nodes above
- The software collection `jasmin-xfer` is available, as per the xfer servers above
- The `hpxfer` access role is no longer required for these new servers (the role will be retired along with the old servers in due course, so there is no need to renew it if you move to the new servers)
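For reference, a hedged sketch of pulling a file from ARCHER2 over `sshftp`, run on one of the new `hpxfer` nodes; the username, paths and any performance tuning options are placeholders, so follow the ARCHER2/JASMIN transfer guidance for recommended settings:

```bash
# Run on hpxfer3.jasmin.ac.uk or hpxfer4.jasmin.ac.uk.
# Pull a file from ARCHER2 using GridFTP over SSH (-vb prints progress).
# Username and both paths are illustrative placeholders.
globus-url-copy -vb \
    sshftp://fred@login.archer2.ac.uk/work/n02/n02/fred/data/file.nc \
    file:///gws/nopw/j04/my_gws/fred/file.nc
```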
GridFTP server
Due to difficulties installing and configuring the suite of legacy components needed to support "old-style" GridFTP, this service has now been discontinued. Please familiarise yourself with using Globus (see below): this provides equivalent (and better) functionality.
Note that this does not affect GridFTP-over-SSH (`sshftp`), which is available on the new `hpxfer` nodes in the same way as on their predecessors: see above.
Globus data transfer service
Where possible, you should now use the Globus data transfer service for any data transfer in or out of JASMIN: this is the recommended method, which will give you the best performance and has a number of advantages over logging into a server and doing transfers manually.
As introduced earlier this year, the following Globus collections are available to all users of JASMIN, with no special access roles required:
name | uuid | status | notes |
---|---|---|---|
JASMIN Default Collection | `a2f53b7f-1b4e-4dce-9b7c-349ae760fee0` | Ready to use | Best performance, currently has 2 physical Data Transfer Nodes (DTNs). |
JASMIN STFC Internal Collection | `9efc947f-5212-4b5f-8c9d-47b93ae676b7` | Ready to use | For transfers involving other collections inside the STFC network. 2 DTNs, 1 physical, 1 virtual. Can be used by any user in case of issues with the above collection. |
Notes:
- These collections can be used with the Globus web interface, command-line interface (CLI) or Python software development kit (SDK), and use the JASMIN accounts portal for authentication (a brief CLI sketch is shown below)
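As a brief sketch of the CLI route (the source collection UUID and both paths below are placeholders; the JASMIN Default Collection UUID is taken from the table above):

```bash
# Authenticate the Globus CLI (opens a browser; the JASMIN side of the
# transfer authenticates via the JASMIN accounts portal)
globus login

# JASMIN Default Collection (from the table above); the source UUID and both
# paths are placeholders to replace with your own
JASMIN=a2f53b7f-1b4e-4dce-9b7c-349ae760fee0
SRC=11111111-2222-3333-4444-555555555555

globus transfer "$SRC:/data/file.nc" "$JASMIN:/gws/nopw/j04/my_gws/fred/file.nc" \
    --label "example transfer to JASMIN"
```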
Software
Please see the table below and the accompanying notes, which together summarise the upcoming changes to software on JASMIN:
Software | CentOS7 | Rocky 9 |
---|---|---|
IDL versions, IDL licence server (see Note 1) | 8.2, 8.5 (D), 8.5, 8.6; Flexnet | 8.9, 9.1 (D); Next generation |
Cylc, Cylc UI visualisation (see Note 2) | 7.8.14 and 8.3.3-1; UI functionality integrated | 8.3.3-1; UI via browser: discussion ongoing |
Jaspy, Jasr, jasmin-sci | 2.7, 3.7*, 3.10* (*: all variants); 3.6, 4.0 (all variants), 4.2; URL page of the packages | 3.11; 4.3; rpm/Glibc compatibility TBA |
Intel compilers | 12.1.5-20.0.0 (11 variants) | Intel oneAPI |
MPI library / OpenMPI versions / compiler (see Note 3) | 3.1.1/Intel,GNU; 4.0.0; 4.1.[0-1,4-5]/Intel; 4.1.2, 5.0.1, 5.1.2 | 4.1.5/Intel/GCC and 5.0.4/Intel/GCC; possibility to support MPICH or Intel MPI |
NetCDF C library, NetCDF Fortran binding lib. | netcdf/gnu/4.4.7, netcdf/intel/14.0, netcdff/gnu/4.4.7/*, netcdff/intel/4.4.7, parallel-netcdf/gnu/20141122, parallel-netcdf/intel/20141122 | A new module environment for serial and parallel versions; GNU and Intel oneAPI builds of NetCDF against either OpenMPI and/or Intel MPI |
GNU compilers | 7.2.0, 8.1.0, 8.2.0, 13.2.0 conda-forge (12.1.0 from legacy Jaspy) | 11.4.1 (OS), 13.2.0 conda-forge via Jaspy |
JULES (see Note 4) | | Information to follow |
Notes:
1. IDL
   - IDL versions 8.9 and 9.1 are now available on the Rocky 9 sci servers.
   - These will also be the versions available on the new cluster, which will be announced in early 2025.
   - Licensing is now in place to enable use of these versions on Rocky 9 servers, in runtime or interactive mode.
   - For the limited remaining time that the existing LOTUS cluster is available (with CentOS7 nodes), 8.5 is the default, with other legacy versions still available on those nodes.
2. Cylc: note that Cylc 8 differs from Cylc 7 in many ways: architecture, scheduling algorithm, security, UIs, working practices and more. The Cylc 8 web UI requires the use of a browser (e.g. Firefox in the NoMachine desktop service).
3. MPI (further details to follow)
4. JULES (further details to follow)
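For example (a sketch only: check `module avail` on a Rocky 9 sci server, as exact module names and default versions may differ from those shown in the table):

```bash
# See which Jaspy / JasR environments are available on a Rocky 9 sci server
module avail jaspy
module avail jasr

# Load the current Jaspy (Python) environment and check the interpreter;
# on Rocky 9 this is expected to be a Python 3.11 build
module load jaspy
python --version

# Or load the JasR (R) environment instead
module load jasr
R --version
```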
Upgraded LOTUS cluster
Preliminary node specification:
type | status | specs |
---|---|---|
standard | Ready to use | 192 CPU AMD EPYC 9654 / 1.5 TB RAM / 480 GB SATA SSD + 800 GB NVMe SSD |
special | Not yet available | 192 CPU AMD EPYC 9654 / 6 TB RAM / 480 GB SATA SSD + 800 GB NVMe SSD |
Notes:
- Overall ~53,000 cores: roughly triples the capacity of the previous cluster (exact number varies for operational reasons)
- New nodes will form a new cluster, managed separately to the “old” LOTUS
- Submission to the new cluster is now via any `sci-vm-*` or `sci-ph-*` node
- 70% of the old LOTUS has now been decommissioned
New LOTUS2 cluster initial submission guide
Please see the updated LOTUS pages, including the how to submit a job page, to use the new Slurm scheduling partitions in LOTUS2.
These require a Slurm account, partition and quality of service (QoS) to be specified at job submission time.
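As a minimal sketch of a job script for the new cluster (the account, partition and QoS names below are placeholders: use the values given on the updated LOTUS pages and associated with your JASMIN account):

```bash
#!/bin/bash
#SBATCH --account=my_project      # placeholder: your Slurm account
#SBATCH --partition=standard      # placeholder: a LOTUS2 partition
#SBATCH --qos=standard            # placeholder: a quality of service
#SBATCH --time=01:00:00
#SBATCH --mem=4G
#SBATCH -o %j.out
#SBATCH -e %j.err

# Your processing commands go here
echo "Hello from LOTUS2 node $(hostname)"
```

Submit it with `sbatch myjob.sh` from any `sci-vm-*` or `sci-ph-*` server.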