JASMIN User's Guide
This guide was compiled from user queries and from issues encountered by the JASMIN team. We encourage all users to follow these guidelines. They are particularly important during the current COVID-19 lockdown, when staff may be unable to resolve problems by physical intervention in a timely manner, or at all.
"Sci" machines usage guidelines
- Check the current load and number of users on the sci machines, as shown by the login servers, and select a less-used machine. The available sci machines and their specifications are listed in the table of this help page.
- The sci machines are not for running large, long-running tasks, or scripts that spawn multiple child processes. The batch processing cluster LOTUS is available for heavier processing. The sci machines are for development, testing, and light interactive use. Overloading these with processing seriously impairs performance for interactive use by others.
- Do not write to the temporary partition /tmp on sci machines. Use your home directory, a scratch volume or a Group Workspace instead. Temporary data files can reside in a subdirectory of your group workspace rather than in /tmp. To arrange this, add the following lines (or similar) to your $HOME/.bashrc file:
    # point TMPDIR at a subdirectory of your group workspace
    export TMPDIR=<path to a subdirectory of your group workspace>
    # create the directory if needed
    [ -d "$TMPDIR" ] || mkdir -p "$TMPDIR"
- If a process hangs, do not simply close the terminal window. Please contact the helpdesk and alert the team so that the process can be shut down. Otherwise hung processes build up and contribute to machine overloading.
- Do not “hog” IDL development licenses. Only a limited number are available, and they are intended for developing and compiling IDL code. The compiled code should then be run on LOTUS using IDL runtime licenses, of which there are many more.
- Do not use sci machines for data transfer: xfer hosts are provided for this purpose.
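The TMPDIR advice in the guidelines above can be sketched as a small shell function for $HOME/.bashrc. This is a minimal sketch under stated assumptions, not an official JASMIN recipe: the function name set_workspace_tmpdir and the tmp/<username> layout are hypothetical, and the path in the commented call is a placeholder to be replaced with your own group workspace.

```shell
# Hypothetical helper: point TMPDIR at a per-user subdirectory of a workspace.
# The tmp/<username> layout is an assumption, not a JASMIN convention.
set_workspace_tmpdir() {
    base="$1"
    TMPDIR="$base/tmp/${USER:-$(id -un)}"
    export TMPDIR
    # Create the directory if needed.
    [ -d "$TMPDIR" ] || mkdir -p "$TMPDIR"
}

# Example call for ~/.bashrc (placeholder path, replace with your own):
# set_workspace_tmpdir /path/to/your/group_workspace
```

Many tools, including mktemp, honour TMPDIR, so once it is set they should write their temporary files under the workspace rather than under /tmp.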
LOTUS usage guidelines
- Do not use IDL development licenses on LOTUS. There are many runtime licenses available, but the development licenses are for interactive use on the sci machines, where IDL code can be compiled and then run on LOTUS using a runtime license.
- Beware of inadvertently filling up /tmp on LOTUS nodes. A full /tmp can take a node out of action, including for other users who still have jobs running on it. Design your code to clean up as it goes along, and use environment variables to control where your applications write temporary data, ideally to storage which is not specific to a LOTUS node. If your job crashes, check which nodes were involved and clean up after yourself.
- Do not store data in scratch areas for long periods of time. Move data away to group workspaces once your processing has finished.
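One way to follow the "clean up as it goes along" advice above is to give each job its own temporary directory and remove it when the job's command finishes. The following is a minimal sketch, assuming TMPDIR already points at suitable storage. The function name run_with_scratch and the job.XXXXXX prefix are hypothetical.

```shell
# Hypothetical helper: run a command with a private temporary directory
# that is removed when the command finishes, whether it succeeds or fails.
# The function body is a subshell, so the EXIT trap fires when it ends.
run_with_scratch() (
    # Create a unique directory under TMPDIR (falls back to /tmp if unset).
    workdir=$(mktemp -d "${TMPDIR:-/tmp}/job.XXXXXX") || exit 1
    trap 'rm -rf -- "$workdir"' EXIT
    # Point the command's temporary files at the private directory.
    TMPDIR="$workdir"
    export TMPDIR
    "$@"
)

# Example use inside a job script (the command is a placeholder):
# run_with_scratch ./my_processing_step input.dat
```

The exit status of the wrapped command is passed through, so a failing step is still reported as failed. A job killed abruptly may not get the chance to run the trap, so checking the nodes afterwards is still worthwhile.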
Xfer servers guidelines
- Do not run a large number (>16) of rsync or scp transfer processes in parallel.
- Do not run processing on xfer servers: they are provided for data transfer only.
- For heavy/high-performance data transfers, avoid virtual machines jasmin-xfer1/cems-xfer1 and consider using high-performance servers or methods.
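The limit on parallel transfers above can be enforced in a script rather than left to memory. The sketch below assumes an xargs that supports the -0 and -P options (the GNU and BSD versions do). The function name parallel_rsync and the TRANSFER_CMD override are hypothetical, introduced only for illustration.

```shell
# Hypothetical helper: copy each source to one destination with rsync,
# running at most MAX_PROCS transfers at a time (never more than 16).
# Usage: parallel_rsync MAX_PROCS DEST SRC...
parallel_rsync() {
    max_procs="$1"
    dest="$2"
    shift 2
    if [ "$max_procs" -gt 16 ]; then
        echo "refusing to start more than 16 parallel transfers" >&2
        return 1
    fi
    # TRANSFER_CMD is an assumed override hook; set it to "echo" for a dry run.
    printf '%s\0' "$@" |
        xargs -0 -P "$max_procs" -I{} "${TRANSFER_CMD:-rsync}" -a {} "$dest"
}

# Dry run: print the commands that would be started, four at a time.
# TRANSFER_CMD=echo parallel_rsync 4 /path/to/destination dir1 dir2 dir3
```

A dry run with TRANSFER_CMD=echo is a cheap way to check the source list before starting real transfers.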
How to report an issue
When you do experience an issue, please:
- Make it clear whether you are simply advising the helpdesk of a general issue (which will be noted, but not necessarily investigated for a specific response), or
- Provide FULL and SPECIFIC details of your problem so that it can be investigated. JASMIN is a complex infrastructure with many hundreds of hosts and storage volumes, so reporting that “JASMIN” or “Storage” is slow is not sufficient.
- If you are experiencing difficulties accessing a particular storage volume from a particular sci machine, please state:
  - The full path to the data you are trying to access.
  - The full hostname of the machine. Please also try the same access from at least one other machine, to help establish whether the problem is related to the machine or to the storage.
  - The date and time of the issue, so that it can be matched against system reports and log files. The date and time of your email is not sufficient: please be specific in your report.
- Be patient and understand that, particularly at present (where nearly the entire CEDA and JASMIN teams are working remotely), queries will take longer to resolve.
- Some issues will only be resolved by strategic improvements which are planned as part of phased upgrades to JASMIN accompanied by capital procurements followed by integration work, all carried out by the same, small team.
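The details requested above for storage access problems (full path, full hostname, date and time) can be captured at the moment the problem occurs with a few standard commands. The sketch below is illustrative only: the function name report_details and the output layout are hypothetical, not a format the helpdesk requires.

```shell
# Hypothetical helper: print the details a storage-access report needs.
# Usage: report_details /full/path/to/data
report_details() {
    path="$1"
    # Fully qualified hostname where available, plain node name otherwise.
    echo "Host: $(hostname -f 2>/dev/null || uname -n)"
    echo "Time: $(date '+%Y-%m-%d %H:%M:%S %Z')"
    echo "Path: $path"
    # Record whether the path can be listed from this machine right now.
    if ls -ld -- "$path" >/dev/null 2>&1; then
        echo "Access: ok"
    else
        echo "Access: FAILED"
    fi
}
```

Running the same function on a second machine gives the comparison asked for above, and the output can be pasted directly into the report.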