Transfers from ARCHER2

This article explains how to transfer data between ARCHER2 and JASMIN. It covers:

  • The choice of available tools and routes
  • An example of how to use the currently recommended method

Choice of available tools/routes

See the JASMIN help articles on external connections and Data Transfer Tools for general details.

Users transferring data between ARCHER2 and JASMIN are often moving relatively large datasets, so it is important to choose the most appropriate route, method and tools to get the most efficient and reliable transfer. The best choice can vary depending on system and network conditions.

If you want to try all the available options, you will need:

  • high-performance data transfer access on JASMIN
  • a login account at ARCHER2 with access to the
  • the subject of your CEDA-issued short-term credential registered with ARCHER2 support.

Check the examples in the linked documentation articles and make sure you only use each tool between the hosts shown in those examples. Not all services connect over all routes to and from all hosts!

NOTE:

  • Enquiries about issues at the ARCHER2 end should be directed to ARCHER2 support (support@archer2.ac.uk)
  • Enquiries about issues at the JASMIN end should be directed to JASMIN support (support@jasmin.ac.uk)

Table 1, below, shows recommended combinations of hosts and tools for transfers between ARCHER2 and JASMIN.

Method                         | Source              | Destination           | Description / notes
scp/rsync/sftp                 | login.archer2.ac.uk | xfer1.jasmin.ac.uk    | Simple, convenient transfer to the general-purpose xfer nodes. Over 10G JANET, but to a virtual machine at the JASMIN end.
scp/rsync/sftp                 | login.archer2.ac.uk | xfer2.jasmin.ac.uk    | Simple, convenient transfer to the general-purpose xfer nodes.
GridFTP over SSH               | login.archer2.ac.uk | hpxfer1.jasmin.ac.uk  | Current recommended method (June 2021): GridFTP performance with the convenience of SSH. Over 10G JANET.
GridFTP over SSH               | login.archer2.ac.uk | hpxfer2.jasmin.ac.uk  | Over 10G JANET. hpxfer2 is configured for longer distances but can be useful if hpxfer1 is busy.
GridFTP using certificate auth | login.archer2.ac.uk | gridftp1.jasmin.ac.uk | Fully-featured GridFTP with the best performance. Over 10G JANET. Dedicated GridFTP server; no need for a persistent SSH agent at the ARCHER2 end.
Table 1: comparison of current methods and routes for transferring data between ARCHER2 and JASMIN.
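If you script your transfers, the choice of JASMIN host from Table 1 can be captured in a small lookup function. This is a minimal sketch only: the function name `jasmin_host_for` and the method labels `gridftp-ssh` and `gridftp-cert` are hypothetical, and it returns just the first-choice host for each method (xfer2 and hpxfer2 are the alternatives listed in the table).

```shell
#!/usr/bin/env bash
# Hypothetical helper: first-choice JASMIN host for each method in Table 1.
# The ARCHER2 end is always login.archer2.ac.uk.
jasmin_host_for() {
  case "$1" in
    scp|rsync|sftp) echo xfer1.jasmin.ac.uk ;;    # general-purpose node; xfer2 also available
    gridftp-ssh)    echo hpxfer1.jasmin.ac.uk ;;  # recommended route; hpxfer2 if hpxfer1 is busy
    gridftp-cert)   echo gridftp1.jasmin.ac.uk ;; # needs a registered short-term credential
    *) echo "unknown method: $1" >&2; return 1 ;;
  esac
}

# Usage: dest_host=$(jasmin_host_for gridftp-ssh)
```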

Recommended method: example

The currently recommended method for transfers between ARCHER2 and JASMIN is to use globus-url-copy with the concurrency option, as described below:

1. Log in to the login node at ARCHER2

You will need to have loaded into your SSH agent:

  • The SSH key associated with your JASMIN account
  • The SSH key associated with your ARCHER2 account, if you have one (if so, it is recommended that this is a different key from the one you use for JASMIN)

You also need to connect with the -A option to enable agent forwarding, so that your JASMIN key is available for onward authentication to the JASMIN server.

$ ssh-add <jasmin ssh key>    # path to your JASMIN SSH key file on your local machine
$ ssh-add <archer2 ssh key>   # path to your ARCHER2 SSH key file on your local machine, if you have one
$ ssh-add -l                  # check that both keys are loaded (are both key fingerprints listed in the output?)
$ ssh -A <archer2-username>@login.archer2.ac.uk
(you are prompted for your password by the ARCHER2 system, whether or not you use an SSH key with your ARCHER2 account)
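To script the `ssh-add -l` check, you can count the fingerprint lines it prints. A minimal sketch, assuming the usual OpenSSH output format, where each loaded key is listed as `<bits> <fingerprint> <comment> (<type>)` and an empty agent prints a message instead; `count_agent_keys` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count the keys listed by `ssh-add -l` on stdin.
# Key lines start with the key size in bits, e.g. "256 SHA256:... (ED25519)";
# "The agent has no identities." does not, so it is not counted.
count_agent_keys() {
  # grep -c prints 0 but exits 1 when nothing matches, so ignore its status
  grep -cE '^[0-9]+ ' || true
}

# Usage on your local machine, before `ssh -A`:
#   [ "$(ssh-add -l | count_agent_keys)" -ge 2 ] || echo "load both keys first" >&2
```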

2. Load the gct module (to make the current globus-url-copy command available in the path)

$ module load gct
$ which globus-url-copy
/work/y07/shared/gct/v6.2.20201212/bin/globus-url-copy
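In a wrapper script it can help to stop early if the transfer command is not on your PATH. A sketch only; `require_cmd` is a hypothetical helper, and the hint it prints simply repeats the `module load gct` step above.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check: fail early if a command is not on the PATH.
require_cmd() {
  if ! command -v "$1" >/dev/null 2>&1; then
    echo "$1 not found on PATH; on ARCHER2 try: module load gct" >&2
    return 1
  fi
}

# Usage at the top of a transfer script:
#   require_cmd globus-url-copy || exit 1
```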

3. Transfer a single file to your home directory on JASMIN (this does not give the best performance, but checks that everything works)

$ globus-url-copy -vb <file> sshftp://<jasmin-username>@hpxfer1.jasmin.ac.uk/home/users/<jasmin-username>/<file>

Replace <jasmin-username> with your username on JASMIN, and <file> with the file you want to transfer.
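When transferring several individual files, the destination URL can be built by a small helper rather than typed by hand. A sketch, assuming the /home/users/<jasmin-username>/ home-directory layout used in the command above; `jasmin_home_url` is a hypothetical name, it keeps only the file's base name at the destination, and the optional third argument lets you substitute another host such as hpxfer2.jasmin.ac.uk.

```shell
#!/usr/bin/env bash
# Hypothetical helper: sshftp:// URL for a file in your JASMIN home directory.
# Arguments: JASMIN username, local file path, optional host (default hpxfer1).
jasmin_home_url() {
  local user="$1" file="$2" host="${3:-hpxfer1.jasmin.ac.uk}"
  # ${file##*/} strips any leading directories, keeping just the file name
  printf 'sshftp://%s@%s/home/users/%s/%s\n' "$user" "$host" "$user" "${file##*/}"
}

# Usage on ARCHER2, after `module load gct`:
#   globus-url-copy -vb data/run1.nc "$(jasmin_home_url myuser data/run1.nc)"
```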

4. Recursively transfer a directory of files, using the concurrency option for multiple parallel transfers

$ globus-url-copy-temp -vb -cd -r -cc 4 src/data/ sshftp://<jasmin-username>@hpxfer1.jasmin.ac.uk/path/dest/data/

NOTE: The -cc option transfers several files in parallel, which gives good overall transfer rates for recursive directory transfers. This is different from the -p N -fast options, which use parallel network streams to parallelise the transfer of each individual file. The -p N -fast options are not currently supported on this route (for different technical reasons at each end), so please do not use them until further notice.

Here, the options used are (see man globus-url-copy for full details):

-vb | -verbose-perf
       During the transfer, display the number of bytes transferred
       and the transfer rate per second.  Show urls being transferred
-concurrency | -cc
       Number of concurrent ftp connections to use for multiple transfers.
-cd | -create-dest
       Create destination directory if needed
-r | -recurse
       Copy files in subdirectories

Experiment with different concurrency values: 4 is a good start, but more than 16 would start to "hog" resources, so please consider other users.
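The recursive transfer from step 4 can be wrapped so that the concurrency stays within the range suggested above. A minimal sketch: `guc_recursive` and the `DRY_RUN` and `GUC` environment variables are hypothetical; `GUC` defaults to `globus-url-copy` (the command provided by the gct module) and can be set to whichever command name you use. Only documented options (-vb, -cd, -r, -cc) are passed.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: recursive globus-url-copy transfer with bounded -cc.
# Set DRY_RUN=1 to print the command instead of running it.
guc_recursive() {
  local src="$1" dest="$2" cc="${3:-4}"
  local guc="${GUC:-globus-url-copy}"
  # Require a positive integer no greater than 16 for the concurrency
  if ! [[ "$cc" =~ ^[1-9][0-9]*$ ]] || (( cc > 16 )); then
    echo "concurrency must be an integer from 1 to 16 (got: $cc)" >&2
    return 2
  fi
  # Parallelism comes from -cc only: -p N -fast are not supported on this route
  local cmd=("$guc" -vb -cd -r -cc "$cc" "$src" "$dest")
  if [[ "${DRY_RUN:-0}" == 1 ]]; then
    printf '%s\n' "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}

# Usage:
#   guc_recursive src/data/ sshftp://<jasmin-username>@hpxfer1.jasmin.ac.uk/path/dest/data/ 8
```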

5. Use the sync option to synchronise two directories between the source and target file systems:

$ globus-url-copy-temp -vb -cd -r -cc 4 -sync src/data/ sshftp://<jasmin-username>@hpxfer1.jasmin.ac.uk/path/dest/data/

where src/data/ and /path/dest/data/ are the source and destination paths, respectively (include the trailing slash on each).

Options are as before but with:

-sync
       Only transfer files where the destination does not exist or differs
       from the source.  -sync-level controls how to determine if files
       differ

Note that the default sync level is 2 (see the level descriptions below), which compares file timestamps and sizes but not file contents. If you want to include a file integrity check using checksums, you need to use -sync-level 3, but this may carry a performance cost.

-sync-level
       Choose criteria for determining if files differ when performing a
       sync transfer.  Level 0 will only transfer if the destination does
       not exist.  Level 1 will transfer if the size of the destination
       does not match the size of the source.  Level 2 will transfer if
       the timestamp of the destination is older than the timestamp of the
       source, or the sizes do not match.  Level 3 will perform a checksum of
       the source and destination and transfer if the checksums do not match,
       or the sizes do not match.  The default sync level is 2.
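To make the four levels concrete, the sketch below applies the same rules to two local files. It is an illustration only, not how globus-url-copy works internally: `would_transfer` is a hypothetical function, it assumes GNU coreutils `stat` (for `-c %s` and `-c %Y`), and it uses `cksum` as a stand-in for whatever checksum globus-url-copy computes.

```shell
#!/usr/bin/env bash
# Illustration only: apply the documented -sync-level rules to two LOCAL files.
# Returns 0 if the file would be transferred, 1 if not, 2 for an invalid level.
# Assumes GNU stat; cksum stands in for globus-url-copy's own checksum.
would_transfer() {
  local level="$1" src="$2" dst="$3"
  [[ "$level" =~ ^[0-3]$ ]] || return 2
  # All levels: transfer if the destination does not exist
  [[ -e "$dst" ]] || return 0
  # Level 0: the destination exists, so nothing to do
  (( level == 0 )) && return 1
  # Levels 1-3: transfer if the sizes differ
  [[ "$(stat -c %s "$src")" != "$(stat -c %s "$dst")" ]] && return 0
  case "$level" in
    1) return 1 ;;
    2) # Transfer if the destination is older than the source
       (( $(stat -c %Y "$dst") < $(stat -c %Y "$src") )) && return 0
       return 1 ;;
    3) # Transfer if the checksums differ
       [[ "$(cksum < "$src")" != "$(cksum < "$dst")" ]] && return 0
       return 1 ;;
  esac
}

# Usage: would_transfer 2 local.nc copy.nc && echo "would be transferred"
```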

So a full sync including comparison of checksums would be:

$ globus-url-copy-temp -vb -cd -r -cc 4 -sync -sync-level 3 src/data/ sshftp://<jasmin-username>@hpxfer1.jasmin.ac.uk/path/dest/data/
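The sync commands in step 5 can be wrapped in a script that checks its arguments before starting a long transfer. A minimal sketch: `guc_sync` and the `DRY_RUN` and `GUC` environment variables are hypothetical names; `GUC` defaults to `globus-url-copy` and can be set to whichever command name you use. It refuses paths without the trailing slash, accepts only sync levels 0 to 3, and otherwise passes the same options as the commands above.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper for a recursive sync with an explicit -sync-level.
# Arguments: source dir, destination URL, sync level (default 2), concurrency (default 4).
# Set DRY_RUN=1 to print the command instead of running it.
guc_sync() {
  local src="$1" dest="$2" level="${3:-2}" cc="${4:-4}"
  local guc="${GUC:-globus-url-copy}"
  # Both paths need a trailing slash so whole directories are synchronised
  if [[ "$src" != */ || "$dest" != */ ]]; then
    echo "source and destination must end with '/'" >&2
    return 2
  fi
  # Documented levels are 0-3 (2 is the default; 3 adds checksums)
  if ! [[ "$level" =~ ^[0-3]$ ]]; then
    echo "sync level must be 0, 1, 2 or 3 (got: $level)" >&2
    return 2
  fi
  local cmd=("$guc" -vb -cd -r -cc "$cc" -sync -sync-level "$level" "$src" "$dest")
  if [[ "${DRY_RUN:-0}" == 1 ]]; then
    printf '%s\n' "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}

# Usage (checksum-based sync):
#   guc_sync src/data/ sshftp://<jasmin-username>@hpxfer1.jasmin.ac.uk/path/dest/data/ 3
```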