Transfers from ARCHER2

 


Transferring data from ARCHER2 to JASMIN, efficiently


Choice of available Tools/Routes  

See Data Transfer Tools for general information.

Users transferring data between ARCHER2 and JASMIN are often transferring relatively large sets of data, so it is important to choose the most appropriate route, method and tools to ensure you get the most efficient and reliable transfer experience. This can vary depending on system and network conditions.

The recommended option (as of mid-2024) is now Globus.

Requirements common to all of the methods are:

  • an account with the jasmin-login access role on JASMIN
  • a login account at ARCHER2

Please note:

  • Enquiries about access to or use of ARCHER2 should be directed to ARCHER2 support (support@archer2.ac.uk)
  • Enquiries about access to or use of JASMIN should be directed to JASMIN support (use the help beacon at the bottom right of the page, or email support@jasmin.ac.uk)

Available transfer methods  

  1. Globus (recommended)
  2. Basic SSH transfer (slow but convenient)
  3. Gridftp using SSH authentication (efficient and currently still available, but now superseded in convenience and reliability by Globus)

1st choice method: Globus  

This is now the recommended method, because:

  • it always ensures the best performance
  • it is a managed transfer service, less prone to overloading and system issues
  • it is actively maintained
  • it is easy to use

Because Globus can do transfers between two third-party locations, you don’t necessarily need to invoke the transfers from a machine on JASMIN, or ARCHER2 (even though it’s those two locations which will be involved as source and destination for the transfer). This could be done from your laptop or desktop, but could also be done from within a workflow that’s running somewhere (e.g. ARCHER2 or JASMIN). So, first think about where you want to control the process from.

In that location, follow the steps below:

1. Set up the Globus Command Line interface

  • follow the steps described in Globus command line interface

2. Identify the source and destination collections for your transfer:

In this case, these are likely to be:

  • the ARCHER2 filesystems collection, with ID 3e90d018-0d05-461a-bbaf-aab605283d21
  • the JASMIN default collection, with ID a2f53b7f-1b4e-4dce-9b7c-349ae760fee0

Set an environment variable for each of these, to avoid having to type the ID each time:

export a2c=3e90d018-0d05-461a-bbaf-aab605283d21
export jdc=a2f53b7f-1b4e-4dce-9b7c-349ae760fee0

3. Check access to these collections

These collections are restricted-access rather than public, so your access to them is via a series of authentication/authorisation/consent steps which the following actions will guide you through:

globus ls $a2c:/~/
# (ARCHER2 home directory file listing should appear)
globus ls $jdc:/~/
# (JASMIN home directory file listing should appear)

The steps above establish your ability to interact with each of the specified collections using Globus. Once you’ve completed each one, you should see a directory listing.

Once you’ve completed the steps for both source and destination collections, you are ready to try a transfer.

4. Initiate a simple transfer

globus transfer $a2c:/~/1M.dat $jdc:/~/1M.dat
Message: The transfer has been accepted and a task has been created and queued for execution
Task ID: aa0597a4-80a7-11ef-b36b-a1206a7ee65f

This should complete quite quickly for a small file, but for a larger file you can check the progress using the task ID.

globus task show aa0597a4-80a7-11ef-b36b-a1206a7ee65f
Label:                        None
Task ID:                      aa0597a4-80a7-11ef-b36b-a1206a7ee65f
Is Paused:                    False
Type:                         TRANSFER
Directories:                  0
Files:                        1
Status:                       SUCCEEDED
Request Time:                 2024-10-02T10:18:32+00:00
Faults:                       0
Total Subtasks:               2
Subtasks Succeeded:           2
Subtasks Pending:             0
Subtasks Retrying:            0
Subtasks Failed:              0
Subtasks Canceled:            0
Subtasks Expired:             0
Subtasks with Skipped Errors: 0
Completion Time:              2024-10-02T10:18:39+00:00
Source Endpoint:              Archer2 file systems
Source Endpoint ID:           3e90d018-0d05-461a-bbaf-aab605283d21
Destination Endpoint:         JASMIN Default Collection
Destination Endpoint ID:      a2f53b7f-1b4e-4dce-9b7c-349ae760fee0
Bytes Transferred:            1000000
Bytes Per Second:             148452

If you want to use the above in a script, and block until the transfer task completes before continuing, use globus task wait <taskid>. For example:

globus task wait aa0597a4-80a7-11ef-b36b-a1206a7ee65f

will now return control immediately, since the task has already completed.
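As a minimal sketch of that scripted pattern, the function below submits a transfer, captures its task ID and then waits for it. The function name transfer_and_wait is hypothetical. The collection IDs are the ones given above. The --jmespath and --format unix options are general Globus CLI output options, used here so that only the task ID is printed.

```shell
# Collection IDs as given on this page.
a2c=3e90d018-0d05-461a-bbaf-aab605283d21   # ARCHER2 filesystems collection
jdc=a2f53b7f-1b4e-4dce-9b7c-349ae760fee0   # JASMIN default collection

# transfer_and_wait SRC_PATH DEST_PATH   (hypothetical helper name)
# Submits one transfer from ARCHER2 to JASMIN, then blocks until the task ends.
transfer_and_wait() {
    src_path=$1
    dest_path=$2
    # Ask the CLI to print only the task ID so it can be captured in a variable.
    task_id=$(globus transfer "${a2c}:${src_path}" "${jdc}:${dest_path}" \
        --jmespath 'task_id' --format unix) || return 1
    echo "Submitted task $task_id"
    # The exit status of "globus task wait" becomes the function's exit status.
    globus task wait "$task_id"
}
```

Calling transfer_and_wait /~/1M.dat /~/1M.dat would repeat the simple transfer shown earlier, but only return once the task has finished.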

Globus transfer tasks are asynchronous. They are submitted to your own mini-queue, where you can have as many queued tasks as you like but only 3 in progress at any one time. This ensures good performance for all users, and means your tasks do not linger in long multi-user queues. The best way to reassure yourself of this is to try it out.

For help with any globus command, run globus <command> --help.

Further examples, including sync and automation, are given in Globus command line interface and in the Globus documentation.

Relevant examples:

  • sync with wait, using the CLI
  • Repeatable transfer, using the Python SDK (more advanced)
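For orientation, a recursive sync with the Globus CLI might look roughly like the sketch below. It is not taken from the linked examples. The function name sync_dir is hypothetical, and the choice of checksum as the sync level is an assumption (the CLI also accepts exists, size and mtime).

```shell
# Collection IDs as given on this page.
a2c=3e90d018-0d05-461a-bbaf-aab605283d21   # ARCHER2 filesystems collection
jdc=a2f53b7f-1b4e-4dce-9b7c-349ae760fee0   # JASMIN default collection

# sync_dir SRC_DIR DEST_DIR   (hypothetical helper name)
# Submits a recursive sync from ARCHER2 to JASMIN and prints the task ID.
sync_dir() {
    src_dir=$1
    dest_dir=$2
    # --recursive copies the whole directory tree.
    # --sync-level checksum only transfers files whose checksums differ
    # (assumed choice; it costs more than the mtime or size levels).
    globus transfer "${a2c}:${src_dir}" "${jdc}:${dest_dir}" \
        --recursive --sync-level checksum \
        --jmespath 'task_id' --format unix
}
```

The printed task ID can then be passed to globus task wait or globus task show, as in the earlier steps.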

Note that Globus transfers (and other actions) can be managed & monitored by:

  • a web interface
  • the command-line interface, and
  • a Python library

all of which interact with the same underlying service.

NCAS-CMS users should note that work is currently underway to adopt Globus as a drop-in replacement for certificate-based gridftp in Rose suites currently in use for automating processing and transferring to JASMIN.

2nd choice method: Basic SSH transfer  

scp/rsync/sftp: simple transfers using familiar tools, pushing data to the general-purpose xfer nodes. Convenient, but with limited performance.

source                 dest                          notes
login.archer2.ac.uk    xfer-vm-0[123].jasmin.ac.uk   to virtual machine at JASMIN end
login.archer2.ac.uk    hpxfer[34].jasmin.ac.uk       to high-performance physical machine at JASMIN end
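As an illustration only, a push from an ARCHER2 login node with rsync could be wrapped as below. The function name rsync_to_jasmin is hypothetical, the default host is one of the servers from the table, and the sketch assumes your JASMIN SSH key is reachable from ARCHER2 via agent forwarding (as described for the gridftp method on this page).

```shell
# rsync_to_jasmin SRC_DIR JASMIN_USER DEST_DIR [HOST]   (hypothetical helper name)
# Pushes a directory from ARCHER2 to a JASMIN transfer server over SSH.
rsync_to_jasmin() {
    src_dir=$1        # directory on ARCHER2
    jasmin_user=$2    # your JASMIN username
    dest_dir=$3       # destination directory on JASMIN
    host=${4:-xfer-vm-01.jasmin.ac.uk}   # default: one of the servers in the table
    # -a preserves times and permissions, -v lists files as they are sent,
    # --partial keeps partly transferred files so an interrupted run can resume.
    # A trailing slash on src_dir copies its contents rather than the directory itself.
    rsync -av --partial "$src_dir" "${jasmin_user}@${host}:${dest_dir}"
}
```

For example, rsync_to_jasmin SRC/DATA/ <jasmin-username> /DEST/DATA/ hpxfer3.jasmin.ac.uk would target one of the high-performance servers instead of the default.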

3rd choice method: gridftp over SSH  

GridFTP over SSH: GridFTP performance with convenience of SSH. Requires persistent ssh agent on local machine where you have your JASMIN key.

source                 dest
login.archer2.ac.uk    hpxfer[34].jasmin.ac.uk

The next-best method for transfers between ARCHER2 and JASMIN is to use the globus-url-copy client tool with SSH authentication, as described below. (Despite the tool's name, this is not Globus.)

1. Load your SSH keys for both JASMIN and ARCHER2 on your local machine, then log in to ARCHER2.

You will need to have loaded into your SSH agent:

  • The SSH key associated with your JASMIN account
  • The SSH key associated with your ARCHER2 account, if you have one (it is recommended to use a different one than for JASMIN, if so)

You also need to ensure that you connect with the -A option for agent forwarding, to enable the JASMIN key to be available for the onward authentication with the JASMIN server.

Note that you do not (and should not) copy your JASMIN private key to ARCHER2. It should stay on your local machine. This does mean that you need an ssh-agent running on your local machine, so this method may not work for long-running continuous processes that need to spawn transfers.

ssh-add <jasmin ssh key>    # path to your JASMIN ssh key file on your local machine
ssh-add <archer2 ssh key>   # path to your ARCHER2 ssh key, if you have one, on your local machine
ssh-add -l                  # check both keys are loaded (are both key fingerprints listed in the output?)
ssh -A <archer2-username>@login.archer2.ac.uk
# (ARCHER2 now uses multi-factor auth at this stage)

2. Load the gct module (to make the current globus-url-copy command available in the path)

module load gct
which globus-url-copy
/work/y07/shared/gct/v6.2.20201212/bin/globus-url-copy

3. Transfer a single file to your home directory on JASMIN (limited space, but to check things work)

globus-url-copy -vb <file> sshftp://<jasmin-username>@hpxfer3.jasmin.ac.uk/~/<file>

Replace <file> with the path to the file you want to transfer.

4. Recursively transfer a directory of files, using the concurrency option for multiple parallel transfers

globus-url-copy -vb -cd -r -cc 4 SRC/DATA/ sshftp://<jasmin-username>@hpxfer3.jasmin.ac.uk/DEST/DATA/

NOTE: The -cc option initiates the parallel transfer of several files at a time, which achieves good overall transfer rates for recursive directory transfers. This is different from using the -p N -fast options, which use parallel network streams to parallelise the transfer of each file. A sensible value for -cc is 2 or 4, whereas a sensible value for -p is between 2 and 16. In both cases, experiment first and avoid numbers at the higher end, which can increase resource usage without further performance gains.

Here, the options used are (see man globus-url-copy for full details):

-vb | -verbose-perf
        During the transfer, display the number of bytes transferred
        and the transfer rate per second.  Show urls being transferred
-cc | -concurrency
        Number of concurrent ftp connections to use for multiple transfers.
-cd | -create-dest
        Create destination directory if needed
-r | -recurse
        Copy files in subdirectories
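For comparison with -cc, the sketch below shows how the -p N -fast options mentioned in the NOTE might be applied to a single large file. The function name gridftp_push_streams is hypothetical. It assumes it is run on ARCHER2 after module load gct, with your JASMIN key available through the forwarded agent.

```shell
# gridftp_push_streams STREAMS FILE JASMIN_USER DEST_PATH   (hypothetical helper name)
# Sends one file to JASMIN using several parallel network streams.
gridftp_push_streams() {
    streams=$1        # number of parallel streams, e.g. 4 (start low)
    file=$2           # path to the file on ARCHER2
    jasmin_user=$3    # your JASMIN username
    dest_path=$4      # destination path on JASMIN, e.g. /~/<file>
    # -p sets the number of parallel streams for each file, and -fast is
    # the companion option named in the NOTE above.
    globus-url-copy -vb -p "$streams" -fast "$file" \
        "sshftp://${jasmin_user}@hpxfer3.jasmin.ac.uk${dest_path}"
}
```

A call such as gridftp_push_streams 4 <file> <jasmin-username> /~/<file> would be a cautious starting point. Compare the reported transfer rate against a plain single-stream copy before raising the stream count.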

5. Use the sync option to synchronise 2 directories between source and target file systems:

globus-url-copy -vb -cd -r -cc 4 -sync SRC/DATA/ sshftp://<jasmin-username>@hpxfer3.jasmin.ac.uk/DEST/DATA/

where SRC/DATA/ and /DEST/DATA/ are the source and destination paths, respectively (include the trailing slash).

Options are as before but with:

-sync
        Only transfer files where the destination does not exist or differs
        from the source.  -sync-level controls how to determine if files
        differ

Note that the default sync level is 2 (see the level descriptions below), which compares only time stamps and sizes. If you want to include a file integrity check using checksums, you need to use -sync-level 3, but there may be a performance cost.

-sync-level 
        Choose criteria for determining if files differ when performing a
        sync transfer.  Level 0 will only transfer if the destination does
        not exist.  Level 1 will transfer if the size of the destination
        does not match the size of the source.  Level 2 will transfer if
        the timestamp of the destination is older than the timestamp of the
        source, or the sizes do not match.  Level 3 will perform a checksum of
        the source and destination and transfer if the checksums do not match,
        or the sizes do not match.  The default sync level is 2.

So a full sync including comparison of checksums would be:

globus-url-copy -vb -cd -r -cc 4 -sync -sync-level 3 SRC/DATA/ sshftp://<jasmin-username>@hpxfer3.jasmin.ac.uk/DEST/DATA/
Last updated on 2025-02-06 as part of:  gridftp, hpxfer, nx, sci updates (7084b7cf0)
Copyright © 2025 Science and Technology Facilities Council.