Data Transfer Tools: Using the Globus Command-Line Interface

This article describes how to transfer data using the Globus Command-Line Interface. It covers how an end-user can set up their host (laptop, desktop or home directory on their departmental server) with:

  • the Globus Command-Line Interface (CLI), and
  • Globus Connect Personal

so that the host can act as a Globus endpoint and be used for efficient data transfers to JASMIN’s Globus endpoint (gridftp1.jasmin.ac.uk, a high-performance server in the JASMIN Data Transfer Zone).

It is not necessary to use the Globus CLI on a JASMIN server. The CLI is a tool that you can use anywhere (for example, on your own desktop or laptop) to interact with the Globus service and orchestrate a transfer between two endpoints. It is not centrally installed on JASMIN, and it does not need to be in the same place as either of the two endpoints involved in the transfer. Even when one of those endpoints is the JASMIN Globus Endpoint, you do not need to be on JASMIN to orchestrate the transfer: you could run the CLI on your own laptop or desktop even if the two endpoints were institutional Globus endpoints on opposite sides of the world. You can, of course, install the CLI in your home directory on JASMIN if that is useful as part of your processing or data transfer workflow.

The Globus CLI is fully documented at https://docs.globus.org/cli/. It provides a command-line interface for managed transfers via the Globus cloud-based transfer service, which usually achieves a better transfer rate over a given route than other methods. Typically this will be significantly faster than scp, rsync or sftp transfers, particularly if the physical network path is long.
The Globus CLI is designed for use either interactively in a shell or in scripts. A Python software development kit (SDK) is also available and should be considered for more sophisticated workflows.
Alternatively, the Globus web interface at https://app.globus.org provides an easy way to orchestrate transfers interactively.
Whichever method is used (CLI, SDK or web interface), transfers are invoked as asynchronous, managed tasks which can then be monitored and, if need be, set to retry automatically until a preset deadline.

Prerequisites

  • Linux environment with normal user privileges, or
  • Mac environment with ability to install applications, or
  • Windows environment with ability to install applications
  • Python environment for that platform, with ability to create virtual environments (to enable installation of additional packages)
  • An active JASMIN user account, with “jasmin-login” and “hpxfer” privileges.
    • NEW: You can now use your JASMIN account credentials to authenticate with the JASMIN Globus Endpoint (previously, for the old "JASMIN Gridftp server", your CEDA credentials were required). However, be sure to select the correct endpoint.

Note on access requirements

Access to the Globus endpoint provided by JASMIN (called the "JASMIN Gridftp server endpoint") is controlled by the JASMIN “hpxfer” access role: this is the same role which we use to control access to the servers hpxfer[12].jasmin.ac.uk. The registration process asks for a specific IP address. However, if you are only using Globus (rather than logging in via SSH to hpxfer[12].jasmin.ac.uk), this address is not required, because the IP addresses of the Globus servers are already registered. In this case, specify a dummy value: please use that of host xfer1.jasmin.ac.uk, whose IP address is 130.246.130.166. This will be accepted by the registration process.

Please note that if you subsequently need to access hpxfer[12].jasmin.ac.uk for SSH-based transfers (to make use of these higher-performance machines), you may still need to contact the helpdesk to supply the specific IP address of the source host at your institution. However, you can still access these two machines from within JASMIN (via the login nodes) to pull data from external hosts: you only need to supply an IP address if you need to initiate a direct connection from a host at your institution to one of these two machines.

Steps

In summary, the steps are as follows (each is explained in detail below):
  • Get a Globus ID if you haven’t already got one
  • Set up Globus CLI on end-user machine
  • Set up Globus Connect Personal on end-user machine. This is usually possible with regular (non-admin) user privileges.
  • Transfer some data, first using Globus Tutorial endpoints
  • Transfer some data to the JASMIN Globus Endpoint
In detail:
  • Get a Globus ID
  • Set up Globus CLI on end-user machine
    • Install the Globus CLI
      • This does not need to be installed or used on a JASMIN machine. Just as you can use the Globus web interface from anywhere, you can do the same with the CLI, so this could be on your own desktop/laptop and does not need to be in the same location as either of the endpoints you use in the transfer.
      • The CLI is not installed on JASMIN as it needs to be installed individually for each user: part of the process is authenticating with your own Globus ID. However you could choose to install it yourself in your JASMIN home directory. If you use one of the regular (non-hpxfer) xfer machines e.g. xfer1.jasmin.ac.uk, you should be able to make yourself a Python virtual environment in your home directory, as follows:
      • python3 -m venv ./venv
        source ./venv/bin/activate
      • See https://docs.globus.org/cli/#installation
        • The Globus docs recommend pipx, which may not be available. Running pip install globus-cli inside your activated venv (as above) works just as well.
        • Don't forget to include the path to the bin directory of your venv in your PATH, for convenience.
    • Log in to Globus

$ globus login

Copy and paste the resulting URL into your browser, obtain the authorization code and enter it at the command line where you ran “globus login”. This Globus CLI instance is now logged in with your Globus ID.
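The install and login steps above can be collected into one shell function. This is a minimal sketch, not an official installer: the function name `setup_globus_cli` and the default location `$HOME/venv` are hypothetical choices, and it assumes that `python3` with the `venv` module is available and that `globus whoami` exits non-zero when nobody is logged in.

```shell
#!/usr/bin/env bash
# Sketch: install the Globus CLI into a Python venv and log in if needed.
# setup_globus_cli and the default venv path are hypothetical choices.

setup_globus_cli() {
    local venv_dir="${1:-$HOME/venv}"

    # Create the virtual environment only if it does not exist yet
    if [ ! -x "$venv_dir/bin/pip" ]; then
        python3 -m venv "$venv_dir" || return 1
    fi

    # Install (or upgrade) the CLI using the venv's own pip
    "$venv_dir/bin/pip" install --upgrade globus-cli || return 1

    # Put the venv's bin directory first on PATH for this shell session
    export PATH="$venv_dir/bin:$PATH"

    # Log in only if no Globus identity is currently logged in
    # (assumes "globus whoami" exits non-zero when not logged in)
    if ! globus whoami >/dev/null 2>&1; then
        globus login
    fi
}
```

Source the file containing the function and then run `setup_globus_cli` in your interactive shell; the PATH change only lasts for that shell session, so you may still want to add the venv's bin directory to PATH in your shell start-up file.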

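  • Set up Globus Connect Personal on end-user machine

The instructions below show the process for Linux (command line), starting from the downloaded globusconnectpersonal-latest.tgz archive: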
$ tar xzf globusconnectpersonal-latest.tgz 
$ cd globusconnectpersonal-2.x.x # replace 2.x.x with the version in the actual directory name
$ globus whoami # Verify that you are already logged in (after “globus login” above)
username@globusid.org 
$ globus endpoint create --personal my-endpoint # choose label for this endpoint
Message:     Endpoint created successfully
Endpoint ID: 3922ca0e-5727-11e7-bf07-22000b9a448b
Setup Key:   5177b3ce-9292-46e4-91a7-ae0219f845f3

Complete the installation using the setup key:

$ ./globusconnectpersonal -setup 5177b3ce-9292-46e4-91a7-ae0219f845f3
Configuration directory: /home/users/mpritcha/.globusonline/lta
Contacting relay.globusonline.org:2223
Done!
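If you want to script this step, the setup key can be extracted from the text printed by `globus endpoint create --personal`. This sketch assumes the output contains the `Endpoint ID:` and `Setup Key:` lines exactly as shown above; `get_field` and `setup_personal_endpoint` are hypothetical helper names.

```shell
# Sketch: pull a field such as "Setup Key" out of the text printed by
# "globus endpoint create --personal" and use it to configure
# Globus Connect Personal. Helper names are hypothetical.

get_field() {
    # $1 = field label, e.g. "Setup Key"; reads the create output on stdin
    awk -v label="$1" -F': *' '$1 == label { print $2; exit }'
}

setup_personal_endpoint() {
    # $1 = endpoint label, $2 = unpacked globusconnectpersonal directory
    local label="$1" gcp_dir="$2" output key
    output="$(globus endpoint create --personal "$label")" || return 1
    key="$(printf '%s\n' "$output" | get_field "Setup Key")"
    if [ -z "$key" ]; then
        echo "no setup key found in output" >&2
        return 1
    fi
    "$gcp_dir/globusconnectpersonal" -setup "$key"
}
```

The endpoint ID can be extracted the same way with `get_field "Endpoint ID"` if you want to keep it for later commands.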
Check that you can see your endpoint listed (the endpoint ID and label should correspond to the values above):
$ globus endpoint search --filter-scope my-endpoints

ID                                   | Owner                      | Display Name
------------------------------------ | -------------------------- | ------------------------
3922ca0e-5727-11e7-bf07-22000b9a448b | mattpritchard@globusid.org | my-endpoint

Start Globus Connect Personal:

$ ./globusconnectpersonal -start &

As a test, list the contents of a directory on the endpoint you have created:

(Note the syntax <endpointID>:<path>)

$ globus ls 3922ca0e-5727-11e7-bf07-22000b9a448b:/home/users/mpritcha/

So now we have a working Globus Connect Personal Endpoint. We can now try transferring files to another Globus endpoint.

Transfer some data, first using Globus Tutorial endpoints

Globus provides some open-access endpoints for testing: these can be activated simply with your Globus ID (created above) and are therefore useful for checking that everything is working properly. Let’s set up some shorthand names for these endpoints:

$ go1=ddb59aef-6d04-11e5-ba46-22000b92c6ec
$ go2=ddb59af0-6d04-11e5-ba46-22000b92c6ec

You can also find these by searching:

$ globus endpoint search tutorial

Activate the “globus tutorial endpoint 1”

$ globus endpoint activate $go1
Autoactivation succeeded with message: Endpoint activated successfully using Globus Online credentials.

Activate the “globus tutorial endpoint 2”

$ globus endpoint activate $go2
Autoactivation succeeded with message: Endpoint activated successfully using Globus Online credentials.
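When you have several endpoints to activate, the two commands above can be wrapped in a loop. This is a sketch that assumes a Globus CLI version which still provides `globus endpoint activate`, as used throughout this article; `activate_endpoints` is a hypothetical helper name.

```shell
# Sketch: activate several endpoints in turn, stopping at the first failure.
# Assumes a Globus CLI version that still provides "globus endpoint activate".
# activate_endpoints is a hypothetical helper name.

activate_endpoints() {
    local ep
    for ep in "$@"; do
        if ! globus endpoint activate "$ep"; then
            echo "activation failed for endpoint $ep" >&2
            return 1
        fi
    done
}

# The two Globus tutorial endpoints from above
go1=ddb59aef-6d04-11e5-ba46-22000b92c6ec
go2=ddb59af0-6d04-11e5-ba46-22000b92c6ec
```

Usage: `activate_endpoints "$go1" "$go2"`.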

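Transfer a local file to “globus tutorial endpoint 1”. Both the source and the destination must be given in the <endpointID>:<path> form, so the local file is addressed through your Globus Connect Personal endpoint (substitute your own endpoint ID and path):

$ globus transfer 3922ca0e-5727-11e7-bf07-22000b9a448b:/home/users/username/myfile.dat $go1:/~/myfile.dat # where myfile.dat is a local file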

Transfer some data to the JASMIN Globus Endpoint

Let's locate the JASMIN Globus Endpoint: you can find it with a search as follows: 

$ globus endpoint search "jasmin globus endpoint"
ID                                   | Owner                      | Display Name                               
------------------------------------ | -------------------------- | -------------------------------------------
2b0a1a4c-ee1f-11eb-b467-eb47ba14b5cc | ceda@globusid.org          | JASMIN Globus Endpoint (jasmin credentials)

The old endpoint is this one, which used CEDA credentials: this will be deprecated soon (look out for announcements):

4cc8c764-0bc1-11e6-a740-22000bf2d559 | ceda@globusid.org          | JASMIN gridftp server (ceda credentials)   

The ID of the new JASMIN Globus Endpoint is 2b0a1a4c-ee1f-11eb-b467-eb47ba14b5cc: please check that the endpoint ID you find matches it.
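The ID check can also be done from a script, by looking up the display name registered for an endpoint ID. This sketch relies on `globus endpoint show` together with the CLI's global `--jmespath` and `-F unix` output options, and assumes the endpoint record has a `display_name` field; `check_endpoint` is a hypothetical helper name.

```shell
# Sketch: confirm that an endpoint ID resolves to the expected display name.
# Assumes "globus endpoint show" returns a record with a display_name field.
# check_endpoint is a hypothetical helper name.

check_endpoint() {
    local ep_id="$1" expected="$2" name
    name="$(globus endpoint show "$ep_id" --jmespath display_name -F unix)" || return 1
    if [ "$name" != "$expected" ]; then
        echo "endpoint $ep_id is \"$name\", expected \"$expected\"" >&2
        return 1
    fi
}
```

Usage: `check_endpoint 2b0a1a4c-ee1f-11eb-b467-eb47ba14b5cc "JASMIN Globus Endpoint (jasmin credentials)"` returns success only if the display name matches.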

In order to use this endpoint, we need to activate it.

Let’s set up some shorthands:

$ ep1=3922ca0e-5727-11e7-bf07-22000b9a448b # our local endpoint
$ ep2=2b0a1a4c-ee1f-11eb-b467-eb47ba14b5cc # JASMIN Globus Endpoint

Activate the JASMIN endpoint (ep2). This particular endpoint is already configured to use the JASMIN “SLCS” service to provide short-term credentials using your JASMIN account credentials: the username and password with which you would log in to the JASMIN accounts portal.

$ globus endpoint activate $ep2 --myproxy -U username
Myproxy password: 
Endpoint activated successfully using a credential fetched from a MyProxy server.

Note (1): You can also supply the password in the command itself using the -P option, so that activation happens in one step, but this is less secure because your password will be visible in your shell’s command history.

Note (2): Activation is independent of any transfer, so you can activate or re-activate your credential at any time.

Alternatively, you can activate the endpoint using the --web option, which opens your default web browser to complete the activation via the Globus web interface. Return to your terminal window once this step has completed successfully:

$ globus endpoint activate $ep2 --web
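Because activation is independent of transfers, a script can check the endpoint first and only prompt for a password when needed. This sketch assumes a Globus CLI version that still provides `globus endpoint is-activated` (exiting with status 0 when the endpoint is active) and `globus endpoint activate`; `ensure_activated` is a hypothetical helper name.

```shell
# Sketch: activate an endpoint via MyProxy only if it is not already active.
# Assumes "globus endpoint is-activated" exits 0 for an active endpoint.
# ensure_activated is a hypothetical helper name.

ensure_activated() {
    local ep_id="$1" user="$2"
    if globus endpoint is-activated "$ep_id" >/dev/null 2>&1; then
        echo "endpoint $ep_id already activated"
        return 0
    fi
    # Prompts for the password interactively (no -P), keeping it out of history
    globus endpoint activate "$ep_id" --myproxy -U "$user"
}
```

Usage: `ensure_activated "$ep2" username`.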

Try a listing on the JASMIN endpoint (ep2). The path you choose needs to be one for which you have access permissions, for example your home directory or a group workspace you belong to.

$ globus ls $ep2:/group_workspaces/jasmin/cedaproc/username/
mydir/
testfile

Do a transfer from ep1 to ep2

$ globus transfer $ep1:/home/users/username/1G.dat $ep2:/group_workspaces/jasmin/cedaproc/username/1G.dat --label "my first transfer"
Message: The transfer has been accepted and a task has been created and queued for execution
Task ID: 86e4a498-572b-11e7-bf07-22000b9a448b
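Since the transfer runs as an asynchronous task, scripts usually need its task ID. This sketch asks the CLI to print only the `task_id` field of the response, using the global `--jmespath` and `--format unix` options; `submit_transfer` is a hypothetical helper name.

```shell
# Sketch: submit a transfer and print only its task ID.
# submit_transfer is a hypothetical helper name.

submit_transfer() {
    # $1 = source endpoint:path, $2 = destination endpoint:path, $3 = label
    globus transfer "$1" "$2" --label "$3" --jmespath 'task_id' --format unix
}

# Example (paths as in the transfer above):
# task_id="$(submit_transfer "$ep1:/home/users/username/1G.dat" \
#     "$ep2:/group_workspaces/jasmin/cedaproc/username/1G.dat" "my first transfer")"
```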

Check on the progress of your task:

$ globus task list
Task ID                              | Status    | Type     | Source Display Name       | Dest Display Name      | Label
------------------------------------ | --------- | -------- | ------------------------- | ---------------------- | -----------------
86e4a498-572b-11e7-bf07-22000b9a448b | SUCCEEDED | TRANSFER | my-endpoint               | JASMIN gridftp server  | my first transfer
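Rather than polling `globus task list` by hand, a script can block on a single task and then report its final status. This sketch uses `globus task wait` (with its `--timeout` and `--polling-interval` options) and `globus task show`; it assumes `globus task wait` exits non-zero if the timeout expires first, and `wait_for_task` and the one-hour default timeout are hypothetical choices.

```shell
# Sketch: wait for a task to finish (or time out), then print its status.
# Returns success only if the final status is SUCCEEDED.
# wait_for_task and the default timeout are hypothetical choices.

wait_for_task() {
    local task_id="$1" timeout="${2:-3600}" status
    # Assumed to return non-zero on timeout; carry on and report status anyway
    globus task wait "$task_id" --timeout "$timeout" --polling-interval 30 || true
    status="$(globus task show "$task_id" --jmespath status -F unix)" || return 1
    echo "$status"
    [ "$status" = "SUCCEEDED" ]
}
```

Usage: `wait_for_task 86e4a498-572b-11e7-bf07-22000b9a448b 600`.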

Automation

Examples of automation using the Globus CLI
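As a small automation example, a directory can be kept in step with its copy on JASMIN by re-running a recursive transfer that skips files which already match. This sketch uses the `--recursive` and `--sync-level` options of `globus transfer`; `sync_dir` is a hypothetical helper name, and `checksum` is the most thorough (but slowest) sync level, so `size` or `mtime` may be preferable for very large datasets.

```shell
# Sketch: recursively sync a directory, copying only files whose checksums
# differ, and print the resulting task ID. sync_dir is a hypothetical name.

sync_dir() {
    # $1 = source endpoint:dir/, $2 = destination endpoint:dir/, $3 = label
    globus transfer "$1" "$2" --recursive --sync-level checksum \
        --label "$3" --jmespath 'task_id' --format unix
}

# Example (paths are placeholders):
# sync_dir "$ep1:/home/users/username/data/" \
#     "$ep2:/group_workspaces/jasmin/cedaproc/username/data/" "nightly sync"
```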