Globus Command-Line Interface
Data Transfer Tools: Using the Globus Command-Line Interface
Please read Globus transfers with JASMIN first for a wider introduction to Globus on JASMIN.
This article describes how to install the Globus CLI and use it to orchestrate transfers.
It is not necessary to use the Globus CLI on a JASMIN server: it is a tool that you can use anywhere (for example, your own desktop or laptop) to interact with the Globus service and orchestrate a transfer between two endpoints (collections, in current Globus terminology). The CLI is not centrally installed on JASMIN, and does not need to be in the same place as either of the two collections involved in the transfer. You could use the CLI on your own laptop or desktop even if the two collections were institutional Globus collections on opposite sides of the world. You could, of course, install the CLI in your home directory on JASMIN if that were useful as part of your processing or data transfer workflow.
The Globus CLI is fully documented, with examples, at https://docs.globus.org/cli/. It provides a command-line interface to managed transfers via the Globus cloud-based transfer service, which usually achieves the best possible transfer rate over a given route compared to other methods. Typically this will be significantly faster than scp, rsync or sftp transfers, particularly if the physical network path is long.
The Globus CLI is designed for use either interactively within an interactive shell or in scripts. An alternative Python software development kit (SDK) is also available and should be considered for more sophisticated workflows.
Alternatively, the Globus web interface at https://app.globus.org provides an easy way to orchestrate transfers interactively.
Whichever method is used (CLI, SDK or web interface), transfers are invoked as asynchronous, managed tasks which can then be monitored and, if need be, set to retry automatically until some pre-set deadline.
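The asynchronous pattern can be sketched in shell. The two functions below are stand-ins for real CLI calls (such as globus transfer to submit a task and globus task show to query its status), and the task ID is made up for illustration:

```shell
#!/bin/sh
# Sketch of the asynchronous, managed-task pattern. submit_task and
# task_status are stand-ins for real CLI calls; a real task starts in the
# ACTIVE state and moves to a terminal state such as SUCCEEDED or FAILED.
submit_task() { echo "11111111-2222-3333-4444-555555555555"; }  # made-up ID
task_status() { echo "SUCCEEDED"; }

task_id=$(submit_task)
while [ "$(task_status "$task_id")" = "ACTIVE" ]; do
    sleep 30   # poll the service at a polite interval
done
echo "Task $task_id finished with status $(task_status "$task_id")"
```

With the real CLI you rarely need to write this loop yourself, since the service manages the task for you and the CLI can wait on it.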
Go to https://app.globus.org and log in, or create a Globus account if you do not already have one.
See also https://docs.globus.org/how-to/get-started/
Do the following on your own (local) machine. Make a Python virtual environment and activate it:
python3 -m venv ./venv
source ./venv/bin/activate
Download the Globus CLI and install it into the virtual environment (venv).
pip install globus-cli
Try the globus login
command. The first time you run this, you will be
prompted to authorise the Globus CLI to carry out operations on behalf of your
Globus ID. The URL will open in your default browser, where you should
authenticate with your Globus ID credentials. If you prefer, you can
copy/paste the URL from the command line into a browser of your choice. Either
way, you then need to click “Allow” in the browser window, then copy/paste the
resulting “Native App Authorization Code” back to the terminal window where
you issued the globus login
command:
globus login --no-local-server
Please authenticate with Globus here:
------------------------------------
https://auth.globus.org/v2/oauth2/authorize?client_id=abc1234-9c3c-4ad42-be31-8d6c87101239014&redirect_uri=https%3A%2F%2Fauth.globus.org%2Fv2%2Fweb%2Fauth-code&scope=openid+profile+email+urn%3Aglobus%3Aauth%3Ascope%3Aauth.globus.org%3Aview_identity_set+urn%3Aglobus%3Aauth%3Ascope%3Atransfer.api.globus.org%3Aall+urn%3Aglobus%3Aauth%3Ascope%3Agroups.api.globus.org%3Aall+urn%3Aglobus%3Aauth%3Ascope%3Asearch.api.globus.org%3Aall&state=_default&response_type=code&access_type=offline&prompt=login
------------------------------------
Enter the resulting Authorization Code here:
You should then see the following:
You have successfully logged in to the Globus CLI!
You can check your primary identity with
globus whoami
For information on which of your identities are in session use
globus session show
Logout of the Globus CLI with
globus logout
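In scripts, the exit status of globus whoami can be used as a guard: it exits non-zero when no identity is logged in. A minimal sketch (the command -v check additionally makes the snippet safe on machines where the CLI is not installed):

```shell
#!/bin/sh
# Guard for scripts: check that the CLI is present and a login session
# exists before attempting any transfer commands.
if command -v globus >/dev/null 2>&1 && globus whoami >/dev/null 2>&1; then
    echo "logged in"
else
    echo "not logged in (or CLI not installed): run 'globus login' first"
fi
```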
You can now use the Globus CLI commands as listed by the following command:
globus --help
Usage: globus [OPTIONS] COMMAND [ARGS]...
Interact with Globus from the command line
All `globus` subcommands support `--help` documentation.
Use `globus login` to get started!
The documentation is also online at https://docs.globus.org/cli/
Options:
-v, --verbose Control level of output
-h, --help Show this message and exit.
-F, --format [unix|json|text] Output format for stdout. Defaults to text
--jmespath, --jq TEXT A JMESPath expression to apply to json
output. Takes precedence over any specified '
--format' and forces the format to be json
processed by this expression
--map-http-status TEXT Map HTTP statuses to any of these exit codes:
0,1,50-99. e.g. "404=50,403=51"
Commands:
bookmark Manage endpoint bookmarks
collection Manage your Collections
delete Submit a delete task (asynchronous)
endpoint Manage Globus endpoint definitions
get-identities Lookup Globus Auth Identities
group Manage Globus Groups
list-commands List all CLI Commands
login Log into Globus to get credentials for the Globus CLI
logout Logout of the Globus CLI
ls List endpoint directory contents
mkdir Create a directory on an endpoint
rename Rename a file or directory on an endpoint
rm Delete a single path; wait for it to complete
search Use Globus Search to store and query for data
session Manage your CLI auth session
task Manage asynchronous tasks
transfer Submit a transfer task (asynchronous)
update Update the Globus CLI to its latest version
version Show the version and exit
whoami Show the currently logged-in identity
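Note the -F/--format and --jmespath options shown in the help above: they produce machine-readable output, which is the robust choice for scripting. If you only have captured human-readable output, a field such as the Task ID can still be pulled out with standard tools. A minimal sketch, using sample output of the form produced by the transfer example later in this article:

```shell
#!/bin/sh
# Extract the Task ID field from captured `globus transfer` text output.
# The sample text mirrors the transfer example in this article.
output='Message: The transfer has been accepted and a task has been created and queued for execution
Task ID: 74cb181c-bf63-11ee-a90e-032e06ca0965'

task_id=$(printf '%s\n' "$output" | awk -F': ' '/^Task ID:/ {print $2}')
echo "$task_id"
```

For anything beyond a quick one-off, prefer the machine-readable route, e.g. --format unix with a --jmespath expression, rather than parsing text output.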
We will use the globus endpoint search
subcommand. Find help on the
particular options for that with
globus endpoint search --help
Usage: globus endpoint search [OPTIONS] [FILTER_FULLTEXT]
Search for Globus endpoints with search filters. If --filter-scope is set to
the default of 'all', then FILTER_FULLTEXT is required.
If FILTER_FULLTEXT is given, endpoints which have attributes (display name,
legacy name, description, organization, department, keywords) that match the
search text will be returned. The result size limit is 100 endpoints.
Options:
--filter-scope [all|administered-by-me|my-endpoints|my-gcp-endpoints|recently-used|in-use|shared-by-me|shared-with-me]
The set of endpoints to search over.
[default: all]
--filter-owner-id TEXT Filter search results to endpoints owned by
a specific identity. Can be the Identity ID,
or the Identity Username, as in
"go@globusid.org"
--limit INTEGER RANGE The maximum number of results to return.
[default: 25; 1<=x<=1000]
-v, --verbose Control level of output
-h, --help Show this message and exit.
-F, --format [unix|json|text] Output format for stdout. Defaults to text
--jmespath, --jq TEXT A JMESPath expression to apply to json
output. Takes precedence over any specified
'--format' and forces the format to be json
processed by this expression
--map-http-status TEXT Map HTTP statuses to any of these exit
codes: 0,1,50-99. e.g. "404=50,403=51"
Search for the collections matching the search term “tutorial”:
globus endpoint search "tutorial"
ID | Owner | Display Name
------------------------------------ | ------------------------------------------------------------ | ----------------------------------------------
6c54cade-bde5-45c1-bdea-f4bd71dba2cc | 6df1b656-c953-40a3-91a9-e9e8ad5173ea@clients.auth.globus.org | Globus Tutorial Collection 1
31ce9ba0-176d-45a5-add3-f37d233ba47d | 6df1b656-c953-40a3-91a9-e9e8ad5173ea@clients.auth.globus.org | Globus Tutorial Collection 2
The two Globus Tutorial Collections actually “see” the same filesystem, so we’ll just use the first one.
For convenience, let’s set environment variables representing the ID of this collection:
export c1=6c54cade-bde5-45c1-bdea-f4bd71dba2cc
echo $c1
6c54cade-bde5-45c1-bdea-f4bd71dba2cc
Let’s try listing that collection, so that we know we can interact with it. We are prompted to grant consent first:
globus ls $c1
The collection you are trying to access data on requires you to grant consent for the Globus CLI to access it.
Please run:
globus session consent 'urn:globus:auth:scope:transfer.api.globus.org:all[*https://auth.globus.org/scopes/6c54cade-bde5-45c1-bdea-f4bd71dba2cc/data_access]'
to login with the required scopes.
Copy & paste the command it gives you (don’t copy the one above) and run it, which should open a web browser window. Follow the instructions which should complete the process, then return to your terminal session.
Now let’s find another collection, this time a public test collection which can be used for performance testing:
globus endpoint search "star dtn"
ID | Owner | Display Name
------------------------------------ | ------------------ | -------------------------------------------------
ff2ee779-54fb-4dac-ade2-57568c587ae3 | esnet@globusid.org | ESnet STAR DTN private collection
ece400da-0182-4777-91d6-27a1808f8371 | esnet@globusid.org | ESnet Starlight DTN (Anonymous read only testing)
e9e0d9f4-c419-44e0-8198-017fd61bf0c4 | esnet@globusid.org | ESnet Starlight DTN (read-write testing)
We’ll use the one labelled “Anonymous read only testing”. Set stardtn to the ID of this endpoint:
export stardtn=ece400da-0182-4777-91d6-27a1808f8371
Use the globus ls command to list the contents of the stardtn endpoint at the path /:
globus ls $stardtn:/
500GB-in-large-files/
50GB-in-medium-files/
5GB-in-small-files/
5MB-in-tiny-files/
Climate-Huge/
Climate-Large/
Climate-Medium/
Climate-Small/
bebop/
logs/
write-testing/
100G.dat
100M.dat
10G.dat
10M.dat
1G.dat
1M.dat
500G.dat
50G.dat
50M.dat
These are files and directories containing dummy data which can be used for test purposes.
Let’s transfer the file 1M.dat from the stardtn endpoint to c1:
globus transfer $stardtn:/1M.dat $c1:/~/1M.dat
Message: The transfer has been accepted and a task has been created and queued for execution
Task ID: 74cb181c-bf63-11ee-a90e-032e06ca0965
The transfer task is a separate activity and does not require any connection from the CLI client to either of the two endpoints: the Globus transfer service manages the transfer for us. We can check on the progress of this transfer task with:
globus task show 74cb181c-bf63-11ee-a90e-032e06ca0965
Label: None
Task ID: 74cb181c-bf63-11ee-a90e-032e06ca0965
Is Paused: False
Type: TRANSFER
Directories: 0
Files: 1
Status: SUCCEEDED
Request Time: 2024-01-30T11:33:58+00:00
Faults: 0
Total Subtasks: 2
Subtasks Succeeded: 2
Subtasks Pending: 0
Subtasks Retrying: 0
Subtasks Failed: 0
Subtasks Canceled: 0
Subtasks Expired: 0
Subtasks with Skipped Errors: 0
Completion Time: 2024-01-30T11:34:01+00:00
Source Endpoint: ESnet Starlight DTN (Anonymous read only testing)
Source Endpoint ID: ece400da-0182-4777-91d6-27a1808f8371
Destination Endpoint: Globus Tutorial Collection 1
Destination Endpoint ID: 6c54cade-bde5-45c1-bdea-f4bd71dba2cc
Bytes Transferred: 1000000
Bytes Per Second: 421388
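Rather than polling globus task show by hand, the CLI provides globus task wait, which blocks until the task reaches a terminal state or the timeout expires. A sketch (the timeout value is just an example, and the guard makes the snippet safe to run on a machine where the CLI is not installed):

```shell
#!/bin/sh
# Wait up to 10 minutes for the transfer task from the example above to
# reach a terminal state. The guard skips the call if the CLI is absent.
TASK_ID=74cb181c-bf63-11ee-a90e-032e06ca0965
if command -v globus >/dev/null 2>&1; then
    globus task wait "$TASK_ID" --timeout 600
else
    echo "globus CLI not installed"
fi
```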
We can also list the destination collection to check that the file has reached its destination:
globus ls $c1:/~/
1M.dat
We can also make a subdirectory with mkdir:
globus mkdir $c1:/~/mydata/
The directory was created successfully
We can move our 1M.dat into that directory with a globus rename command:
globus rename $c1 /~/1M.dat /~/mydata/1M.dat
File or directory renamed successfully
We now have a directory mydata containing the file 1M.dat:
globus ls $c1:/~/mydata/
1M.dat
Now let’s copy a directory containing some small files from the stardtn collection to our destination collection c1 (the Globus Tutorial Collections only provide very limited storage space).
The files we want to copy are at the path /5MB-in-tiny-files/a/a/ on the stardtn endpoint, and are small, as their names suggest:
globus ls $stardtn:/5MB-in-tiny-files/a/a/
a-a-1KB.dat
a-a-2KB.dat
a-a-5KB.dat
Copy the parent directory recursively to c1:
globus transfer -r $stardtn:/5MB-in-tiny-files/a/a $c1:/~/star-data
Message: The transfer has been accepted and a task has been created and queued for execution
Task ID: 4ae9bab0-7d40-11ec-bef3-a18800fa5978
Check destination content:
globus ls $c1
mydata/
star-data/
globus ls $c1:/~/star-data
a-a-1KB.dat
a-a-2KB.dat
a-a-5KB.dat
We could now delete one of the small files using the globus delete
command:
globus delete $c1:/~/star-data/a-a-2KB.dat
Message: The delete has been accepted and a task has been created and queued for execution
Task ID: be4d6934-7d40-11ec-891f-939ceb6dfaf1
And list contents again, to verify that it has been deleted:
globus ls $c1:/~/star-data
a-a-1KB.dat
a-a-5KB.dat
We could now repeat the copying of the source data, but this time using the -s or --sync-level option with the value exists, so that we only copy the data that is now missing from the destination. The full set of sync levels is [exists|size|mtime|checksum].
globus transfer -s exists -r $stardtn:/5MB-in-tiny-files/a/a $c1:/~/star-data
Message: The transfer has been accepted and a task has been created and queued for execution
Task ID: 759a3cac-7d41-11ec-bef3-a18800fa5978
This should only copy the data that does not already exist at the destination. We end up with the same set of files at the destination:
globus ls $c1:/~/star-data
a-a-1KB.dat
a-a-2KB.dat
a-a-5KB.dat
But we can see that only 2000 bytes were transferred (so we know it only copied that one file, which is what we wanted):
globus task show 759a3cac-7d41-11ec-bef3-a18800fa5978
Label: None
Task ID: 759a3cac-7d41-11ec-bef3-a18800fa5978
Is Paused: False
Type: TRANSFER
Directories: 1
Files: 3
Status: SUCCEEDED
Request Time: 2022-01-24T18:14:24+00:00
Faults: 0
Total Subtasks: 5
Subtasks Succeeded: 5
Subtasks Pending: 0
Subtasks Retrying: 0
Subtasks Failed: 0
Subtasks Canceled: 0
Subtasks Expired: 0
Subtasks with Skipped Errors: 0
Completion Time: 2022-01-24T18:14:58+00:00
Source Endpoint: ESnet Starlight DTN (Anonymous read only testing)
Source Endpoint ID: ece400da-0182-4777-91d6-27a1808f8371
Destination Endpoint: Globus Tutorial Collection 1
Destination Endpoint ID: 6c54cade-bde5-45c1-bdea-f4bd71dba2cc
Bytes Transferred: 2000
Bytes Per Second: 60
This task could be repeated in a shell script, cron job or even using the Globus timer functionality, for either a source or destination directory that is expected to change.
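For instance, a scheduled repeat could be a single crontab entry like the sketch below. The paths are assumptions (adjust them to wherever you created the virtual environment), and the collection IDs are the ones used above. Because the CLI caches its login tokens, no interactive step is needed while the authentication remains valid:

```shell
# Illustrative crontab entry: re-run the sync nightly at 02:00 using the
# CLI installed in a virtual environment (paths are examples only).
0 2 * * * . "$HOME/venv/bin/activate" && globus transfer -s mtime -r ece400da-0182-4777-91d6-27a1808f8371:/5MB-in-tiny-files/a/a 6c54cade-bde5-45c1-bdea-f4bd71dba2cc:/~/star-data >> "$HOME/globus-sync.log" 2>&1
```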
Most Globus Connect Server endpoints are configured to require some form of authentication & authorization process. In the case of the JASMIN Default Collection, you link your Globus identity to your JASMIN identity. This may be different for other collections that you use elsewhere.
Let’s find the JASMIN Default Collection, then set up an alias to it. We can search for that name:
globus endpoint search "jasmin default"
ID | Owner | Display Name
------------------------------------ | ------------------------------------------------------------ | -------------------------
a2f53b7f-1b4e-4dce-9b7c-349ae760fee0 | a77928d3-f601-40bb-b497-2a31092f8878@clients.auth.globus.org | JASMIN Default Collection
Set up an alias for this collection:
export jdc=a2f53b7f-1b4e-4dce-9b7c-349ae760fee0
If you’ve already interacted with this collection recently, you should find that you can list it with the CLI already. If not, you will be prompted to authenticate. Follow through all the steps until you complete the process, then return to the terminal session.
If successful, you can now interact with the JASMIN endpoint, for example listing your home directory:
globus ls $jdc:/~/
...
(file listing of your JASMIN home directory)
...
The authentication via your JASMIN account lasts for 30 days, so you can run and re-run transfers during that period without needing to repeat the process (hence without any human interaction, if you have scheduled/automated transfers, see below).
If this needs to be renewed, the simplest way is to attempt an operation on the collection again (for example, listing a directory). If the authentication has timed out, you will be prompted to follow instructions to renew it, then the action (listing the directory) should complete successfully.
There are ways to use a “refresh token” programmatically to renew the authentication. Watch this space for details of how to do that.
The functionality demonstrated above can be combined into scripts which can perform useful, repeatable tasks such as:
Globus provides two implementations of this in its Examples of automation using the Globus CLI, specifically:
We have not covered the Python SDK here, but this is a useful example of how you could integrate Globus transfer functionality into your own code and workflows. You would need to install and authorise this SDK first.
Taking the first of these examples, we can adapt it slightly:
1. Select the JASMIN endpoint at the destination, and set the destination path. Modify the corresponding variables in the script to these values:
DESTINATION_COLLECTION='a2f53b7f-1b4e-4dce-9b7c-349ae760fee0' #JASMIN Default Collection ID
DESTINATION_PATH='/home/users/<username>/sync-demo/' #replace <username> with your JASMIN username
9efc947f-5212-4b5f-8c9d-47b93ae676b7.
2. If you haven’t already, activate the Python virtual environment where you have the CLI installed, and login:
source ~/.globus-cli-venv/bin/activate
globus login
3. Check that you can interact with the JASMIN collection from the CLI by trying to list it.
Follow any instructions needed, if you need to renew your authentication.
4. Run the script to sync the data from the Globus Tutorial Endpoint to the destination directory.
You should see output similar to that shown below.
./cli-sync.sh
Checking for a previous transfer
Last transfer f5db7238-8f06-11ec-8fe0-dfc5b31adbac SUCCEEDED, continuing
Verified that source is a directory
Submitted sync from 6c54cade-bde5-45c1-bdea-f4bd71dba2cc:/share/godata/ to a2f53b7f-1b4e-4dce-9b7c-349ae760fee0:/~/sync-demo/
Link:
https://app.globus.org/activity/04e277f4-8f07-11ec-811e-493dd0cf73a1/overview
Saving sync transfer ID to last-transfer-id.txt
5. Check on the status of the task. You could do this with:
globus task show <taskid>
6. You could then make some change to either the source or destination directory, and simply re-run the script:
./cli-sync.sh
7. Experiment by changing the SYNCTYPE. The available sync levels are:
EXISTS: copy files that do not already exist at the destination
SIZE: copy files whose size differs between source and destination
MTIME: copy files where the source modification time is newer than the destination’s
CHECKSUM: copy files whose checksums differ (the most thorough, but slowest, option)
See the Globus documentation for full descriptions of the available sync levels.
8. Automating repeats of the sync operation
You could then consider how to repeat the task automatically. For example:
invoke the cli-sync.sh command according to some condition that’s met in your workflow
schedule the cli-sync.sh command on your own machine using cron, making sure the scheduled job uses the same virtualenv as the main globus CLI