JASMIN Help Site logo JASMIN Help Site logo
  • Docs 
  • Guides 
  • Training 
  • Discussions   

  •   Search this site  

Can't find what you're looking for?

Try our Google custom search, across all JASMIN sites

Docs
  • getting started
    • get started with jasmin
    • generate ssh key pair
    • get jasmin portal account
    • get login account
    • beginners training workshop
    • how to contact us about jasmin issues
    • jasmin status
    • jasmin training accounts
    • tips for new users
    • how to login
    • multiple account types
    • present ssh key
    • reconfirm email address
    • reset jasmin account password
    • ssh auth
    • storage
    • understanding new jasmin storage
    • update a jasmin account
  • interactive computing
    • interactive computing overview
    • check network details
    • login servers
    • login problems
    • graphical linux desktop access using nx
    • sci servers
    • tenancy sci analysis vms
    • transfer servers
    • jasmin notebooks service
    • jasmin notebooks service with gpus
    • creating a virtual environment in the notebooks service
    • project specific servers
    • dask gateway
    • access from vscode
  • batch computing
    • lotus overview
    • slurm scheduler overview
    • slurm queues
    • lotus cluster specification
    • how to monitor slurm jobs
    • how to submit a job
    • how to submit an mpi parallel job
    • example job 2 calc md5s
    • orchid gpu cluster
    • slurm status
    • slurm quick reference
  • software on jasmin
    • software overview
    • quickstart software envs
    • python virtual environments
    • additional software
    • community software esmvaltool
    • community software checksit
    • compiling and linking
    • conda environments and python virtual environments
    • conda removal
    • creating and using miniforge environments
    • idl
    • jasmin sci software environment
    • jasmin software faqs
    • jaspy envs
    • matplotlib
    • nag library
    • name dispersion model
    • geocat replaces ncl
    • postgres databases on request
    • running python on jasmin
    • running r on jasmin
    • rocky9 migration 2024
    • share software envs
  • data transfer
    • data transfer overview
    • data transfer tools
    • globus transfers with jasmin
    • bbcp
    • ftp and lftp
    • globus command line interface
    • globus connect personal
    • gridftp ssh auth
    • rclone
    • rsync scp sftp
    • scheduling automating transfers
    • transfers from archer2
  • short term project storage
    • apply for access to a gws
    • elastic tape command line interface hints
    • faqs storage
    • gws etiquette
    • gws scanner ui
    • gws scanner
    • gws alert system
    • install xfc client
    • xfc
    • introduction to group workspaces
    • jdma
    • managing a gws
    • secondary copy using elastic tape
    • share gws data on jasmin
    • share gws data via http
    • using the jasmin object store
    • configuring cors for object storage
  • long term archive storage
    • ceda archive
  • mass
    • external access to mass faq
    • how to apply for mass access
    • moose the mass client user guide
    • setting up your jasmin account for access to mass
  • for cloud tenants
    • introduction to the jasmin cloud
    • jasmin cloud portal
    • cluster as a service
    • cluster as a service kubernetes
    • cluster as a service identity manager
    • cluster as a service slurm
    • cluster as a service pangeo
    • cluster as a service shared storage
    • adding and removing ssh keys from an external cloud vm
    • provisioning tenancy sci vm managed cloud
    • sysadmin guidance external cloud
    • best practice
  • workflow management
    • rose cylc on jasmin
    • using cron
  • uncategorized
    • mobaxterm
    • requesting resources
    • processing requests for resources
    • acknowledging jasmin
    • approving requests for access
    • working with many linux groups
    • jasmin conditions of use
  • getting started
    • get started with jasmin
    • generate ssh key pair
    • get jasmin portal account
    • get login account
    • beginners training workshop
    • how to contact us about jasmin issues
    • jasmin status
    • jasmin training accounts
    • tips for new users
    • how to login
    • multiple account types
    • present ssh key
    • reconfirm email address
    • reset jasmin account password
    • ssh auth
    • storage
    • understanding new jasmin storage
    • update a jasmin account
  • interactive computing
    • interactive computing overview
    • check network details
    • login servers
    • login problems
    • graphical linux desktop access using nx
    • sci servers
    • tenancy sci analysis vms
    • transfer servers
    • jasmin notebooks service
    • jasmin notebooks service with gpus
    • creating a virtual environment in the notebooks service
    • project specific servers
    • dask gateway
    • access from vscode
  • batch computing
    • lotus overview
    • slurm scheduler overview
    • slurm queues
    • lotus cluster specification
    • how to monitor slurm jobs
    • how to submit a job
    • how to submit an mpi parallel job
    • example job 2 calc md5s
    • orchid gpu cluster
    • slurm status
    • slurm quick reference
  • software on jasmin
    • software overview
    • quickstart software envs
    • python virtual environments
    • additional software
    • community software esmvaltool
    • community software checksit
    • compiling and linking
    • conda environments and python virtual environments
    • conda removal
    • creating and using miniforge environments
    • idl
    • jasmin sci software environment
    • jasmin software faqs
    • jaspy envs
    • matplotlib
    • nag library
    • name dispersion model
    • geocat replaces ncl
    • postgres databases on request
    • running python on jasmin
    • running r on jasmin
    • rocky9 migration 2024
    • share software envs
  • data transfer
    • data transfer overview
    • data transfer tools
    • globus transfers with jasmin
    • bbcp
    • ftp and lftp
    • globus command line interface
    • globus connect personal
    • gridftp ssh auth
    • rclone
    • rsync scp sftp
    • scheduling automating transfers
    • transfers from archer2
  • short term project storage
    • apply for access to a gws
    • elastic tape command line interface hints
    • faqs storage
    • gws etiquette
    • gws scanner ui
    • gws scanner
    • gws alert system
    • install xfc client
    • xfc
    • introduction to group workspaces
    • jdma
    • managing a gws
    • secondary copy using elastic tape
    • share gws data on jasmin
    • share gws data via http
    • using the jasmin object store
    • configuring cors for object storage
  • long term archive storage
    • ceda archive
  • mass
    • external access to mass faq
    • how to apply for mass access
    • moose the mass client user guide
    • setting up your jasmin account for access to mass
  • for cloud tenants
    • introduction to the jasmin cloud
    • jasmin cloud portal
    • cluster as a service
    • cluster as a service kubernetes
    • cluster as a service identity manager
    • cluster as a service slurm
    • cluster as a service pangeo
    • cluster as a service shared storage
    • adding and removing ssh keys from an external cloud vm
    • provisioning tenancy sci vm managed cloud
    • sysadmin guidance external cloud
    • best practice
  • workflow management
    • rose cylc on jasmin
    • using cron
  • uncategorized
    • mobaxterm
    • requesting resources
    • processing requests for resources
    • acknowledging jasmin
    • approving requests for access
    • working with many linux groups
    • jasmin conditions of use
  1.   Data transfer
  1. Home
  2. Docs
  3. Data transfer
  4. Scheduling/Automating Transfers

Scheduling/Automating Transfers

 

Share via
JASMIN Help Site
Link copied to clipboard

Scheduling/Automating Transfers

On this page
Overview   Using Globus for transfer automation   Scheduling download tasks using cron and LOTUS   xfer-vm-03 - transfer machine with cron   Invoking LOTUS from cron to carry out multiple download tasks   1. Single download script   2. Multi-node downloads  

This article explains how to schedule or automate data transfers. It covers:

  • Using Globus for transfer automation
  • Scheduling download tasks using cron and LOTUS

Overview  

In many cases it can be useful to fetch data from an external source for processing/analysis on JASMIN on a regular basis, for example “every Monday at 11:00 fetch all last week’s data”. It can also be helpful to distribute the task downloading of some large datasets, or simply to be able rely on data being pulled in from some external source to an accumulating dataset used for periodic analysis.

Using Globus for transfer automation  

It is easy to automate transfers using Globus. This method has the advantage that you do not need to remain connected or logged in to any JASMIN server for the automated transfers to take place on your behalf, and the transfer itself can be more efficient than other methods described below.

Some introductory information about how to do this is available in this article Using the Globus command-line interface (with more to follow) but please also refer to the comprehensive Globus documentation and their automation examples  . You can choose whether to schedule/automate tasks via the Globus web interface  , command-line interface  , or use their Globus Python SDK  to build Python code that uses this functionality.

Scheduling download tasks using cron and LOTUS  

While the cron server cron-01.jasmin.ac.uk is provided for scheduling general tasks, it should not be used for the work of executing those tasks itself, and not for transfer tasks.

xfer-vm-03 - transfer machine with cron  

The transfer server xfer-vm-03.jasmin.ac.uk is also provided with cron, and should be used where a task is primarily a transfer rather than a processing task and needs the functionality of a transfer server. Please refer to the above cron guidance for best practice advice.

In particular, you must use crontamer to manage your cron jobs.

Invoking LOTUS from cron to carry out multiple download tasks  

Sometimes we need a task to be invoked from cron but executed where there are lots of nodes to parallelise the tasks (i.e. the LOTUS cluster). In this case, we DO need to use the cron server rather than the cron-equipped transfer server xfer-vm-03, since we need to be able to submit jobs to LOTUS (the transfer server can’t do that).

This will only work where the download can happen over HTTP(S), so depends on how the remote data is made available.

We need it to:

  • invoke a job submission script at regular intervals
  • have that script initiate downloads using LOTUS nodes

In the examples below, we use the test queue (or partition, as queues are known in Slurm). You can use this for testing, but once you know roughly how long your download(s) should take, you should choose an appropriate queue so that the jobs can be scheduled in a fair way alongside other users’ jobs.

1. Single download script  

The simple script below is used to download a single file from an external source via HTTP using wget. It initially uses the test partition (queue), but once you have tested it, you should use a more appropriate queue.

#!/bin/bash 
#SBATCH --partition=test
#SBATCH -o %j.out 
#SBATCH -e %j.err
#SBATCH --time=00:30

# executable 
wget -q -O 1MB_${SLURM_JOBID}.zip http://speedtest.tele2.net/1MB.zip

In this example, the file is labelled with the id of the job which downloaded it, e.g. 1MB_61117380.zip

The same could be also achieved using curl, or using a Python script making use of (for example) the requests library.

A note about transfer tools: since we are delegating the actual download task to a LOTUS node, we are restricted to transfer tools already installed on those nodes or available in the user’s path at a location cross-mounted with nodes in the LOTUS cluster (see Table 1 in Access to Storage), such as $HOME or a group workspace. It is not possible for the JASMIN team to install specialist data transfer tools across the whole cluster, so you may be limited to downloading via HTTP(S), FTP, or via tools available via libraries in the Python environment such (which you do have access to and can easily customise to install additional libraries using virtual environments).

Due to networking limitations, LOTUS nodes cannot perform downloads using SSH-based methods, i.e. scp/rsync/sftp.

Download tools installed on LOTUS nodes include:

  • wget
  • curl
  • ftp (but not lftp)

In our simple example above, we can subit this script to LOTUS from the command line with

sbatch test_download.sh

This could be invoked on a regular basis by adding a crontab entry like this

30 * * * * sbatch /home/users/username/test_download.sh

However it would be safer to wrap this in a crontamer command like this to ensure one instance of the task had finished before the next started: (see Using cron for details)

30 * * * * crontamer -t 2h 'sbatch /home/users/username/test_download.sh'

2. Multi-node downloads  

We could expand this example to download multiple items, perhaps 1 directory of data for each day of a month, and have 1 element of a job array handle the downloading of each day’s data.

A few words of warning: Distributing download tasks as shown below can cause unintended side-effects. Here, we’re submitting an array of 10 download jobs, each initiating a request for a 1MB file which may well happen simultaneously. So we need to be confident that the systems and networks at either end can cope with that. It would be all too easy to submit a task to download several thousand large data files and cause problems for other users of JASMIN (and other users on its host institution’s network), or indeed at the other end. Taken to the extreme, this could appear over the network as a Distributed Denial-of-Service (DDoS) attack. As with all LOTUS tasks: start small, test, and increase to sensible scales when you are confident it will not cause a problem. A limit of 10 jobs would be a sensible maximum, for one user.

We’ll simulate this here by downloading the same external file to 10 different output files, but you could adapt this concept for your own purposes depending on the layout of the source and destination data.

#!/bin/bash 
#SBATCH --partition=test
#SBATCH -o %A_%a.out
#SBATCH -e %A_%a.err
#SBATCH --time=00:30
#SBATCH --array=1-10
#SBATCH --time=00:30

# executable 
wget -q -O 1MB_${SLURM_ARRAY_TASK_ID}.zip http://speedtest.tele2.net/1MB.zip

echo "script completed"

In this (perhaps contrived) example, we’re setting up an array of 10 elements and using the SLURM_ARRAY_TASK_ID environment variable to name the output files (otherwise they’d all be the same). In a real-world example you could apply your own logic to divide up files or directories matching certain patterns to become elements of a job array.

The script could then be scheduled to be invoked at regular intervals as shown in (1).

Some tools provide functionality for mirroring or synchronising directories, i.e. only downloading those files in a directory which are new have been added since the last time a task was run. These can be useful to avoid repeated downloads of the same data.

Last updated on 2024-10-02 as part of:  updates oct 02 for r9 env (7db335bb5)
On this page:
Overview   Using Globus for transfer automation   Scheduling download tasks using cron and LOTUS   xfer-vm-03 - transfer machine with cron   Invoking LOTUS from cron to carry out multiple download tasks   1. Single download script   2. Multi-node downloads  
Follow us

Social media & development

   

Useful links

  • CEDA Archive 
  • CEDA Catalogue 
  • JASMIN 
  • JASMIN Accounts Portal 
  • JASMIN Projects Portal 
  • JASMIN Cloud Portal 
  • JASMIN Notebooks Service 
  • JASMIN Community Discussions 

Contact us

  • Helpdesk
UKRI/STFC logo
UKRI/NERC logo
NCAS logo
NCEO logo
Accessibility | Terms and Conditions | Privacy and Cookies
Copyright © 2025 Science and Technology Facilities Council.
Hinode theme for Hugo licensed under Creative Commons (CC BY-NC-SA 4.0).
JASMIN Help Site
Code copied to clipboard