
Secondary copy using Elastic Tape

  • The information below relates to the Elastic Tape command-line tools. The JDMA system provides a better interface for putting data into, and retrieving data from, the Elastic Tape system.
  • A new system called NLDS is coming very shortly (as of Feb 2023) and will eventually replace both of these.

Introduction  

Elastic Tape is a system developed for use with JASMIN Group Workspaces (GWSs), enabling the Group Workspace Manager to:

  • Optimise their use of high-performance online disk by moving data to and from cheaper near-line storage
  • Create and manage secondary copies of GWS data

At present, the system is designed only to be used by GWS Managers, rather than individual members of a GWS. It is the responsibility of a GWS Manager to create and manage backups or additional copies of data in a GWS.

The servers used to access Elastic Tape changed in January 2021. Previous users should note that the server to use now is et.jasmin.ac.uk.

The maximum size for any file put into Elastic Tape is 500GB. This changed in 2023, when the underlying tape system was upgraded. Please limit your files to less than 500GB.
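
For example, before planning a transfer you could check for any files exceeding this limit under a (hypothetical) directory with a standard find command:

find /group_workspaces/jasmin/myworkspace/mydir -type f -size +500G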

Who can use ET?  

ET is only for use by the named GWS manager, i.e. the individual responsible for managing the GWS disk space. The high-performance disk space used for a GWS is a valuable commodity and the role of the GWS Manager involves making best use of the online space. This may mean moving data to tape to free up space online, or taking a copy of online data to make a secondary copy. No undertaking is provided that the secondary copy will exist beyond the lifetime of the Group Workspace itself, hence it is called a secondary copy and not a backup. It is also NOT long-term archive storage: some data in GWSs may need to be earmarked for longer-term archive storage and wider availability via the CEDA Archive, but this is a separate process for which data management plans, ingest processes and metadata need to be put in place. Please contact the helpdesk if this is the case, but ideally this needs to be considered at project design phase (as it may need funding!).

Each GWS has a quota of online disk space (agreed at the time of its creation) and initially the ET quota has been set to the same value. So if you have a 10 TB workspace, you initially have a matching 10 TB quota of ET storage.

How does it work?  

Putting data into ET storage involves creating a “batch” of data which is transferred to the ET system. Using either a file list or top-level directory for reference, the system calculates resources needed and creates a batch, identified by a batch ID. This must be retained by the GWS manager as a “ticket” for later retrieval of this batch of data. It is recommended that you assess the data that you plan to transfer so that you have an idea of the overall volume to be transferred before initiating any actual transfer jobs. It is also recommended to test operation with a small set of test data.
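
For example, a quick way to gauge the total volume and number of files under a (hypothetical) directory before creating a batch is:

du -sh /group_workspaces/jasmin/myworkspace/mydir
find /group_workspaces/jasmin/myworkspace/mydir -type f | wc -l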

Transparent to the user, and asynchronously (so it is not necessary to wait with a terminal window open), the data are transferred first to online cache and then to tape storage. It is not an instant process: the task of migrating data from online cache to tape can take several hours, or even days, depending on factors such as the size of the transfer, contention for the tape system and network conditions. An RSS feed and a web page provide updates on the progress of data transfer for each batch. Data can later be retrieved, or removed from ET storage, via similar tools.

The transfer of data via a batch involves the “registration” of each file in a database so that its existence is recorded.

Command-line tools are provided on a dedicated machine within the JASMIN infrastructure, to which GWS managers will be given access. A GWS manager has access to the Python tools et_put.py, et_get.py, et_rm.py and et_ls.py. Initial documentation for these command-line tools is provided below.

What should I do next?  

It is recommended to try sending and retrieving some small data transfers (a few GB) at first using the documentation below, but the system has been designed to cope with storing entire GWSs. You will need ssh login access to et.jasmin.ac.uk first. This should have been arranged for you as part of the GWS setup process; if not, please contact the JASMIN helpdesk. Once there, you should be able to see your group workspace and try out the commands on a small set of test data.

System overview  

Elastic Tape provides the ability to create “batches” of files which are then sent to the storage system, initially to an online disk cache before being written to near-line tape. Batches can later be retrieved, or removed. An alert system provides the user with the ability to monitor the progress of transfer jobs.

The system comprises:

  • A command-line interface on a client machine
  • A backend system, consisting of
    • I/O servers connected to an online disk cache and database
    • A near-line tape system

Configuration file  

As a GWS manager, you will normally be responsible for one or more GWSs. The GWS with which you wish to work using ET needs to be specified either in a configuration file in your home directory, or by specifying the workspace as an option in the command line interface.

Certain default settings are set in a system-wide config file at /etc/et_config.

If you want to set a default workspace, create a small text file in your home directory named .et_config containing the following, replacing “myworkspace” with the name of your default workspace:

[Batch]
Workspace = myworkspace

myworkspace should just be the short name of the workspace, not the full path to it.
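
If you prefer to create this file from the command line, a minimal sketch (again replacing myworkspace with your own workspace name) is:

cat > ~/.et_config <<EOF
[Batch]
Workspace = myworkspace
EOF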

The workspace specified in any command-line option overrides that specified in the user’s (~/.et_config) config file, which in turn overrides that specified in the system (/etc/et_config) config file.

Please REMOVE any previous reference to host and port from your individual ~/.et_config file: these settings are now taken from the system /etc/et_config file.

Further configuration options are available in the [DIRECTORY] section of the file, see the system-wide file /etc/et_config for examples. The main parameter for which you may wish to override the default is:

outputLevel = workspace|batch|file

although this can be overridden at the command line anyway. See the et_ls.py documentation below for the meaning of these options.

User interface  

Please note that NOT ALL features of the currently implemented user interface are described here; however, we recommend that users limit their usage to the features described below.

The user interface consists of the following components:

  • et_put.py Put data onto tape
  • et_get.py Retrieve data from tape
  • et_rm.py Remove data from tape
  • et_ls.py List data holdings on tape
  • Alerts Get information about processes and holdings via web interface

The commands are available on the host et.jasmin.ac.uk. As a GWS manager you should have been granted login access to this machine using your JASMIN account; however, if accessing the host from outside the RAL network you will need to go via one of the login gateways login*.jasmin.ac.uk. Use the -A option (or equivalent) for agent forwarding in ssh. STFC users may use the STFC VPN to connect to et.jasmin.ac.uk as if it were a local connection.
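
For example, a minimal sketch of reaching the host from outside the RAL network, assuming a JASMIN username of fred and one of the login*.jasmin.ac.uk gateways (the exact gateway name will depend on your setup), is:

ssh -A fred@login-01.jasmin.ac.uk
ssh et.jasmin.ac.uk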

 

When writing data to the ET system, it is very important that the data remain in place on disk, in the location where ET expects to find them, until the status of the batch in question has reached CACHED_SYNCED or SYNCED. These statuses mean that the data have actually been written to tape; that is not the case until one of them is shown.

The location where ET expects to find the files is either given in the LISTFILE supplied to the et_put.py command, or is all files and directories under DIR. The status of your batches can be checked on the webpage http://et-monitor.fds.rl.ac.uk/et_user/ET_AlertWatch.php. You need to be logged into JASMIN to see this webpage: use Firefox via the nx-login servers.

Deleting the data from disk prematurely can cause problems for the ET system as a whole (impacting other users) so please be careful with this aspect.


et_put.py  

Put data onto tape.

Synopsis  

et_put.py [-v] [-l LOGFILE] [-w WORKSPACE] [-c] [-t one-word-tag] [ -f LISTFILE | DIR ]

Description  

Data files to be stored can be specified either in an input list file (-f) or by specifying the path to the top of a directory tree containing files to be stored. All symbolic links are ignored (see note below). In both cases, the system will analyse the request and create a batch, identified by a BATCH ID, which can later be used to retrieve that set of files from storage. Although the main “put” operation is asynchronous (and does not require you to maintain a terminal connection for its duration), the initial registration step, which creates the BATCH ID, is synchronous, so you should wait for this step to complete before disconnecting.

Given current resources, all users of Elastic Tape share a total throughput capacity of about 25 TB/day, which may increase over time. Please bear this in mind when organising your input batches and setting expectations of completion time. Large numbers of small files will degrade performance.

Options  

option details
-v Verbose output
-l LOGFILE Log file in which to record process output
-f LISTFILE Text file containing ABSOLUTE paths of files to be stored, 1 file per line. NB Files and directory names are case-sensitive. The list should not contain any blank lines or extraneous white space.
-w WORKSPACE Name of the group workspace to use. Overrides default set in config file. Case sensitive.
DIR ABSOLUTE path to top of directory tree containing files to be stored
-c Continue if errors encountered.
-t tag Tag batch with a descriptive label meaningful to the user. Should be a single one-word string. Appears as “Batch name” in ET alert output and “Tag” in et_ls output.

Example usage  

Simple case, using a file input.list which contains paths of all the files to be included in the batch:

et_put.py -v -l et_put.log -f input.list -w myworkspace

In the following example, the -c option is used to continue on errors. One error that may be encountered is that a file already exists in the system (e.g. has already been “put”). This option causes the system to ignore any errors and continue with the transfer. Note that this should not be used by default (we would rather know about errors and fix them!).

et_put.py -v -l et_put.log -f input.list -w myworkspace -c

Alternative usage specifying a directory beneath which all files / directories will be included. In this case the directory must be the last parameter in the command:

et_put.py -v -l et_put.log -w myworkspace /group_workspaces/jasmin/myworkspace/mydir

Symbolic links: Attempting to include symbolic links in an et_put operation should cause an error. You can override this with the -c option (although this will ignore ALL errors), but a better solution is to generate a list file as in the first two examples above. If this list file is generated with a command like find <path> -type f > listfile.txt, then referring to it in the et_put command will ensure that only those files are included in the batch. You can then keep the list file (perhaps named as per the resulting batch ID) for your own records.
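
A sketch of this approach, using a hypothetical directory and list-file name:

find /group_workspaces/jasmin/myworkspace/mydir -type f > input.list
et_put.py -v -l et_put.log -f input.list -w myworkspace
mv input.list batch_507_input.list

Here the list file is renamed after the batch ID reported by et_put.py (507 is just an illustrative value) so that it can be kept for your own records.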


et_get.py  

Retrieve data from tape

Synopsis  

et_get.py [-v] [-l LOGFILE] [-b BATCHID | -f FILELIST] [-w WORKSPACE] [-r DIR] [-t MAXPROC]

Description  

Data files to be retrieved should be specified by referring to the batch ID of the batch in which they were stored. If files were stored by specifying an absolute path, e.g. /group_workspaces/jasmin/myworkspace/mydir, the retrieval process will not write the retrieved files back to that location, but to a new location specified by DIR; the original absolute path is recreated beneath DIR, so the first directory created under DIR will correspond to the first part of the stored path, e.g. group_workspaces.

Proposed best-practice is to create a temporary directory for retrieved data within your workspace, e.g. /group_workspaces/jasmin/myworkspace/ettmp and to do the initial retrieval into that directory. Once you are satisfied that the retrieval has completed correctly, data can be moved back to their original location in the workspace. NB if you need additional storage space for this, please see requesting resources.

Options  

option details
-v Verbose output
-l LOGFILE Log file in which to record process output. Note that the log file location must be capable of accepting multi-threaded input; otherwise this parameter should be omitted and the stdout of the et_get.py command piped to a log file instead.
-b BATCHID ID of batch to be retrieved
-f FILELIST A list of individual files to be retrieved, with one file per line. Note that:
- entries in the list must contain the full name of the file, including path, just as it was archived
- the list should not contain blank lines or any extraneous white space.
-w WORKSPACE name of the group workspace to use. Overrides default set in config file. Case sensitive.
-r DIR ABSOLUTE path of retrieval location
-t MAXPROC Maximum number of worker processes to use in retrieval. MAXPROC recommended to be between 5 and 10. Please feed back your experience of performance improvement with this feature.

Example usage  

cd /group_workspaces/jasmin/myworkspace
mkdir ettmp
et_get.py -v -l et_get.log -w myworkspace -b 507 -r /group_workspaces/jasmin/myworkspace/ettmp

At this point, data will be transferred into the specified retrieval directory. Files and directories will be restored with their ABSOLUTE path below the retrieval directory. NB this is a synchronous process and you will need to keep your terminal window open to ensure it completes (or run within the screen command if you are familiar with this).
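
A minimal sketch of running the retrieval inside screen (assuming the screen command is available on et.jasmin.ac.uk; the session name and batch ID are illustrative):

screen -S et-retrieve
et_get.py -v -w myworkspace -b 507 -r /group_workspaces/jasmin/myworkspace/ettmp

You can then detach with Ctrl-a d and reattach later with screen -r et-retrieve to check on progress.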

When the retrieval process has finished, you should satisfy yourself that it is correct, using your preferred method (one possible check is sketched after the following example). When this is the case, you can move the data to the required location as shown below:

mv /group_workspaces/jasmin/myworkspace/ettmp/group_workspaces/jasmin/myworkspace/* /group_workspaces/jasmin/myworkspace
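
Before running the mv command above, one possible way to check the retrieval (a sketch using standard shell tools, assuming you kept the original list file supplied to et_put.py, here assumed to be ~/input.list) is:

cd /group_workspaces/jasmin/myworkspace/ettmp
find . -type f | sed 's|^\.||' | sort > ~/retrieved.txt
sort ~/input.list > ~/expected.txt
diff ~/expected.txt ~/retrieved.txt

Empty diff output indicates that every path in the original list has been retrieved (this checks file names only, not their contents).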

et_rm.py  

Remove data from tape

Synopsis  

et_rm.py [-v] -b BATCHID [-w WORKSPACE]

Description  

Deletes the files in the specified batch from the Elastic Tape system.

Options  

option details
-v Verbose output
-b BATCHID ID of batch to be removed
-w WORKSPACE name of the group workspace to use. Overrides default set in config file. Case sensitive.

Example usage:

et_rm.py -v -b 507

et_ls.py  

List holdings on tape

Synopsis  

et_ls.py [-h] [-X XMLSOURCE] [-H] [-b BATCHID] [-w WORKSPACE] [-L {file,batch,workspace}] [-F {text}]

Description  

Lists the holdings of a workspace within Elastic Tape at the file, batch or workspace level.

Options  

option details
-h, --help Show this help message and exit
-x XMLSOURCE, --xmlsource XMLSOURCE Base XML source, if not the default. Note that this currently has to be compatible with the base source, so cannot be pointed at files, for example
-H, --headerWanted Print headers showing column names for text output
-b BATCHID, --batchid BATCHID ID of batch by which to filter results
-w WORKSPACE Name of the group workspace to use. Overrides default set in config file. Case sensitive.
-L {file,batch,workspace}, --outputLevel {file,batch,workspace} Level of detail to display for results (default is “workspace”)
-F {text}, --outputFormat {text} Format to use for the display of results

Example usage:

et_ls.py -w myworkspace -H -L file -b 504

Works with the workspace “myworkspace”, selects display of headers in the output, shows results at file level, and filters by batch ID 504 (i.e. shows the files present in ET in the given batch).

et_ls.py -w myworkspace -H -L batch

Works with the workspace “myworkspace”, selects display of headers in the output, and shows results at batch level (i.e. shows the batches present in the ET holdings for this workspace).
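
Since the default output level is “workspace”, you can also omit -L to get a summary at workspace level (an illustrative variation on the examples above):

et_ls.py -w myworkspace -H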

Alerts  

The system provides real-time status messages on the progress of operations requested. These services are now available only inside the RAL firewall, so JASMIN users outside of RAL should use the NX graphical desktop service to open a Firefox browser on one of the nx-login servers in order to access these URLs.

Alerts Dashboard http://et-monitor.fds.rl.ac.uk/et_user/ET_AlertWatch.php 

RSS Feed http://et-monitor.fds.rl.ac.uk/et_rss/ET_RSS_AlertWatch_atom.php 

In both cases these can be customised to display only alerts from the workspace of interest to the GWS manager.

Alerts Dashboard http://et-monitor.fds.rl.ac.uk/et_user/ET_AlertWatch.php?workspace=WORKSPACE 

RSS Feed http://et-monitor.fds.rl.ac.uk/et_rss/ET_RSS_AlertWatch_atom.php?workspace=WORKSPACE 

(replace WORKSPACE with your workspace name in the above URLs)

Further views

ET Home http://et-monitor.fds.rl.ac.uk/et_user/ET_Home.php?caller=USERNAME 

Holdings summary http://et-monitor.fds.rl.ac.uk/et_user/ET_Holdings_Summary.php?caller=USERNAME&workspace=WORKSPACE 

(replace USERNAME with your username, WORKSPACE with your workspace name in the above URLs)
