CEDA Archive

This article describes how to access the CEDA Archive from JASMIN.

Overview

The CEDA Archive provides direct access to thousands of atmospheric, climate change and earth observation datasets. The Archive is directly accessible as a file system from the shared science machines on JASMIN. 

The Archive is a separate service run by the CEDA team; it is not a JASMIN service. Many of the links in this document therefore lead to the CEDA Archive help documentation site, because the information relates to CEDA Archive services. That site is separate from the JASMIN help documentation site, which covers JASMIN services only.

Register for a CEDA Account

First, you need a CEDA Archive account. If you do not have one, please follow the steps in this CEDA help document to register as a new CEDA user. It also explains how to reset your password if you have forgotten it. Once you have a CEDA account, you will need to link it to your JASMIN account.

The JASMIN Accounts Portal manages access to JASMIN resources (e.g. compute and storage), whereas MyCEDA (the CEDA Accounts Portal) manages access to CEDA resources (e.g. datasets in the archives). The two accounts must be linked before you can access CEDA Archive data from JASMIN. You can check whether your accounts are linked from within the JASMIN Accounts Portal.

Accessing the CEDA Archive on JASMIN servers

Once you have linked your CEDA and JASMIN accounts, you will have access to large parts of the archive straightaway. 

The contents of the CEDA Archive are available on the file system under /badc and /neodc. Note: do not access data via any symlinks that point to /datacentre/archvol*; these links are not permanent and may change when data are migrated to new storage. Always use the archive path names under /badc and /neodc. Search the CEDA data catalogue for further details about data held in the archive.
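
As a minimal sketch of the advice above, a script can check that the paths it uses are written under /badc or /neodc rather than under /datacentre/archvol*. The function name is_archive_path is hypothetical and not part of any CEDA or JASMIN tooling; it inspects the path string only and does not resolve symlinks.

```shell
# Hypothetical helper (an illustration, not a CEDA or JASMIN tool):
# succeed only if the path is written under /badc or /neodc.
# It looks at the path string only and never touches the file system.
is_archive_path() {
    case "$1" in
        /badc|/neodc|/badc/*|/neodc/*) return 0 ;;
        *) return 1 ;;
    esac
}
```

For example, is_archive_path /badc/cru/data succeeds, while a path beginning with /datacentre/ fails, so a script can refuse to record such a path in its configuration.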

Note: badc is for atmospheric data, neodc is for earth observation data - they are named after CEDA's previous archive names (British Atmospheric Data Centre, and the NERC Earth Observation Data Centre).

Most data in the Archive are open access, but some datasets are restricted. You can tell which by checking the UNIX access group that the data belong to (see below). If the datasets you need are restricted, you can apply for access via the data centre (see this article for more details). If direct access is not possible, the data can be obtained via the standard FTP and web-based access methods for the CEDA Archive and transferred to a suitable group workspace on JASMIN. Because the data centres use the same JASMIN infrastructure, transfer rates are high.

The CEDA Data Catalogue is a useful tool to find and apply for access to datasets. 

Archive access groups

The UNIX access groups used within the CEDA Archive are listed below with links to example datasets in the CEDA data catalogue for those wishing to use them:

  • open - Available to any logged-in JASMIN user with a linked CEDA user account. See a full list of available datasets here.
  • cmip5_research - Restricted CMIP3 and CMIP5 datasets.
  • esacat1 - Satellite data including MERIS, MIPAS and SCIAMACHY.
  • ecmwf - Access to the ECMWF Operational Datasets.
  • eurosat - Satellite data including IASI, AVHRR-3 and GOME-2.
  • ukmo_wx - Met Office observational dataset collections including LIDARNET, MIDAS, MetDB and NIMROD.
  • ukmo_clim - Climatology datasets from the Met Office, including the Central England Temperature dataset collection and HadISST.
  • byacl - These data have specific restrictions that prevent direct access from JASMIN, but they can be obtained via FTP and web access.

Data Licensing

Data accessed directly from the CEDA Archive must be used in line with the data licence for the relevant dataset and for the purposes stated in the access application. Data licence information can be found on the relevant CEDA Data Catalogue page, which is linked from the 00README_catalogue_and_licence.txt files found in the archive. For licences granted for restricted datasets, users should log into their MyCEDA page to view the granted licence and the usage purpose under which access was granted. Any use of the data beyond the purpose stated in the original licence application requires a new licence application to be granted.
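
The sketch below shows one way to locate the licence pointer file for a given data directory. It assumes that the 00README_catalogue_and_licence.txt file sits in the data directory itself or in one of its parent directories; the article does not state where these files are placed, and the function name find_licence_file is hypothetical.

```shell
# Hypothetical helper: print the nearest 00README_catalogue_and_licence.txt
# found in the given directory or in any of its parents.
# Assumption: the file lives at or above the data directory.
find_licence_file() {
    _dir=$1
    while :; do
        if [ -f "$_dir/00README_catalogue_and_licence.txt" ]; then
            printf '%s\n' "$_dir/00README_catalogue_and_licence.txt"
            return 0
        fi
        # Stop once the top of the tree has been checked.
        if [ "$_dir" = "/" ] || [ "$_dir" = "." ] || [ -z "$_dir" ]; then
            return 1
        fi
        _dir=$(dirname "$_dir")
    done
}
```

Calling find_licence_file on a directory under /badc or /neodc prints the path of the first matching file, which can then be read to find the catalogue link. The function returns a non-zero status if no such file is found.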

Accessing data in the archive

In the example below, the logged-in user lists the contents of the CRU datasets within the BADC archive. These belong to the "open" group, so all logged-in users can access them:

$ ls -l /badc/cru/data
total 320
-rw-r-----  1 badc open  396 Feb 18  2015 00README
drwxr-x---  8 badc open 4096 Mar 22 10:32 cru_cy
drwxr-x---  4 badc open 4096 Dec  6  2014 crutem
drwxr-x--- 12 badc open 4096 May  9 14:11 cru_ts
drwxr-x---  3 badc open 4096 Feb 18  2015 PDSI
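
In the listing above, the fourth column (open) is the UNIX access group, and the permissions give read access to that group but nothing to other users. As a rough sketch, the function below checks whether the current user belongs to the group that owns a path. The name in_owning_group is hypothetical. It assumes GNU stat (the -c option), as found on Linux systems, and it compares numeric group IDs rather than names.

```shell
# Hypothetical helper: succeed if the current user is a member of the
# group that owns the given path. Assumes GNU stat (-c %g).
# Returns 2 if the path cannot be examined, 1 if the user is not a member.
in_owning_group() {
    _gid=$(stat -c %g "$1") || return 2
    # id -G prints the numeric IDs of all groups the user belongs to.
    for _g in $(id -G); do
        [ "$_g" = "$_gid" ] && return 0
    done
    return 1
}
```

For example, in_owning_group /badc/cru/data && echo "group access available" prints the message only if your account is in the owning group. Running stat -c %G on the same path prints the group name, which can be compared with the access groups listed earlier.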