Data Access Information

CMIP data located at the National Computational Infrastructure (NCI) covers a broad range of datasets, including CMIP5- and CMIP6-era replicated and published data as well as key observational and reanalysis datasets. For further information on the different climate datasets and available variables, see the Datasets and Available Variables page. On this page we describe the ways in which the CMIP data hosted at NCI may be accessed. This page covers the following topics:
- Access Options
- Direct Data Access
- Request to Access a Data Collection
- Intake and Intake-ESM
- Data Organisation
- Guidance for Data Users

Access Options

Please note: External users who do not have an NCI user account should use option 2 below (the ESGF data portal). Alternatively, you can find out more about how to access NCI.

Access to the CMIP and related data is available via three key methods:

1. Access on the NCI filesystem
   - Users with an NCI login can access the data directly on the NCI filesystem, either by logging in to the Gadi HPC system or through the ARE (VDI or JupyterLab), using our data analysis environments.
   - Users are encouraged to use intake-ESM, a Python API that eases searching the CMIP datasets on NCI.
2. The ESGF data portal
   - https://esgf.nci.org.au/projects/esgf_nci/
   - The Earth System Grid Federation (ESGF) data portal provides access to CMIP datasets published by NCI and hosted across the international data nodes. If you cannot use the data on Gadi, you can search for the CMIP data on the ESGF site. Note that not all replica data is currently published on the NCI ESGF site.
3. The NCI data catalogue
   - You can search for the CMIP data published at NCI via the GeoNetwork metadata catalogue, which also provides links to the data via THREDDS:
     - Our CMIP6 data: https://dx.doi.org/10.25914/5b98afc88531e (including the Australian data fs38 and the replicated data oi10)
     - Our CMIP5 data: https://dx.doi.org/10.4225/41/5a700d0f3f5b0 (including the Australian data rr3 and the replicated data al33)
     - Our CMIP3 data: cb20
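The ESGF search service can also be queried from a script, using the same facet names that appear in the directory structures on this page. The sketch below only builds a query URL and does not fetch it. It assumes the NCI node exposes the standard ESGF search endpoint at https://esgf.nci.org.au/esg-search/search, which you should confirm on the portal before relying on it. The type, format and limit parameters follow the general ESGF search API; ACCESS-CM2 and tas are only example facet values.

```python
from urllib.parse import urlencode

# ASSUMPTION: standard ESGF search endpoint on the NCI node; verify before use.
ESGF_SEARCH_URL = "https://esgf.nci.org.au/esg-search/search"


def esgf_search_url(**facets: str) -> str:
    """Build an ESGF dataset search URL from CMIP facet name/value pairs."""
    # Ask for dataset-level results as JSON, capped at 10 hits.
    params = {"type": "Dataset", "format": "application/solr+json", "limit": "10"}
    params.update(facets)  # e.g. project, source_id, experiment_id, variable_id
    return f"{ESGF_SEARCH_URL}?{urlencode(params)}"


url = esgf_search_url(project="CMIP6", source_id="ACCESS-CM2",
                      experiment_id="historical", variable_id="tas")
print(url)
```

The resulting URL can be opened in a browser or fetched with any HTTP client; the JSON response lists matching datasets and their hosting data nodes.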

Direct Data Access

Primary access to the CMIP data is directly on the NCI filesystem, from both Gadi and the ARE (VDI or JupyterLab), using our data analysis environments. To gain access, request to join the relevant project space (see below) and then read the data from /g/data/<project code> (see the table below). Note: this approach requires an NCI account and a computational project through one of the NCI access schemes.

NCI Project Code | Collection | Contents
fs38 | Australian Published CMIP Data | CMIP6-era
rr3 | Australian Published CMIP Data | CMIP5-era (incl. CORDEX)
oi10 | Replicated CMIP Data | CMIP6-era
al33 | Replicated CMIP Data | CMIP5-era (incl. CORDEX)
cb20 | Replicated CMIP Data | CMIP3
qv56 (see note below) | Replicated Observational and Reanalysis Data | input4MIPs, obs4MIPs, ana4MIPs

Please note: New version updates for qv56 datasets will not be actioned for the remainder of 2020. Please contact help@nci.org.au (Subject: Data Collections) if you have any questions about these datasets.
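Once you have joined a collection, its data appears under /g/data/<project code>. As a small, hypothetical helper (not an NCI tool), the sketch below records the project codes from the table and reports which of them you can currently list from a Gadi or ARE session. On a machine without /g/data it simply finds none.

```python
from pathlib import Path

# Project codes from the table above, with a short description of each.
CMIP_PROJECTS = {
    "fs38": "Australian published, CMIP6-era",
    "rr3": "Australian published, CMIP5-era (incl. CORDEX)",
    "oi10": "Replicated, CMIP6-era",
    "al33": "Replicated, CMIP5-era (incl. CORDEX)",
    "cb20": "Replicated, CMIP3",
    "qv56": "Replicated input4MIPs, obs4MIPs, ana4MIPs",
}


def project_root(code: str, gdata: Path = Path("/g/data")) -> Path:
    """Return /g/data/<project code> for a known CMIP project code."""
    if code not in CMIP_PROJECTS:
        raise KeyError(f"unknown CMIP project code: {code!r}")
    return gdata / code


def readable_projects(gdata: Path = Path("/g/data")) -> list[str]:
    """List the project codes whose directory exists and can be listed."""
    found = []
    for code in CMIP_PROJECTS:
        try:
            # Listing fails with an OSError (e.g. FileNotFoundError,
            # PermissionError) if the directory is missing or unreadable.
            next(iter(project_root(code, gdata).iterdir()), None)
        except OSError:
            continue
        found.append(code)
    return found


if __name__ == "__main__":
    for code in readable_projects():
        print(code, "->", CMIP_PROJECTS[code])
```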

Request to Access a Data Collection

You may request to join a data collection through my.nci.org.au/mancini. Your request will be sent to the data collection manager for approval, and you must agree to the same Terms and Conditions that govern CMIP data access as stipulated by the Earth System Grid Federation:

CMIP6 Terms of Use

CMIP5 Terms of Use

Intake and Intake-ESM

The volume and complexity of CMIP make manually searching for data on the filesystem a time-consuming process. The datasets have therefore been indexed using intake, which provides both an intake-spark scheme containing all the metadata for the files and a more specific intake-ESM catalogue suited to climate scientists. More details about using intake-ESM for the CMIP datasets are also available. To use all the datasets, you must request membership of the NCI CMIP data collections through Mancini, as described above.
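With intake-ESM itself, you would open a catalogue file with intake.open_esm_datastore(...) and narrow it down with .search(source_id=..., variable_id=...); the NCI catalogue location is not assumed here. To illustrate the underlying idea without any dependencies, the standard-library sketch below (not intake-ESM's implementation) splits CMIP6 directory paths into their DRS facets and filters them by facet values. The example paths are made up for illustration.

```python
from pathlib import PurePosixPath

# Facet order of the CMIP6 directory structure described on this page,
# following the ".../CMIP6/<activity>/" prefix.
CMIP6_FACETS = ("institution_id", "source_id", "experiment_id", "member_id",
                "table_id", "variable_id", "grid_label", "version")


def facets_from_path(path: str) -> dict[str, str]:
    """Split a CMIP6 dataset directory into its DRS facets."""
    parts = PurePosixPath(path).parts
    i = parts.index("CMIP6")
    # parts[i + 1] is the activity directory (e.g. "CMIP"); facets follow it.
    values = parts[i + 2 : i + 2 + len(CMIP6_FACETS)]
    if len(values) != len(CMIP6_FACETS):
        raise ValueError(f"path too short for the CMIP6 DRS: {path}")
    return dict(zip(CMIP6_FACETS, values))


def search(paths: list[str], **query: str) -> list[str]:
    """Return the paths whose facets match every name=value pair in query."""
    return [p for p in paths
            if all(facets_from_path(p).get(k) == v for k, v in query.items())]


# Made-up example paths (the version directory is illustrative only).
paths = [
    "/g/data/fs38/publications/CMIP6/CMIP/CSIRO/ACCESS-ESM1-5/historical/r1i1p1f1/Amon/tas/gn/v20200101",
    "/g/data/fs38/publications/CMIP6/CMIP/CSIRO/ACCESS-ESM1-5/historical/r1i1p1f1/Amon/pr/gn/v20200101",
]
print(search(paths, variable_id="tas"))
```

A real catalogue avoids re-walking the filesystem for every query, which is why the indexed intake-ESM approach is recommended over manual searching.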

Previously, we used the Climate Finder (CleF) indexing tool, developed by ARCCSS and CLEX. This has been superseded by our use of intake.

Data Organisation

To permit ease of use and interdisciplinary research, CMIP uses a standard naming convention for files, directories, metadata and URLs. These conventions are described in detail in the CMIP6 Controlled Vocabularies and the CMIP5 Controlled Vocabulary.

CMIP6 Published Data

Under the CMIP6 Data Reference Syntax (DRS), data may be found in directories with the following format:

/g/data/fs38/publications/CMIP6/CMIP/<institution_id>/<source_id>/<experiment_id>/<member_id>/<table_id>/<variable>/<grid_label>/<version>

CMIP6 Official Replica Data

Under the CMIP6 DRS, data may be found in directories with the following format:

/g/data/oi10/replicas/CMIP6/CMIP/<institution_id>/<source_id>/<experiment_id>/<member_id>/<table_id>/<variable>/<grid_label>/<version>
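Because both CMIP6 collections share the same layout under different roots, a dataset directory can be assembled from its facets. The sketch below is a hypothetical helper, not an NCI-provided function. The layouts above show only the CMIP activity directory; exposing it as the activity parameter assumes other activity directories sit alongside it, so check this on the filesystem.

```python
from pathlib import PurePosixPath

# Roots of the two CMIP6 collections described above.
CMIP6_ROOTS = {
    "published": "/g/data/fs38/publications/CMIP6",  # Australian published data
    "replica": "/g/data/oi10/replicas/CMIP6",        # official replicas
}


def cmip6_dir(kind: str, institution_id: str, source_id: str, experiment_id: str,
              member_id: str, table_id: str, variable: str, grid_label: str,
              version: str, activity: str = "CMIP") -> PurePosixPath:
    """Build a CMIP6 dataset directory following the layout shown above."""
    return PurePosixPath(CMIP6_ROOTS[kind], activity, institution_id, source_id,
                         experiment_id, member_id, table_id, variable,
                         grid_label, version)


print(cmip6_dir("published", "CSIRO", "ACCESS-ESM1-5", "historical",
                "r1i1p1f1", "Amon", "tas", "gn", "latest"))
```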

CMIP5 and CORDEX Australian Published Data

The CMIP5 and CORDEX published data can be found with the following directory structure:

/g/data/rr3/publications/CMIP5/output1/<institute>/<model>/<experiment>/<frequency>/<realm>/<table>/<ensemble>/<version>/<variable>

CMIP5 Official Replica Data

The CMIP5 official replica data can be found with the following directory structure:

/g/data/al33/replicas/CMIP5/combined/<institute>/<model>/<experiment>/<frequency>/<realm>/<table>/<ensemble>/<version>/<variable>
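The CMIP5 layouts differ from CMIP6 in two ways worth noting: the root includes a product directory (output1 for published data, combined for replicas), and the variable comes after the version rather than before it. A hypothetical helper following the structures above:

```python
from pathlib import PurePosixPath

# Roots of the two CMIP5 collections; note the different product directory.
CMIP5_ROOTS = {
    "published": "/g/data/rr3/publications/CMIP5/output1",
    "replica": "/g/data/al33/replicas/CMIP5/combined",
}


def cmip5_dir(kind: str, institute: str, model: str, experiment: str,
              frequency: str, realm: str, table: str, ensemble: str,
              version: str, variable: str) -> PurePosixPath:
    """Build a CMIP5 dataset directory; the variable follows the version."""
    return PurePosixPath(CMIP5_ROOTS[kind], institute, model, experiment,
                         frequency, realm, table, ensemble, version, variable)


print(cmip5_dir("published", "CSIRO-BOM", "ACCESS1-0", "historical",
                "mon", "atmos", "Amon", "r1i1p1", "latest", "tas"))
```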

CMIP3 Official Replica Data

The CMIP3 official replica data can be found with the following directory structure:

/g/data/cb20/replicas/cmip3/<institute>/<model>/<experiment>/<frequency>/<realm>/<ensemble>/<variable>

Note some of the key differences in the facets between CMIP5 and CMIP6: in particular, "model" in CMIP5 has become "source_id" and "ensemble" has become "member_id". Other facets have similar names, though in CMIP6 they often include an "_id" suffix.
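When porting a CMIP5 search to CMIP6, the facet names need renaming. The mapping in this sketch is taken from the paired names in the definitions that follow; it renames keys only and does not translate values.

```python
# CMIP5 facet names and their CMIP6 equivalents, from the definitions on this page.
CMIP5_TO_CMIP6 = {
    "institute": "institution_id",
    "model": "source_id",
    "experiment": "experiment_id",
    "ensemble": "member_id",
    "table": "table_id",
    "variable": "variable_id",
}


def to_cmip6_facets(cmip5: dict[str, str]) -> dict[str, str]:
    """Rename CMIP5 facet keys to their CMIP6 equivalents.

    Facets that do not appear in the CMIP6 directory structure (e.g. frequency,
    realm) keep their original names. Values are left untouched: CMIP5 ensemble
    labels such as r1i1p1 lack the forcing index used in CMIP6 (e.g. r1i1p1f1).
    """
    return {CMIP5_TO_CMIP6.get(k, k): v for k, v in cmip5.items()}


query = to_cmip6_facets({"model": "ACCESS1-0", "ensemble": "r1i1p1", "variable": "tas"})
print(query)  # {'source_id': 'ACCESS1-0', 'member_id': 'r1i1p1', 'variable_id': 'tas'}
```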

Definitions for directory format terms:

institute/institution_id: the institution that produced the model output (e.g. CSIRO-BOM, UNSW, etc.)
model/source_id: the CMIP model name identifier (called source_id in CMIP6).
experiment/experiment_id: the CMIP experiment identifier (e.g., historical, piControl, rcp45, etc.)
frequency: frequency identifier (e.g., 3hr, day, mon, etc.)
realm: modelling realm (e.g., atmos, land, ocean, etc.)
ensemble/member_id: the ensemble member identifier, encoding the realisation, initialisation and physics indices (e.g., r1i1p1, r1i1p2, etc.); CMIP6 labels also include a forcing index (e.g., r1i1p1f1).
variable/variable_id: output variable (see the full list on the Datasets and Available Variables page).
version: the available versions, plus a 'latest' symbolic link to the most recent available version. Where a version isn't available, the creation date is used instead.
table/table_id: the MIP table, e.g., "Amon" is short for atmosphere monthly, "Omon" is ocean monthly, etc.
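Two of these facets have values with internal structure. The sketch below (hypothetical helpers) splits an ensemble/member_id label into its indices and picks the newest version from a list of version directory names, skipping the 'latest' link. It assumes version names of the form v<digits> (such as v20200101), and it rejects CMIP6 member_ids that carry a sub-experiment prefix (for example s1960-r1i1p1f1).

```python
import re

# r<realisation>i<initialisation>p<physics>, with f<forcing> added in CMIP6.
MEMBER_RE = re.compile(r"^r(\d+)i(\d+)p(\d+)(?:f(\d+))?$")


def parse_member(label: str) -> dict[str, int]:
    """Split an ensemble/member_id label such as r1i1p1 or r1i1p1f1 into indices."""
    m = MEMBER_RE.match(label)
    if m is None:
        raise ValueError(f"not a ripf ensemble label: {label!r}")
    r, i, p, f = m.groups()
    indices = {"realisation": int(r), "initialisation": int(i), "physics": int(p)}
    if f is not None:  # only CMIP6 labels carry a forcing index
        indices["forcing"] = int(f)
    return indices


def newest_version(names: list[str]) -> str:
    """Pick the newest v<digits> directory name, ignoring the 'latest' link."""
    versions = [n for n in names if re.fullmatch(r"v\d+", n)]
    if not versions:
        raise ValueError("no version directories found")
    return max(versions, key=lambda n: int(n[1:]))


print(parse_member("r1i1p1f1"))
print(newest_version(["v20190101", "v20200505", "latest"]))  # illustrative names
```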

Guidance for Data Users

The following guidance material for CMIP6 data users is provided through PCMDI. It includes information on experiment design, model output, terms of use and citation, model documentation, error reporting, registering publications which use CMIP6 data, and CMIP6 governance.

https://pcmdi.llnl.gov/CMIP6/Guide/dataUsers.html

For more information on the CMIP and CORDEX data, including data formats and processing see ESGF User Support - Data FAQs.
