Specialised Environments

The details of our intake-esm software are here. Once intake-esm is loaded in your environment, you can start to use our intake-esm data catalogues. NCI currently provides intake-esm data catalogues for the following CMIP5/CMIP6 data collections on NCI:

Our intake-esm catalogue files are all located on the filesystem under /g/data/dk92/catalog/v2/esm. Note that you must be a member of project dk92 to access these files.

Operations

First, you need to open a catalogue file via the intake open_esm_datastore method.

Open Catalogue File
import intake
cmip6 = intake.open_esm_datastore("/g/data/dk92/catalog/v2/esm/cmip6-oi10/catalog.json")

Calling the loaded esm_datastore gives an overview of its contents.

Show catalogue overview
cmip6

The datastore contains a df attribute, which is a pandas DataFrame.

Get catalogue head
cmip6.df.head()

Using `cmip6.df.columns` lists all the columns/keys that can be used to search the data.

Get all columns
cmip6.df.columns

The unique() method lists the unique values of each column as a dictionary. You can use any of these values to search the catalogue.

List unique keys per column
values_dict = cmip6.unique()
print(values_dict)
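To show the shape of the dictionary that unique() returns, here is a minimal sketch using a toy pandas DataFrame standing in for the catalogue's df attribute (the column names and values below are illustrative, not the real catalogue contents):

```python
import pandas as pd

# Toy stand-in for the catalogue's underlying DataFrame (hypothetical values)
df = pd.DataFrame({
    "source_id": ["NorESM2-LM", "NorESM2-LM", "MPI-ESM-1-2-HAM"],
    "variable_id": ["tas", "pr", "tas"],
})

# Equivalent of unique(): one list of distinct values per column
values_dict = {col: sorted(df[col].unique()) for col in df.columns}
print(values_dict)
# {'source_id': ['MPI-ESM-1-2-HAM', 'NorESM2-LM'], 'variable_id': ['pr', 'tas']}
```

The real catalogue returns the same kind of column-to-values mapping, just with many more columns and values.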



Let's select a subset by calling the search() method with a combination of columns. The returned result shows that the subset contains 18 files across multiple columns.

Search keywords
subset = cmip6.search(source_id=['MPI-ESM-1-2-HAM', 'NorESM2-LM'],
    experiment_id=['ssp370-lowNTCF'],
    variable_id="tas",
    table_id="Amon",
    grid_label="gn")
subset
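Conceptually, search() filters the rows of the df attribute: list-valued arguments match any of the listed values, and multiple arguments are combined with a logical AND. The sketch below reproduces that behaviour with plain pandas masks on a hypothetical three-row table (the rows are invented for illustration):

```python
import pandas as pd

# Hypothetical miniature catalogue table
df = pd.DataFrame({
    "source_id": ["NorESM2-LM", "NorESM2-LM", "CESM2"],
    "experiment_id": ["ssp370-lowNTCF", "historical", "ssp370-lowNTCF"],
    "variable_id": ["tas", "tas", "tas"],
})

# search(source_id=[...], experiment_id=[...], variable_id="tas") behaves
# like the combination of these row masks
mask = (
    df["source_id"].isin(["NorESM2-LM", "MPI-ESM-1-2-HAM"])
    & df["experiment_id"].isin(["ssp370-lowNTCF"])
    & (df["variable_id"] == "tas")
)
print(df[mask])  # only the first row matches all three conditions
```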


The search results are often split into multiple keys based on metadata columns; each key represents a unique combination of metadata attributes. This allows you to distinguish between different datasets that match your query.

You can list all keys as below:

list keys
subset.keys()

It contains the following keys.

list keys
['f.AerChemMIP.HAMMOZ-Consortium.MPI-ESM-1-2-HAM.ssp370-lowNTCF.r1i1p1f1.mon.atmos.Amon.tas.gn.v20190627',
 'f.AerChemMIP.NCC.NorESM2-LM.ssp370-lowNTCF.r1i1p1f1.mon.atmos.Amon.tas.gn.v20200206',
 'f.AerChemMIP.NCC.NorESM2-LM.ssp370-lowNTCF.r2i1p1f1.mon.atmos.Amon.tas.gn.v20200206',
 'f.AerChemMIP.NCC.NorESM2-LM.ssp370-lowNTCF.r3i1p1f1.mon.atmos.Amon.tas.gn.v20200206']
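Each key is simply the values of the catalogue's groupby columns (defined in the catalogue's aggregation settings) joined with a dot. A minimal sketch, assuming the key layout shown above:

```python
# Sketch of how a dataset key is assembled: the values of the groupby
# columns joined by '.'. The values here mirror one of the keys above.
groupby_values = [
    "f", "AerChemMIP", "NCC", "NorESM2-LM", "ssp370-lowNTCF",
    "r1i1p1f1", "mon", "atmos", "Amon", "tas", "gn", "v20200206",
]
key = ".".join(groupby_values)
print(key)
# f.AerChemMIP.NCC.NorESM2-LM.ssp370-lowNTCF.r1i1p1f1.mon.atmos.Amon.tas.gn.v20200206
```

Reading a key from left to right therefore tells you exactly which activity, institution, model, experiment, member, table, variable, grid and version a dataset belongs to.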

Now you can open the datasets directly via the to_dataset_dict() API. It is recommended to start a Dask cluster to accelerate the loading.

For example, you can quickly set up a local Dask cluster using the resources of a single node as below.

Start Dask cluster
from dask.distributed import Client, LocalCluster
cluster = LocalCluster()
client = Client(cluster)

Now you can invoke the to_dataset_dict() API. It returns a dictionary of datasets, one entry per key listed above.

Print dataset metadata
dset_dict = subset.to_dataset_dict()
print(dset_dict)
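A common pattern is to loop over the returned dictionary and recover metadata from each key. The sketch below uses a hypothetical stand-in dictionary (with None in place of the real xarray Dataset values, since loading the real data requires /g/data access):

```python
# Hypothetical stand-in for the dictionary returned by to_dataset_dict():
# the keys are catalogue keys; the values would be xarray.Dataset objects.
dset_dict = {
    "f.AerChemMIP.NCC.NorESM2-LM.ssp370-lowNTCF.r1i1p1f1.mon.atmos.Amon.tas.gn.v20200206": None,
    "f.AerChemMIP.NCC.NorESM2-LM.ssp370-lowNTCF.r2i1p1f1.mon.atmos.Amon.tas.gn.v20200206": None,
}

# Loop over every matched dataset and pull a metadata field out of its key
# (in this key layout, index 5 is the ensemble member id)
member_ids = [key.split(".")[5] for key in dset_dict]
print(member_ids)
# ['r1i1p1f1', 'r2i1p1f1']
```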



Finally, you can load a single dataset using its key.

Access the dataset
ds = dset_dict['f.AerChemMIP.HAMMOZ-Consortium.MPI-ESM-1-2-HAM.ssp370-lowNTCF.r1i1p1f1.mon.atmos.Amon.tas.gn.v20190627']
ds




