Details of our intake-esm software are available here. Once intake-esm is loaded in your environment, you can start using our intake-esm data catalogues. NCI currently provides intake-esm data catalogues for the CMIP5 and CMIP6 data collections hosted on NCI.

Our intake-esm catalogue files are all located under /g/data/dk92/catalog/v2/esm. Note that you must have joined project dk92 to access them.
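Before opening a catalogue, it can help to confirm that the catalogue directory is visible from your session. The following is a minimal sketch using only the Python standard library; the helper name `can_read` is hypothetical and the path is the one given above.

```python
import os

# Catalogue root as given in the text above.
CATALOG_ROOT = "/g/data/dk92/catalog/v2/esm"


def can_read(path):
    """Return True if `path` exists and the current user may read it."""
    return os.path.exists(path) and os.access(path, os.R_OK)


if not can_read(CATALOG_ROOT):
    # Most likely cause on NCI: the account has not joined project dk92.
    print(f"Cannot read {CATALOG_ROOT}: check that you have joined project dk92")
```

If the message is printed, opening any catalogue file under that directory will fail for the same reason.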

Operations

First, open a catalogue file with the intake open_esm_datastore() function.

Open Catalogue File
import intake
cmip6 = intake.open_esm_datastore("/g/data/dk92/catalog/v2/esm/cmip6-oi10/catalog.json")

Evaluating the loaded esm_datastore object gives an overview of its contents.

Get catalogue overview
cmip6

The datastore exposes a df attribute, which is a pandas DataFrame.

Get catalogue head
cmip6.df.head()

`cmip6.df.columns` lists all the columns (keys) that can be used to search the data.

Get all columns
cmip6.df.columns

The unique() method lists the unique values in each column as a dictionary. Any of these values can be used as a search term for its column.

List unique keys per column
values_dict = cmip6.unique()
print(values_dict)



Let's select a subset by passing a combination of column values to the search() method. The returned result shows that the subset contains 18 files matching these criteria.

Search keywords
subset = cmip6.search(source_id=['MPI-ESM-1-2-HAM', 'NorESM2-LM'],
    experiment_id=['ssp370-lowNTCF'],
    variable_id="tas",
    table_id="Amon",
    grid_label="gn")
subset
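A query like the one above is, as far as we understand intake-esm's default behaviour, interpreted as follows: the values listed for one column are alternatives (OR), and every column given must match (AND). The toy function below mimics that logic on a few made-up rows. It is only an illustration, not intake-esm's implementation; `toy_search` and the rows are hypothetical.

```python
def as_list(value):
    """Wrap a single value in a list so strings and lists are handled alike."""
    return list(value) if isinstance(value, (list, tuple)) else [value]


def toy_search(records, **query):
    """Keep rows where every queried column holds one of the wanted values."""
    return [
        row for row in records
        if all(row.get(col) in as_list(wanted) for col, wanted in query.items())
    ]


# Made-up rows standing in for catalogue entries (not real catalogue content).
rows = [
    {"source_id": "MPI-ESM-1-2-HAM", "experiment_id": "ssp370-lowNTCF", "variable_id": "tas"},
    {"source_id": "NorESM2-LM", "experiment_id": "ssp370-lowNTCF", "variable_id": "pr"},
    {"source_id": "NorESM2-LM", "experiment_id": "historical", "variable_id": "tas"},
]

hits = toy_search(
    rows,
    source_id=["MPI-ESM-1-2-HAM", "NorESM2-LM"],  # either model is accepted
    experiment_id=["ssp370-lowNTCF"],
    variable_id="tas",                             # a single string also works
)
print(len(hits))
```

Only the first row satisfies all three conditions, so adding more columns to a query narrows the result while adding more values to one column widens it.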


Now you can open the datasets directly with the to_dataset_dict() API. Starting a Dask cluster first is recommended, as it speeds up loading.

For example, you can quickly set up a local Dask cluster that uses the resources of a single node, as shown below.

Start Dask cluster
from dask.distributed import Client, LocalCluster
cluster = LocalCluster()
client = Client(cluster)
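If you prefer to size the cluster explicitly rather than rely on the defaults, one possible approach is sketched below. It assumes that the batch system exports the number of allocated CPUs in the environment variable PBS_NCPUS (an assumption, not stated on this page) and falls back to os.cpu_count() otherwise. The helper names are hypothetical; `n_workers` and `threads_per_worker` are standard LocalCluster arguments.

```python
import os


def pick_worker_count(env=os.environ):
    """Choose a worker count from PBS_NCPUS (assumed variable) or the CPU count."""
    raw = env.get("PBS_NCPUS", "")
    if raw.isdigit() and int(raw) > 0:
        return int(raw)
    return os.cpu_count() or 1


def start_local_cluster():
    """Start a single-node Dask cluster with one thread per worker."""
    # Imported here so the helper above can be used without Dask installed.
    from dask.distributed import Client, LocalCluster

    cluster = LocalCluster(n_workers=pick_worker_count(), threads_per_worker=1)
    return Client(cluster)
```

Calling `client = start_local_cluster()` then plays the same role as the snippet above, and `client.close()` releases the workers when you are done.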

Now you can call the to_dataset_dict() API, which returns a dictionary of all the datasets in the subset.

Print dataset metadata
dset_dict = subset.to_dataset_dict()
print(dset_dict)



Finally, you can access a dataset by its key.

Access the dataset
ds = dset_dict['AerChemMIP.ssp370-lowNTCF.gn.HAMMOZ-Consortium.na.CMIP6.MPI-ESM-1-2-HAM.Amon.tas.na']
ds
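Because the keys are long, it can be easier to look them up by a few of their dot-separated parts than to type them in full. The sketch below shows one way to do this with plain Python. `find_keys` is a hypothetical helper, and the dictionary is a stand-in with placeholder values rather than real datasets; in practice you would pass `dset_dict`.

```python
def find_keys(dset_dict, *tokens):
    """Return keys whose dot-separated parts include every given token."""
    return [
        key for key in dset_dict
        if all(token in key.split(".") for token in tokens)
    ]


# Stand-in for dset_dict: the first key is the one shown above, the second
# is a made-up placeholder. Values would normally be xarray datasets.
stand_in = {
    "AerChemMIP.ssp370-lowNTCF.gn.HAMMOZ-Consortium.na.CMIP6.MPI-ESM-1-2-HAM.Amon.tas.na": None,
    "some.other.dataset.key": None,
}

matches = find_keys(stand_in, "MPI-ESM-1-2-HAM", "tas")
print(matches)
```

With a real `dset_dict`, `dset_dict[matches[0]]` would then give the dataset whose key contains both the model name and the variable.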





  
