
1. Prerequisites

Your NCI account must be a member of the NCI project "hh5" in order to access the Conda modules this workflow depends on.

2. Starting a JupyterLab Session

Start a JupyterLab session following our User Guide on JupyterLab on the OOD, but point it at the hh5 modules area:

  1. Choose appropriate compute resources.
  2. Click "Advanced options", then:
    1. Type "/g/data/hh5/public/modules" into the "Module directories" box.
    2. Type "conda/analysis3-unstable" into the "Modules" box.
  3. Click the "Launch" button to start the JupyterLab session.
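
Once the session is running, it is worth confirming that the notebook kernel really comes from the hh5 environment. A minimal check (the install path mentioned in the comment is an assumption about the hh5 layout):

Jupyter Notebook
import sys

# If the conda/analysis3-unstable module was picked up correctly, this
# should print an interpreter path under /g/data/hh5/public/ (assumed layout).
print(sys.executable)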

Go to the working directory containing COSIMA notebooks and open a Jupyter notebook.

If you need one, set up a Dask cluster with appropriate resources. For example, 16 cores in total, provided as four Slurm jobs of 4 cores each, are set up as follows:

SLURMCluster
from dask.distributed import Client
from dask_jobqueue import SLURMCluster

# Each Slurm job provides 4 cores and 47 GB of memory.
cluster = SLURMCluster(cores=4, memory="47GB")
client = Client(cluster)

# Scale out to 16 cores in total, i.e. four Slurm jobs.
cluster.scale(cores=16)

Wait until displaying the "client" object reports that the scheduler is running and the expected workers are connected.
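
If you prefer to block until the workers have actually joined rather than re-running the client cell, you can use wait_for_workers. Here we assume the default of one worker process per Slurm job, so four jobs give four workers; the count will differ if you set processes= on SLURMCluster:

Jupyter Notebook
# Block until 4 workers (4 cores each = 16 cores) are connected.
client.wait_for_workers(4)
client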


You can monitor Dask cluster activity by following the instructions for the Dask JupyterLab extension.

Continue executing the Jupyter notebook as usual.
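
When you have finished, it is good practice to shut the cluster down explicitly so the underlying Slurm jobs are released straight away:

Jupyter Notebook
# Disconnect the client and cancel the cluster's Slurm jobs.
client.close()
cluster.close()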

Notes on Utilising a Dask Cluster

You can parallelise reading the dataset files. For example, some notebooks use the xarray "open_mfdataset" function - e.g. cosima-recipes/ContributedExamples/Ice Diagnostics.ipynb:


Jupyter Notebook
import xarray as xr

# Serial read: the files are opened one after another.
dsx = xr.open_mfdataset(dataFileList[:12], decode_times=False, concat_dim='time')


The read can be parallelised by adding the flag "parallel=True", as follows:

Jupyter Notebook
# parallel=True opens the files concurrently across the Dask workers.
dsx = xr.open_mfdataset(dataFileList[:12], decode_times=False, concat_dim='time', parallel=True)
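
You can also control how each variable is partitioned with the chunks argument, which determines the size of the Dask tasks; the chunk size below is illustrative only. Note that recent xarray versions also require combine='nested' whenever concat_dim is given:

Jupyter Notebook
dsx = xr.open_mfdataset(dataFileList[:12], decode_times=False,
                        combine='nested', concat_dim='time',
                        parallel=True, chunks={'time': 1})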

For further tuning of job performance and memory utilisation, please refer to Using dask and xarray (superseded).
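
As a small illustration of the kind of tuning that page covers, re-chunking to fewer, larger tasks and persisting an intermediate result in cluster memory often helps; the variable name 'sst' below is a placeholder for whatever variable your dataset contains:

Jupyter Notebook
# Fewer, larger chunks reduce scheduler overhead.
dsx = dsx.chunk({'time': 12})

# persist() computes the mean once and keeps it in worker memory,
# so later operations on it do not recompute from the files.
sst_mean = dsx['sst'].mean('time').persist()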



