At the beginning of the notebook
To use the Dask cluster started by the PBS job, add and run the following cell at the beginning of your notebook:

```python
from dask.distributed import Client

# Connect to the scheduler started by the PBS job
client = Client(scheduler_file='scheduler.json')
print(client)
```
The output will show the configuration of the client and Dask cluster. You can check that the number of cores matches what you requested in the job script.
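Beyond reading the printed summary, you can check the core count programmatically by summing the threads each worker reports. A minimal sketch, assuming `client.scheduler_info()` returns its usual per-worker dictionary; `total_threads` is a hypothetical helper, not part of Dask:

```python
def total_threads(scheduler_info):
    # Sum the threads reported by every worker in a
    # Client.scheduler_info()-style dictionary
    return sum(w['nthreads'] for w in scheduler_info['workers'].values())

# Illustrative data only; on a live cluster pass client.scheduler_info()
info = {'workers': {
    'tcp://10.0.0.1:40001': {'nthreads': 48},
    'tcp://10.0.0.2:40002': {'nthreads': 48},
}}
print(total_threads(info))  # 96
```

If the total does not match the `ncpus` you requested, the PBS job may still be queued or some workers may not have connected yet.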
To manage your compute resources adaptively, you can submit PBS jobs that set up a Dask cluster on demand from within your notebook. This lets you scale up your program, and you can submit as many jobs as your workflow requires:
```python
import os

import dask.config
from dask.distributed import Client
from dask_jobqueue import PBSCluster

walltime = '01:00:00'
cores = 96
memory = '160GB'

cluster = PBSCluster(walltime=str(walltime), cores=cores, memory=str(memory),
                     job_extra=['-P <project code>',
                                '-l ncpus=' + str(cores),
                                '-l mem=' + str(memory),
                                '-l storage=gdata/<project code>+gdata/<project code>+gdata/<project code>...'],
                     header_skip=["select"],
                     python=os.environ["DASK_PBS_PYTHON"])
# Submit 2 PBS jobs, each providing the cores and memory specified above
cluster.scale(jobs=2)
client = Client(cluster)
client
```
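Each PBS job in this configuration contributes `cores` worker cores, so the `jobs=` argument to `cluster.scale()` follows from the total core count you want. A small sketch using a hypothetical helper (not part of dask_jobqueue):

```python
import math

def jobs_needed(target_cores, cores_per_job):
    # Round up: reaching the target may require a final, partially
    # used PBS job, which still has to be submitted whole
    return math.ceil(target_cores / cores_per_job)

# e.g. 192 cores in total with 96 cores per job -> cluster.scale(jobs=2)
print(jobs_needed(192, 96))  # 2
```

Remember that each job consumes your project's compute allocation for its full requested walltime, so scale to what the workflow actually needs.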
At the end of the notebook
Add and run the cell below to gracefully stop your job:
```python
!pangeo.end.sh
```