
After starting a PBS job, we can invoke the jupyter.ini.sh script to set up a pre-defined Dask cluster that utilises the compute resources requested by the job.

Multiple CPU nodes

You can specify the "-D" flag to set up a pre-defined Dask cluster with the requested PBS resources, i.e. using each CPU core as a single-threaded Dask worker.

Alternatively, you can adjust the number of Dask workers per node and the number of threads per Dask worker with the "-p" and "-t" options.

For example, in a PBS job requesting 96 cores in the normal queue (i.e. 2 worker nodes), you could set up the Dask cluster in several ways:

$ jupyter.ini.sh -D             # 48 Dask workers per node, 96 Dask workers in total,
                                # 1 thread per Dask worker, 96 Dask threads in total.
$ jupyter.ini.sh -D -p 12       # 12 Dask workers per node, 24 Dask workers in total,
                                # 4 threads per Dask worker, 96 Dask threads in total.
$ jupyter.ini.sh -D -p 12 -t 2  # 12 Dask workers per node, 24 Dask workers in total,
                                # 2 threads per Dask worker, 48 Dask threads in total.

Specifying the number of Dask workers and threads lets you adjust the memory available to, and the parallelism within, each Dask worker. This helps to address potential stability and performance issues of the Dask cluster.
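To see the same workers-versus-threads trade-off without a PBS job, you can experiment with a local Dask cluster on your own machine. The sketch below uses dask.distributed's LocalCluster as a stand-in for the cluster that jupyter.ini.sh creates (the node and worker counts here are illustrative, not the script's defaults):

```python
from dask.distributed import Client, LocalCluster

# Local stand-in for the PBS-backed cluster: 2 workers with
# 4 threads each, analogous to "-p 2 -t 4" on a single node.
cluster = LocalCluster(n_workers=2, threads_per_worker=4)
client = Client(cluster)

# Inspect how cores are split between workers and threads.
info = client.scheduler_info()
n_workers = len(info["workers"])
total_threads = sum(w["nthreads"] for w in info["workers"].values())
print(n_workers, total_threads)  # 2 workers, 8 threads in total

client.close()
cluster.close()
```

Fewer workers with more threads each gives every worker a larger share of memory; many single-threaded workers gives better isolation between tasks.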

Multiple GPU nodes

You can also specify the "-g" flag together with "-D" when running jupyter.ini.sh to set up a Dask cluster that uses Gadi GPU devices. By default, the number of Dask workers equals the number of GPU devices requested in the PBS job, and each worker has 1 thread.

$ jupyter.ini.sh -D -g  # set up a Dask cluster utilising GPU devices.
# The number of Dask workers equals the number of GPU devices requested in the PBS job.
# Each worker has 1 thread.

Note: You can also append the "-J" flag to any of the above commands to set up a JupyterLab session as well.

Connect to the Dask cluster

After setting up the Dask cluster via the jupyter.ini.sh script, you can connect to it in your Jupyter notebook or Python script as below:

from dask.distributed import Client
import os

# jupyter.ini.sh exports DASK_PBS_SCHEDULER, pointing at the scheduler file.
client = Client(scheduler_file=os.environ["DASK_PBS_SCHEDULER"])
print(client)

The output shows the configuration of the client and the Dask cluster. You can check that the number of cores matches what you requested in the job script.
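Once connected, the client can be used like any other Dask client. The sketch below runs a small distributed computation as an end-to-end check; since a live PBS scheduler is not available outside a job, it uses a LocalCluster stand-in, whereas on Gadi you would connect with Client(scheduler_file=os.environ["DASK_PBS_SCHEDULER"]) as shown above:

```python
from dask.distributed import Client, LocalCluster
import dask.array as da

# Stand-in for the PBS-backed cluster; on Gadi, connect via the
# scheduler file instead of creating a LocalCluster.
client = Client(LocalCluster(n_workers=2, threads_per_worker=1))

# A quick end-to-end check: a distributed mean over a chunked array.
x = da.ones((1000, 1000), chunks=(250, 250))
result = x.mean().compute()
print(result)  # 1.0

client.close()
```

Each 250x250 chunk becomes a task scheduled across the workers, so the same code scales unchanged from a laptop to the full PBS allocation.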
