After starting a PBS job, we can invoke the jupyter.ini.sh script to set up a pre-defined Dask cluster that utilises the compute resources requested by the job.
Multiple CPU nodes
You can specify the "-D" flag to set up a pre-defined Dask cluster with the requested PBS resources, i.e. using each CPU core as a single-threaded Dask worker.
Alternatively, you can adjust the number of Dask workers per node and threads per Dask worker by specifying the "-p" and "-t" options.
For example, in a PBS job requesting 96 cores in the normal queue (i.e. 2 worker nodes), you could set up the Dask cluster in several ways:
$ jupyter.ini.sh -D # set up a Dask cluster with 48 single-threaded Dask workers per node (one per CPU core)
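For instance, using the "-p" and "-t" options described above, you could trade worker count for threads per worker. The values below are illustrative only; adjust them to your own job's resources:

```shell
# Default layout: 48 single-threaded Dask workers per node (one per core).
jupyter.ini.sh -D

# Alternative layout (illustrative values): 12 workers per node with
# 4 threads each, giving each worker a larger share of node memory.
jupyter.ini.sh -D -p 12 -t 4
```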
Specifying the number of Dask workers and threads lets you adjust the memory available to, and the parallelism within, each Dask worker. This helps address potential stability and performance issues of the Dask cluster.
Multiple GPU nodes
You can also specify the "-G" flag together with "-D" when running jupyter.ini.sh to set up a Dask cluster that uses Gadi GPU devices. By default, the number of Dask workers equals the number of GPU devices requested in the PBS job, and each worker has 1 thread.
$ jupyter.ini.sh -D -G # set up a Dask cluster utilising the requested GPU devices
Note: You can also append the "-J" flag to any of the above commands to set up a JupyterLab session as well.
Connect to the Dask cluster
After setting up the Dask cluster via the jupyter.ini.sh script, you can connect to it in your Jupyter notebook or Python script as below.
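The original code snippet is missing here; the following is a minimal sketch of the usual dask.distributed connection pattern. The scheduler file name `scheduler.json` is an assumption — check the output of jupyter.ini.sh for the actual address or file it reports. The local-cluster fallback exists only so the snippet also runs outside a PBS job:

```python
import os

from dask.distributed import Client, LocalCluster

# Assumed name of the scheduler file written when the cluster was set
# up; check the jupyter.ini.sh output for the real path on your system.
scheduler_file = "scheduler.json"

if os.path.exists(scheduler_file):
    # Inside the PBS job: attach to the cluster set up by jupyter.ini.sh.
    client = Client(scheduler_file=scheduler_file)
else:
    # Fallback for testing outside PBS: a small local stand-in cluster.
    client = Client(LocalCluster(n_workers=2, threads_per_worker=1))

print(client)  # summarises the client and cluster configuration
```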
The output will show the configuration of the client and the Dask cluster. You can check that the number of cores matches what you requested in the job script.
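As a quick programmatic check, you can sum the thread counts across workers with `Client.nthreads()`. The snippet below uses a local stand-in cluster so it is self-contained; on Gadi, with the `client` connected as above, the total should equal the cores requested in the PBS job (e.g. 96 in the earlier example):

```python
from dask.distributed import Client, LocalCluster

# Local stand-in for the jupyter.ini.sh cluster: 4 single-threaded
# workers (illustrative values only).
client = Client(LocalCluster(n_workers=4, threads_per_worker=1))

# client.nthreads() maps each worker address to its thread count.
total_threads = sum(client.nthreads().values())
print(total_threads)  # 4 for this stand-in; should match your PBS cores on Gadi

client.close()
```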