The jupyter.ini.sh script provides several flags to adjust the number of Dask workers and the number of threads per worker, on both CPU and GPU nodes.
Multiple CPU nodes
You can use the "-D" flag to specify a pre-defined Dask cluster that utilises the requested PBS resources, treating each CPU core as a single-threaded Dask worker.
Alternatively, you can customise the number of Dask workers per node and the number of threads per Dask worker with the "-p" and "-t" options.
For instance, when submitting a PBS job requesting 96 cores from the normal queue (equivalent to 2 worker nodes), there are various ways to configure the Dask cluster.
$ jupyter.ini.sh -D # set up a Dask cluster with 48 Dask workers per node, 1 thread per worker
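Alternatively, you can tune the layout with the "-p" and "-t" flags described above. The flag combinations and the worker/thread counts below are illustrative only:

$ jupyter.ini.sh -D -p 24 # set up a Dask cluster with 24 Dask workers per node, 1 thread per worker
$ jupyter.ini.sh -D -p 12 -t 4 # set up a Dask cluster with 12 Dask workers per node, 4 threads per worker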
By tuning the number of Dask workers and threads per worker, you can adjust the memory capacity and degree of parallelism available to each worker, which helps to address potential stability and performance issues within the Dask cluster.
Multiple GPU nodes
When running jupyter.ini.sh to set up a Dask cluster using Gadi GPU devices, you can include the "-g" flag along with "-D". By default, the number of Dask workers is set to match the number of GPU devices requested in the PBS job, with each worker allocated 1 thread.
$ jupyter.ini.sh -D -g # set up a Dask cluster utilising GPU devices.
Note: You can also append the "-J" flag to the above commands to set up a JupyterLab session.
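For example:

$ jupyter.ini.sh -D -g -J # set up a Dask cluster utilising GPU devices and start a JupyterLab session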
Connect to the Dask cluster
After setting up the Dask cluster via the jupyter.ini.sh script, you can connect to it in your Jupyter notebook or Python script as below:
from dask.distributed import Client
import os

# jupyter.ini.sh exports the scheduler file path as DASK_PBS_SCHEDULER.
client = Client(scheduler_file=os.environ["DASK_PBS_SCHEDULER"])
print(client)
The output will show the configuration of the client and Dask cluster. You can check that the number of cores matches what you requested in the job script.
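As a quick sanity check, you can also query the scheduler directly. This is a minimal sketch assuming a recent version of dask.distributed, where worker metadata carries the "nthreads" key (older releases use "ncores" instead):

# Count the workers and total threads reported by the scheduler.
info = client.scheduler_info()
n_workers = len(info["workers"])
n_threads = sum(w["nthreads"] for w in info["workers"].values())
print(f"{n_workers} workers, {n_threads} threads in total")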
PBS Job script examples
In a PBS job, you can use the 'gadi_jupyterlab' module to establish a pre-defined Dask cluster that utilises the requested compute resources. You can then run a Python script that connects to this cluster and performs your tasks. The 'gadi_jupyterlab' module can be used together with an existing module or with a Python virtual environment.
Working with a Python virtual environment
Here is an example of a PBS job script that utilises the gadi_jupyterlab module within a user's Python virtual environment located at "/scratch/ab123/abc777/venv/dask_test/bin/activate". The script requests 2 nodes in the normal queue. The test.py script is provided in the section above. Note that the job script doesn't start a JupyterLab session, as it is used for batch computations.
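A minimal sketch of such a job script is shown below. The PBS directives (project code, walltime, memory, storage flags), the module search path, and the gadi_jupyterlab module version are illustrative assumptions; adapt them to your own project and resource requirements.

#!/bin/bash
#PBS -q normal
#PBS -P ab123
#PBS -l ncpus=96
#PBS -l mem=384GB
#PBS -l walltime=01:00:00
#PBS -l storage=gdata/dk92+scratch/ab123
#PBS -l wd

# Make the gadi_jupyterlab module visible and load it
# (module path and version are assumptions; check your site documentation).
module use /g/data/dk92/apps/Modules/modulefiles
module load gadi_jupyterlab/23.02

# Activate the user's Python virtual environment.
source /scratch/ab123/abc777/venv/dask_test/bin/activate

# Start the pre-defined Dask cluster across the requested nodes.
# No -J flag: this batch job does not need a JupyterLab session.
jupyter.ini.sh -D

# Connect to the cluster and run the workload.
python test.py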
Working with an existing module
Here is a PBS job script that utilises the 'gadi_jupyterlab' module alongside an existing module, 'NCI-data-analysis/2023.02'. The script requests 2 nodes in the normal queue. The 'test.py' script is provided in the section above. Note that the job script doesn't start a JupyterLab session, as it is used for batch computations.
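Again, a minimal sketch; the PBS directives, the module search path, and the module versions are illustrative assumptions to adapt to your own project.

#!/bin/bash
#PBS -q normal
#PBS -P ab123
#PBS -l ncpus=96
#PBS -l mem=384GB
#PBS -l walltime=01:00:00
#PBS -l storage=gdata/dk92
#PBS -l wd

# Load the analysis environment and gadi_jupyterlab
# (module path and versions are assumptions; adjust as needed).
module use /g/data/dk92/apps/Modules/modulefiles
module load NCI-data-analysis/2023.02
module load gadi_jupyterlab/23.02

# Start the pre-defined Dask cluster; no JupyterLab session is needed.
jupyter.ini.sh -D

# Connect to the cluster and run the workload.
python3 test.py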
In the output of the above job script, you should see a line similar to the one below:
<Client: 'tcp://10.6.48.12:8753' processes=96 threads=96, memory=384.00 GiB>
Now you can add your own code to "test.py" after connecting to the pre-defined Dask cluster.
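For instance, "test.py" could be extended with a simple distributed workload. This is a minimal sketch; the array sizes and the computation itself are purely illustrative:

import os
from dask.distributed import Client
import dask.array as da

# Connect to the pre-defined Dask cluster started by jupyter.ini.sh.
client = Client(scheduler_file=os.environ["DASK_PBS_SCHEDULER"])
print(client)

# Illustrative workload: the mean of a large random array,
# computed in parallel across the Dask workers.
x = da.random.random((100_000, 10_000), chunks=(10_000, 10_000))
print(x.mean().compute())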