The jupyter.ini.sh script provides several flags to adjust the number of Dask workers and threads per worker, and it supports both CPU and GPU nodes.

Multiple CPU nodes

You can use the "-D" flag to set up a pre-defined Dask cluster that utilises the requested PBS resources, treating each CPU core as a single-threaded Dask worker.

Alternatively, you have the option to customise the number of Dask workers per node and the number of threads per Dask worker by using the "-p" and "-t" options.

For instance, when submitting a PBS job requesting 96 cores from the normal queue (equivalent to 2 worker nodes), there are various ways to configure the Dask cluster.

$ jupyter.ini.sh -D            # set up a Dask cluster with 48 Dask workers per node,
                               # 96 total Dask workers, 1 thread per Dask worker, 96 total Dask threads.
$ jupyter.ini.sh -D -p 12      # set up a Dask cluster with 12 Dask workers per node,
                               # 24 total Dask workers, 4 threads per Dask worker, 96 total Dask threads.
$ jupyter.ini.sh -D -p 12 -t 2 # set up a Dask cluster with 12 Dask workers per node,
                               # 24 total Dask workers, 2 threads per Dask worker, 48 total Dask threads.
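The arithmetic behind these flags can be sketched in Python. The function below is purely illustrative (it is not part of jupyter.ini.sh) and assumes the "-D" defaults described above:

```python
def dask_layout(total_cores, nodes, workers_per_node=None, threads_per_worker=None):
    """Illustrative sketch of how -p and -t determine the Dask cluster layout."""
    cores_per_node = total_cores // nodes
    if workers_per_node is None:
        # -D default: one single-threaded worker per CPU core
        workers_per_node = cores_per_node
    if threads_per_worker is None:
        # without -t, threads fill the cores assigned to each worker
        threads_per_worker = cores_per_node // workers_per_node
    total_workers = workers_per_node * nodes
    return total_workers, threads_per_worker, total_workers * threads_per_worker

# 96 cores over 2 nodes, matching the examples above
print(dask_layout(96, 2))         # -D            -> (96, 1, 96)
print(dask_layout(96, 2, 12))     # -D -p 12      -> (24, 4, 96)
print(dask_layout(96, 2, 12, 2))  # -D -p 12 -t 2 -> (24, 2, 48)
```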

By specifying the number of Dask workers and threads, users can adjust the memory capacity and degree of parallelism of each Dask worker, which helps address potential stability and performance issues within the Dask cluster.
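For instance, with 384 GB of memory across 2 normal-queue nodes, the per-worker memory share follows directly from the worker count. This is simple illustrative arithmetic, not an official jupyter.ini.sh calculation:

```python
def mem_per_worker_gb(total_mem_gb, total_workers):
    # The job's total memory is divided evenly across Dask workers,
    # so fewer workers means more memory per worker.
    return total_mem_gb / total_workers

print(mem_per_worker_gb(384, 96))  # -D default (96 workers): 4.0 GB each
print(mem_per_worker_gb(384, 24))  # -D -p 12   (24 workers): 16.0 GB each
```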

Multiple GPU nodes

When running jupyter.ini.sh to set up a Dask cluster using Gadi GPU devices, you can include the "-g" flag along with "-D". By default, the number of Dask workers is set to match the number of GPU devices requested in the PBS job, with each worker allocated 1 thread.

$ jupyter.ini.sh -D -g         # set up a Dask cluster utilising GPU devices. 
# The number of Dask workers equals the number of GPU devices requested in the PBS job.
# Each worker has 1 thread.

Note: You can also append the "-J" flag to any of the above commands to set up a JupyterLab session.

Connect to the Dask cluster

After setting up the Dask cluster via the jupyter.ini.sh script, you can connect to it in your Jupyter notebook or Python script as below:

test.py
from dask.distributed import Client
import os
client = Client(scheduler_file=os.environ["DASK_PBS_SCHEDULER"])
print(client)

The output will show the configuration of the client and the Dask cluster. You can check that the number of cores matches what you requested in the job script.

PBS Job script examples

In a PBS job, you can use the gadi_jupyterlab module to establish a pre-defined Dask cluster that utilises the requested compute resources. You can then run a Python script that connects to this pre-defined Dask cluster and performs your desired tasks. The gadi_jupyterlab module can be used in conjunction with an existing module or a Python virtual environment.

Working with a Python virtual environment

Here is an example of a PBS job script that uses the gadi_jupyterlab module within a user's Python virtual environment located at "/scratch/ab123/abc777/venv/dask_test/bin/activate". The script requests 2 nodes in the normal queue. The test.py script is provided in the section above. Note that the job script doesn't start a JupyterLab session, as it is used for batch computations.

#!/bin/bash
#PBS -N dask_test
#PBS -l ncpus=96
#PBS -l mem=384GB
#PBS -l jobfs=200GB
#PBS -q normal
#PBS -P ab123
#PBS -l walltime=00:30:00
#PBS -l storage=gdata/dk92+scratch/ab123
#PBS -l wd

module use /g/data/dk92/apps/Modules/module-files
module load gadi_jupyterlab/23.02
module load python3/3.10.4
source /scratch/ab123/abc777/venv/dask_test/bin/activate
jupyter.ini.sh -D
python /scratch/ab123/abc777/test_dir/test.py

Working with an existing module

Here is a PBS job script that uses the gadi_jupyterlab module alongside an existing module, NCI-data-analysis/2023.02. The script requests 2 nodes in the normal queue. The test.py script is provided in the section above. Note that the job script doesn't start a JupyterLab session, as it is used for batch computations.

#!/bin/bash
#PBS -N dask_test
#PBS -l ncpus=96
#PBS -l mem=384GB
#PBS -l jobfs=200GB
#PBS -q normal
#PBS -P ab123
#PBS -l walltime=00:30:00
#PBS -l storage=gdata/dk92+scratch/ab123
#PBS -l wd

module use /g/data/dk92/apps/Modules/module-files
module load gadi_jupyterlab/23.02
module load NCI-data-analysis/2023.02
jupyter.ini.sh -D
python /scratch/ab123/abc777/test_dir/test.py

In the output of the above job scripts, you should see a line similar to the one below:

<Client: 'tcp://10.6.48.12:8753' processes=96 threads=96, memory=384.00 GiB>
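As a quick sanity check, the worker (process) and thread counts can be parsed out of this line and compared against the PBS request. The helper below is purely illustrative and not part of any NCI tooling:

```python
import re

def parse_client_repr(line):
    """Extract the process and thread counts from a dask Client repr string."""
    match = re.search(r"processes=(\d+) threads=(\d+)", line)
    return (int(match.group(1)), int(match.group(2))) if match else None

line = "<Client: 'tcp://10.6.48.12:8753' processes=96 threads=96, memory=384.00 GiB>"
print(parse_client_repr(line))  # (96, 96) matches ncpus=96 with 1 thread per worker
```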

Now you can add your own code to "test.py" after the connection to the pre-defined Dask cluster is established.
