Single Node

If you are working with single-node resources via an OOD, ARE or Gadi PBS job, you can simply call ray.init() within your Jupyter notebook or Python script to start a Ray runtime on the current host.

You can view the resources available to Ray via the function ray.cluster_resources().

import ray
ray.init()
print(ray.cluster_resources())

The above commands will print a message like the one below, indicating that 16 CPU cores, 29 GB of memory and 1 node are available to the current local Ray cluster.

...

If you need a larger Ray cluster spanning multiple nodes, you can start a pre-defined Ray cluster and then connect to it from your Jupyter notebook or Python script.

An easy way to set up the pre-defined Ray cluster is to utilise the dk92 module "gadi_jupyterlab".

...

In your PBS job script, you should load NCI-data-analysis/2022.06 together with the gadi_jupyterlab module. Then you need to run the script "jupyter.ini.sh -R" to set up the pre-defined Ray cluster. It will start a Ray worker on each CPU core of all compute nodes available in the job. You can also specify the number of Ray workers per node via the "-p" flag. For example, in a job requesting 96 cores (2 nodes) in the "normal" queue, you can set up a pre-defined Ray cluster with 12 Ray workers per node, i.e. 24 Ray workers in total, via the following command:

jupyter.ini.sh -R -p 12

An example of a full job script requesting 96 Ray workers is given below.

#!/bin/bash
#PBS -P fp0
#PBS -q normal
#PBS -lwd,walltime=10:00:00,ncpus=96,mem=192GB,jobfs=400GB,storage=gdata/dk92+gdata/z00+scratch/fp0+gdata/fp0

module purge
module use /g/data/dk92/apps/Modules/modulefiles
module load NCI-data-analysis/2022.06 gadi_jupyterlab/22.06

jupyter.ini.sh -R # set up a Ray cluster with 48 Ray workers per node, 96 total Ray workers, 1 thread per Ray worker.

# jupyter.ini.sh -R -p 12 # or set up a Ray cluster with 12 Ray workers per node, 24 total Ray workers, 1 thread per Ray worker.

python script.py

In "script.py", you need to connect to the pre-defined Ray cluster by calling ray.init() with the address flag set to "auto".

...

The above script will print a message like the following

{'object_store_memory': 114296048025.0, 'CPU': 96.0, 'memory': 256690778727.0, 'node:10.6.48.66': 1.0, 'node:10.6.48.67': 1.0}
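If you want to check the cluster size programmatically rather than by eye, a small helper (our own, not part of the Ray API) can summarise the dictionary returned by ray.cluster_resources(); note that each participating node appears as a "node:&lt;IP&gt;" key.

```python
# Hypothetical helper: summarise a ray.cluster_resources() dictionary.
def summarise(resources):
    # Node entries are keyed as "node:<IP address>".
    node_keys = [k for k in resources if k.startswith("node:")]
    return {"cpus": int(resources.get("CPU", 0)), "nodes": len(node_keys)}

# The dictionary printed by the 96-core example above.
sample = {
    "object_store_memory": 114296048025.0,
    "CPU": 96.0,
    "memory": 256690778727.0,
    "node:10.6.48.66": 1.0,
    "node:10.6.48.67": 1.0,
}
print(summarise(sample))  # → {'cpus': 96, 'nodes': 2}
```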

...

First of all, you need to request multiple nodes in the ARE JupyterLab session interface and specify the proper storage projects.

Then click the "Advanced options" button, put "/g/data/dk92/apps/Modules/modulefiles" in the "Module directories" field, and load both the NCI-data-analysis/2022.06 and gadi_jupyterlab/22.06 modules in the "Modules" field. In the "Pre-script" field, fill in the command "jupyterlab.ini.sh -R" to set up the pre-defined Ray cluster.

Click the "Open JupyterLab" button to open the JupyterLab session as soon as it is highlighted.

...

In the Jupyter notebook, use the following lines to connect to the pre-defined Ray cluster and print the available resource information.

You will see that 96 CPU cores and two nodes are used by the cluster, as expected.

Monitoring Ray status

You can easily monitor the Ray status via the command "ray status". Open a CLI terminal in either a JupyterLab session or a Gadi PBS interactive job and type in the following command

...