
After starting a PBS job, you can invoke the script to set up a pre-defined Ray cluster that uses the compute resources requested by the job.

Multiple CPU nodes

Specify the "-R" option to set up a Ray cluster on the requested PBS resources. By default, each CPU core runs one single-threaded Ray worker. To change the number of Ray workers per node, use the "-p" option.

For example, in a PBS job requesting 96 cores in the normal queue (i.e. 2 worker nodes), you could set up a Ray cluster with either of the following configurations:

$ -R          # set up a Ray cluster with 48 Ray workers per node,
              # 96 total Ray workers, 1 thread per Ray worker.
$ -R -p 12    # set up a Ray cluster with 12 Ray workers per node,
              # 24 total Ray workers, 1 thread per Ray worker.

Note: A Ray worker always has 1 thread, so the "-t" option is not valid when setting up a Ray cluster.
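
The worker counts in the example above follow from simple arithmetic. The sketch below reproduces them, assuming 48 cores per node as in the 96-core, 2-node example; `ray_worker_layout` is a hypothetical helper written for illustration and is not part of the set-up script.

```python
def ray_worker_layout(total_cores, cores_per_node=48, workers_per_node=None):
    """Return (nodes, workers_per_node, total_workers) for a CPU Ray cluster.

    cores_per_node=48 is an assumption taken from the 96-core / 2-node example.
    """
    if total_cores % cores_per_node != 0:
        raise ValueError("total_cores must correspond to a whole number of nodes")
    nodes = total_cores // cores_per_node
    # Without "-p", every core becomes one single-threaded Ray worker.
    if workers_per_node is None:
        workers_per_node = cores_per_node
    return nodes, workers_per_node, nodes * workers_per_node


print(ray_worker_layout(96))                       # "-R"       -> (2, 48, 96)
print(ray_worker_layout(96, workers_per_node=12))  # "-R -p 12" -> (2, 12, 24)
```

Each worker is single threaded in both cases, so "-p" changes only how many workers run on each node.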

Multiple GPU nodes

You can also specify the "-g" flag together with "-R" when running the script to set up a Ray cluster that uses Gadi GPU devices. By default, the number of Ray workers equals the number of GPU devices requested in the PBS job, and each worker has 1 thread.

$ -R -g         # set up a Ray cluster utilising GPU devices. 
# The Ray cluster contains both CPU and GPU resources allocated within the PBS job.

Note: You can also append the "-J" flag to any of the above commands to set up a JupyterLab session.

Connect to the Ray cluster

Once the cluster is running, you can connect to it from a Python script as below:

import ray 
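
Beyond the import, attaching to a running cluster typically takes one call. The following is a minimal sketch, assuming the cluster was started by the set-up script inside the current PBS job and that Ray's standard `ray.init(address="auto")` is a suitable way to attach to it. The page itself does not confirm the exact call, and `format_resources` and `connect_and_report` are hypothetical helper names used only for illustration.

```python
def format_resources(resources):
    """Render a Ray resource dict such as {"CPU": 96.0, "GPU": 4.0} on one line."""
    return ", ".join(
        f"{name}={amount:g}" for name, amount in sorted(resources.items())
    )


def connect_and_report():
    """Attach to the Ray cluster already running in this job and print its resources."""
    import ray  # imported here so format_resources works without Ray installed

    # address="auto" asks Ray to attach to an already running cluster
    # instead of starting a new local one. Assumption: the set-up script
    # has started that cluster on the nodes of the current PBS job.
    ray.init(address="auto")
    try:
        print("Cluster resources:", format_resources(ray.cluster_resources()))
    finally:
        # Disconnect this script; the cluster itself keeps running.
        ray.shutdown()
```

Calling `connect_and_report()` from a script inside the job should list the CPU (and, with "-g", GPU) resources that Ray sees, which is a quick way to check that the cluster matches the PBS request.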
