Launching a predefined Ray cluster

Gadi

In your job script, load the gadi_jupyterlab module together with NCI-data-analysis/22.06, then run jupyter.ini.sh to set up the predefined Ray cluster.

#!/bin/bash
#PBS -P fp0
#PBS -q normal
#PBS -lwd,walltime=10:00:00,ncpus=96,mem=192GB,jobfs=400GB,storage=gdata/dk92+gdata/z00+scratch/fp0+gdata/fp0

module purge
module use /g/data/dk92/apps/Modules/modulefiles
module load NCI-data-analysis/22.06 gadi_jupyterlab/22.05

jupyter.ini.sh -R   # set up a Ray cluster with 48 Ray workers per node, 96 Ray workers in total, 1 thread per Ray worker

# jupyter.ini.sh -R -p 12   # or set up a Ray cluster with 12 Ray workers per node, 24 Ray workers in total, 1 thread per Ray worker

python script.py
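
The worker counts quoted in the script comments follow from simple arithmetic, sketched below in Python. This is an illustration only: the function name total_ray_workers is hypothetical, and the figure of 48 cores per node is an assumption taken from the comment "48 Ray workers per node" for a 96-core request.

```python
def total_ray_workers(ncpus, cores_per_node=48, workers_per_node=None):
    """Return (nodes, total_workers) for a PBS request of `ncpus` cores.

    cores_per_node=48 is an assumption based on the job script comments.
    workers_per_node stands in for the value passed to the -p flag.
    """
    if ncpus % cores_per_node != 0:
        raise ValueError("ncpus must be a whole number of nodes")
    nodes = ncpus // cores_per_node
    if workers_per_node is None:
        # Without -p, the comments indicate one worker per core.
        workers_per_node = cores_per_node
    return nodes, nodes * workers_per_node


print(total_ray_workers(96))                       # (2, 96)
print(total_ray_workers(96, workers_per_node=12))  # (2, 24)
```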

In "script.py", you can then connect to the existing predefined Ray cluster by calling ray.init() with the address argument set to "auto".

import ray
ray.init(address="auto")
print(ray.cluster_resources())

...

{'object_store_memory': 114296048025.0, 'CPU': 96.0, 'memory': 256690778727.0, 'node:10.6.48.66': 1.0, 'node:10.6.48.67': 1.0}

ARE

First, request multiple nodes in the ARE JupyterLab session interface and specify the storage projects.

Then click "advanced options", enter "/g/data/dk92/apps/Modules/modulefiles" in "Module directories" and load both NCI-data-analysis/22.06 and gadi_jupyterlab/22.06. In the Pre-script field, enter the command "jupyter.ini.sh -R".

Wait until the JupyterLab session starts, then click the "Open JupyterLab" button to open it.

In the Jupyter notebook, run the following lines to connect to the predefined Ray cluster and print the resources in use.

You should see that 96 CPU cores and two nodes are in use.
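
A minimal sketch of such notebook lines, assuming they mirror the Gadi example (ray.init(address="auto") followed by ray.cluster_resources()). The helpers summarize_resources and report_cluster are hypothetical names introduced here to make the CPU and node counts easy to read.

```python
def summarize_resources(resources):
    """Count CPUs and nodes in a dict shaped like ray.cluster_resources() output."""
    # Per-node entries have keys of the form "node:<ip>". Keys starting with
    # "node:__" are skipped on the assumption that they are internal markers
    # rather than real nodes.
    nodes = [
        key for key in resources
        if key.startswith("node:") and not key.startswith("node:__")
    ]
    return {"cpus": int(resources.get("CPU", 0)), "nodes": len(nodes)}


def report_cluster():
    """Attach to the predefined Ray cluster and return its CPU and node counts."""
    import ray  # imported here so summarize_resources works without Ray installed

    ray.init(address="auto")  # connect to the already running cluster
    return summarize_resources(ray.cluster_resources())
```

Calling report_cluster() in a notebook cell on a two-node session like the one described here would be expected to report 96 CPUs and 2 nodes.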

Monitoring Ray status

You can monitor the Ray cluster with the command "ray status". Open a terminal in either the JupyterLab session or the Gadi PBS job and type:

 $ watch ray status

The Ray status will then refresh every 2 seconds:

...
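
Resource usage can also be sampled from Python rather than from the terminal. The sketch below is one possible approach, not part of the NCI tooling: poll_resources and its parameters are hypothetical, and it assumes the session is already connected with ray.init(address="auto") so that ray.available_resources can be passed in as the fetch function.

```python
import time


def poll_resources(fetch, interval=2.0, iterations=3, sleep=time.sleep):
    """Call `fetch` `iterations` times, pausing `interval` seconds between calls.

    `fetch` is any zero-argument callable returning a resource dict,
    for example ray.available_resources in a connected session.
    """
    snapshots = []
    for i in range(iterations):
        snapshots.append(fetch())
        if i < iterations - 1:
            # No pause after the final snapshot.
            sleep(interval)
    return snapshots
```

In a connected session, poll_resources(ray.available_resources) would return three snapshots taken 2 seconds apart, which matches the default refresh interval of watch.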
