Environments 

You will need to load both the NCI-ai-ml and gadi_jupyterlab modules, as shown below:

module use /g/data/dk92/apps/Modules/modulefiles
module load NCI-ai-ml/22.08 gadi_jupyterlab/22.06

Preparing the Dataset

Please note that the Gadi GPU nodes cannot connect to the internet, so datasets cannot be downloaded automatically within a PBS job. Instead, download your input dataset on the Gadi login node and specify the data location in your job script.

For example, you can download the MNIST dataset on the Gadi login node with the following script:

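from torchvision import datasets

# Directory where the dataset will be stored
data_dir = "./data"
# Download MNIST into data_dir if it is not already there
datasets.MNIST(data_dir, download=True)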

A copy of the MNIST dataset is also available under the project wb00, i.e. "/g/data/wb00/MNIST".
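As a rough sketch of how a job script might point at data that is already on disk, the snippet below picks the first existing directory from a list of candidates and then opens MNIST with download=False, so nothing is fetched from the internet. The helper names pick_data_dir and load_mnist are hypothetical and used only for illustration. It is also an assumption that a given directory has the layout torchvision expects under its root argument; check this before relying on it.

```python
import os


def pick_data_dir(candidates):
    """Return the first candidate path that is an existing directory."""
    for path in candidates:
        if os.path.isdir(path):
            return path
    raise FileNotFoundError(f"none of these data directories exist: {candidates}")


def load_mnist(data_dir):
    """Open an already-downloaded MNIST copy without touching the network."""
    # Imported lazily so pick_data_dir works even where torchvision is absent.
    from torchvision import datasets

    # With download=False torchvision raises an error if the files are missing
    # instead of trying to fetch them, which would fail on a GPU node.
    return datasets.MNIST(data_dir, train=True, download=False)


# Example use inside a training script (candidate paths are illustrative):
#   data_dir = pick_data_dir(["./data", "/g/data/wb00/MNIST"])
#   train_set = load_mnist(data_dir)
```

Listing the locally downloaded copy first and the shared copy second is only one possible ordering; adjust the candidates to match where your data actually lives.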

NCI also provides access to other AI/ML datasets, such as ImageNet, on Gadi. Please join the project wb00 if you would like to access them.

Benchmark and Examples

Some examples are taken from the Ray repository. You can clone them on the Gadi login node using the reference link given for each example.

Revised versions of the examples, in which the data directory points to the Gadi local file system, are also available in the current NCI-ai-ml module space, i.e. "${NCI_GPU_ML_ROOT}/examples". The exact path is given with each example below.

You can monitor GPU utilisation at runtime with the gpustat tool.
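If you would rather record GPU utilisation from a Python script than watch it in a terminal, one possible approach is to call gpustat with its --json option and parse the result. The sketch below assumes gpustat is on the PATH of the node running the job and that its JSON output contains a "gpus" list whose entries carry "index", "utilization.gpu" and "memory.used" fields; the function names are hypothetical.

```python
import json
import subprocess


def summarise_gpus(gpustat_json):
    """Convert gpustat JSON text into (index, utilisation, memory used) tuples."""
    report = json.loads(gpustat_json)
    return [
        # Field names are assumed to match gpustat's JSON output.
        (gpu["index"], gpu["utilization.gpu"], gpu["memory.used"])
        for gpu in report["gpus"]
    ]


def query_gpus():
    """Run `gpustat --json` on the current node and summarise the result."""
    result = subprocess.run(
        ["gpustat", "--json"], capture_output=True, text=True, check=True
    )
    return summarise_gpus(result.stdout)
```

Calling query_gpus() periodically from the training script, or from a side process on the same node, gives a simple utilisation log.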

For details on using Ray with the NCI-ai-ml module, please see here.












