NCI-ai-ml environment
We provide several examples within the NCI-ai-ml module so that users can validate the environment. After loading the NCI-ai-ml module, you can find them under "${NCI_AI_ML_ROOT}/examples".
We also provide three PBS job scripts in the same directory, "${NCI_AI_ML_ROOT}/examples", to run these examples:
- horovod_gloo.pbs runs the examples via Horovod with Gloo across 2 GPU nodes.
- horovod_mpi.pbs runs the examples via Horovod with MPI across 2 GPU nodes.
- raytrain.pbs runs the examples via Ray Train across 2 GPU nodes.
You can copy these scripts to your working directory and submit them to the Gadi PBS queue. Keep "gdata/dk92+gdata/wb00" in the "#PBS -l storage" line, and replace the other entries with your own project storage spaces.
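As a rough sketch of the storage edit described above, the snippet below builds a "#PBS -l storage" directive that keeps the required entries and appends your own. The project code "ab12" is hypothetical, and the choice of both gdata and scratch spaces for it is an assumption; use whichever spaces your job actually needs.

```shell
#!/bin/bash
# Sketch only: "ab12" is a hypothetical project code; replace it with your own.
my_project="ab12"

# These two entries must stay in the storage line.
required="gdata/dk92+gdata/wb00"

# Assumption: the job also needs the gdata and scratch spaces of your project.
storage="${required}+gdata/${my_project}+scratch/${my_project}"
directive="#PBS -l storage=${storage}"
echo "${directive}"

# On Gadi, with the NCI-ai-ml module loaded, the workflow would then be roughly:
#   cp "${NCI_AI_ML_ROOT}/examples/horovod_gloo.pbs" .
#   (edit the storage line in your copy to match the directive printed above)
#   qsub horovod_gloo.pbs
```

The same edit applies to horovod_mpi.pbs and raytrain.pbs; only the script name in the copy and submit steps changes.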