DeepSpeed is a deep learning optimization library developed by Microsoft that provides advanced features such as automatic mixed precision training, gradient checkpointing, and more.
It is designed to improve the training speed and memory efficiency of large-scale deep learning models. If you are interested in using DeepSpeed, you can find more information and documentation on the official DeepSpeed website.
Accessing the module
You can access the DeepSpeed module on Gadi by joining the project dk92. Note that project dk92 provides no storage or compute resources; it is purely for accessing the software. You will need to use your existing NCI compute project for computational resources.
The latest module version is "deepspeed/0.15.1".
Using the module
You can use the module in two scenarios, in both cases by submitting PBS jobs to the 'gpuvolta' queue: on a single GPU node, or across multiple GPU nodes.
Single GPU node
You can submit a job like the one below to run DeepSpeed on a single GPU node.
#!/bin/bash
#PBS -q gpuvolta
#PBS -l ncpus=48
#PBS -l ngpus=4
#PBS -l mem=380GB
#PBS -l jobfs=400GB
#PBS -l walltime=00:30:00
#PBS -l storage=gdata/dk92+scratch/ab12
#PBS -l wd
#PBS -N deepspeed_test

# Must include `#PBS -l storage=gdata/dk92+scratch/ab12` if the job
# needs access to `/scratch/ab12/` and uses the DeepSpeed module.
# Details on:
# https://opus.nci.org.au/display/Help/PBS+Directives+Explained

module use /g/data/dk92/apps/Modules/modulefiles
module load deepspeed/0.15.1

deepspeed ${DEEPSPEED_ROOT}/examples/cifar/cifar10_deepspeed.py --deepspeed --deepspeed_config ${DEEPSPEED_ROOT}/examples/cifar/ds_config.json
After requesting the GPU resources within a single node, you can run the "deepspeed" command directly on your script, as shown above.
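The CIFAR-10 example ships with its own ds_config.json, so you do not need to write one to run it. For orientation only, a minimal DeepSpeed configuration typically contains fields like those below; the values here are illustrative and are not the settings used by the bundled example.

```shell
# Illustrative sketch of a minimal DeepSpeed config file.
# train_batch_size, optimizer, and fp16 are standard DeepSpeed config
# fields; the values chosen here are arbitrary examples.
cat > ds_config.json <<'EOF'
{
  "train_batch_size": 16,
  "optimizer": {
    "type": "Adam",
    "params": { "lr": 0.001 }
  },
  "fp16": { "enabled": true }
}
EOF
cat ds_config.json
```

The path to such a config file is what the `--deepspeed_config` flag expects.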
Multiple GPU nodes
You can submit a PBS job requesting multiple GPU nodes and run your DeepSpeed script with one of two commands: "deepspeed" or "mpirun".
#!/bin/bash
#PBS -q gpuvolta
#PBS -l ncpus=96
#PBS -l ngpus=8
#PBS -l mem=760GB
#PBS -l jobfs=800GB
#PBS -l walltime=00:30:00
#PBS -l storage=gdata/dk92+scratch/ab12
#PBS -l wd
#PBS -N deepspeed_test

# Must include `#PBS -l storage=gdata/dk92+scratch/ab12` if the job
# needs access to `/scratch/ab12/` and uses the DeepSpeed module.
# Details on:
# https://opus.nci.org.au/display/Help/PBS+Directives+Explained

module use /g/data/dk92/apps/Modules/modulefiles
module load deepspeed/0.15.1

pbs_tohostfile

# Recommended: use MPI.
mpirun --hostfile myhostfile --bind-to none -x UCX_TLS=tcp python ${DEEPSPEED_ROOT}/examples/cifar/cifar10_deepspeed.py --deepspeed --deepspeed_config ${DEEPSPEED_ROOT}/examples/cifar/ds_config.json

# Or run the 'deepspeed' launcher with the appropriate flags:
#deepspeed --hostfile myhostfile --no_ssh_check ${DEEPSPEED_ROOT}/examples/cifar/cifar10_deepspeed.py --deepspeed --deepspeed_config ${DEEPSPEED_ROOT}/examples/cifar/ds_config.json
To utilise multiple GPU nodes, you first need to create a file listing the name of each host and its number of slots, in the following format:
gadi-gpu-v100-0125.gadi.nci.org.au slots=4
gadi-gpu-v100-0132.gadi.nci.org.au slots=4
The "deepspeed" module provides a script called 'pbs_tohostfile' to convert the PBS nodefile into the hostfile format above. It creates a file named "myhostfile" in your current working directory, which you can then pass to either the 'deepspeed' or 'mpirun' command to run your DeepSpeed model.
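To illustrate the conversion, here is a sketch of what a pbs_tohostfile-style script might do (the actual dk92 script may differ in detail). $PBS_NODEFILE lists one line per allocated CPU core, so the conversion keeps the unique host names and emits one "hostname slots=N" line each, with N being the GPUs per gpuvolta node (4).

```shell
# Sketch of a PBS-nodefile-to-hostfile conversion; the real
# pbs_tohostfile script shipped with the module may work differently.

# Simulated nodefile contents for illustration (normally $PBS_NODEFILE
# is set by PBS and repeats each host once per allocated core):
printf '%s\n' nodeA nodeA nodeA nodeB nodeB nodeB > nodefile.txt
NODEFILE="${PBS_NODEFILE:-nodefile.txt}"

# Keep unique hosts, append the per-node GPU slot count.
sort -u "$NODEFILE" | awk '{print $1 " slots=4"}' > myhostfile
cat myhostfile
```

Running this against the simulated nodefile produces one `slots=4` line per distinct host, matching the hostfile format shown above.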
Note
It is recommended to add the following lines to your ~/.ssh/config file to avoid interactive host-key prompts when logging in to other nodes.
Host gadi-*
StrictHostKeyChecking no
UserKnownHostsFile=/dev/null
You do not need to add the above lines when using the "mpirun" command.