DeepSpeed is a deep learning optimization library developed by Microsoft that provides advanced features such as automatic mixed precision training, gradient checkpointing, and more.

It is designed to improve the training speed and memory efficiency of large-scale deep learning models. If you are interested in using DeepSpeed, you can find more information and documentation on their official website.

Accessing the module

You can access the DeepSpeed module at Gadi by joining the project dk92. Note that no storage or compute resources are provided by project dk92 as it is purely for accessing the software. You will need to use your existing compute NCI project code for computational resources. 

The latest module version is "deepspeed/0.15.1".

Using the module

You can run the module in two scenarios, on a single GPU node or across multiple GPU nodes, by submitting PBS jobs to the 'gpuvolta' queue.

Single GPU node

You can submit a job as below to run DeepSpeed on a single GPU node.

PBS job script
#!/bin/bash
 
#PBS -q gpuvolta
#PBS -l ncpus=48
#PBS -l ngpus=4
#PBS -l mem=380GB
#PBS -l jobfs=400GB
#PBS -l walltime=00:30:00
#PBS -l storage=gdata/dk92+scratch/ab12
#PBS -l wd
#PBS -N deepspeed_test

# Must include `#PBS -l storage=gdata/dk92+scratch/ab12` if the job
# needs access to `/scratch/ab12/` and uses the DeepSpeed module.
# Details on:
# https://opus.nci.org.au/display/Help/PBS+Directives+Explained
 
module use /g/data/dk92/apps/Modules/modulefiles
module load deepspeed/0.15.1

deepspeed ${DEEPSPEED_ROOT}/examples/cifar/cifar10_deepspeed.py --deepspeed --deepspeed_config ${DEEPSPEED_ROOT}/examples/cifar/ds_config.json  

After requesting the GPU resources within a single node, you can run the "deepspeed" command directly in your job script, as shown above.
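The --deepspeed_config flag points to a JSON file that controls DeepSpeed's runtime behaviour. The sketch below is illustrative only, not the exact contents of the bundled ds_config.json; the keys shown (train_batch_size, optimizer, fp16) are standard DeepSpeed configuration fields, and the values are placeholders you would tune for your own model:

```json
{
  "train_batch_size": 16,
  "gradient_accumulation_steps": 1,
  "optimizer": {
    "type": "Adam",
    "params": { "lr": 0.001 }
  },
  "fp16": { "enabled": true }
}
```

See the DeepSpeed documentation for the full list of supported configuration keys.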

Multiple GPU nodes

You can submit a PBS job to request multiple GPU nodes and launch the DeepSpeed script with one of two commands: "deepspeed" or "mpirun".


PBS job script
#!/bin/bash
 
#PBS -q gpuvolta
#PBS -l ncpus=96
#PBS -l ngpus=8
#PBS -l mem=760GB
#PBS -l jobfs=800GB
#PBS -l walltime=00:30:00
#PBS -l storage=gdata/dk92+scratch/ab12
#PBS -l wd
#PBS -N deepspeed_test

# Must include `#PBS -l storage=gdata/dk92+scratch/ab12` if the job
# needs access to `/scratch/ab12/` and uses the DeepSpeed module.
# Details on:
# https://opus.nci.org.au/display/Help/PBS+Directives+Explained
 
module use /g/data/dk92/apps/Modules/modulefiles
module load deepspeed/0.15.1

# Convert the PBS nodefile into 'myhostfile' in the current directory.
pbs_tohostfile

# Using mpirun is recommended.
mpirun --hostfile myhostfile --bind-to none -x UCX_TLS=tcp python ${DEEPSPEED_ROOT}/examples/cifar/cifar10_deepspeed.py --deepspeed --deepspeed_config ${DEEPSPEED_ROOT}/examples/cifar/ds_config.json

# Or you can run 'deepspeed' command with proper flags
#deepspeed --hostfile myhostfile  --no_ssh_check  ${DEEPSPEED_ROOT}/examples/cifar/cifar10_deepspeed.py --deepspeed --deepspeed_config ${DEEPSPEED_ROOT}/examples/cifar/ds_config.json

To utilise multiple GPU nodes, you first need to create a file listing the names of all hosts and their number of slots, in the format below:

gadi-gpu-v100-0125.gadi.nci.org.au slots=4
gadi-gpu-v100-0132.gadi.nci.org.au slots=4

The "deepspeed" module provides a script called 'pbs_tohostfile' to convert the PBS nodefile into the hostfile format shown above. It creates a file named "myhostfile" in your current working directory, which you can then pass to either the 'deepspeed' or 'mpirun' command to run your DeepSpeed model.
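For illustration, the conversion 'pbs_tohostfile' performs is roughly sketched below; the bundled script may differ in detail. This sketch assumes each gpuvolta node contributes 4 GPU slots, and it builds a sample nodefile because $PBS_NODEFILE only exists inside a running job:

```shell
#!/bin/bash
# Sample nodefile for illustration only; inside a real PBS job,
# $PBS_NODEFILE is set by the scheduler and already exists.
cat > nodefile.demo <<'EOF'
gadi-gpu-v100-0125.gadi.nci.org.au
gadi-gpu-v100-0125.gadi.nci.org.au
gadi-gpu-v100-0132.gadi.nci.org.au
gadi-gpu-v100-0132.gadi.nci.org.au
EOF

# Emit one "<host> slots=4" line per unique host
# (each gpuvolta node has 4 GPUs).
sort -u nodefile.demo | while read -r host; do
    echo "$host slots=4"
done > myhostfile

cat myhostfile
```

Running the sketch above prints one line per unique node in the format mpirun and deepspeed both accept for their --hostfile option.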

Note

It is recommended to add the following lines to your ~/.ssh/config file to avoid interactive host-key prompts when the launcher logs into other nodes.

Host gadi-*
    StrictHostKeyChecking no
    UserKnownHostsFile=/dev/null

You don't need to add the above lines if you use the "mpirun" command.

