Overview

CUDA is a parallel computing platform and Application Programming Interface (API) model created by Nvidia. It allows software developers to use a CUDA-enabled Graphics Processing Unit (GPU) for general-purpose processing, an approach termed GPGPU. The CUDA platform is a software layer that gives direct access to the GPU's virtual instruction set and parallel computational elements for the execution of compute kernels.

More information: https://docs.nvidia.com/cuda/index.html
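To illustrate what a compute kernel looks like, below is a minimal sketch of a CUDA program. The file name vector_add.cu and the compile line are illustrative assumptions, not Gadi-specific requirements:

// vector_add.cu -- a minimal sketch of a CUDA compute kernel.
// Build with, for example: nvcc -o vector_add vector_add.cu
#include <cstdio>
#include <cuda_runtime.h>

// Each GPU thread adds one pair of array elements.
__global__ void vector_add(const float *a, const float *b, float *c, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        c[i] = a[i] + b[i];
}

int main()
{
    const int n = 1 << 20;
    const size_t bytes = n * sizeof(float);

    // Managed (unified) memory keeps the sketch short; the CUDA runtime
    // migrates the pages between host and device as needed.
    float *a, *b, *c;
    cudaMallocManaged(&a, bytes);
    cudaMallocManaged(&b, bytes);
    cudaMallocManaged(&c, bytes);
    for (int i = 0; i < n; ++i) { a[i] = 1.0f; b[i] = 2.0f; }

    // Launch enough 256-thread blocks to cover all n elements.
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    vector_add<<<blocks, threads>>>(a, b, c, n);
    cudaDeviceSynchronize();

    printf("c[0] = %f\n", c[0]);  // expect 3.000000
    cudaFree(a); cudaFree(b); cudaFree(c);
    return 0;
}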

Usage

You can check the versions installed on Gadi with a module query:

$ module avail cuda

We normally recommend using the latest version available, and we always recommend specifying the version number with the module command:

$ module load cuda/11.4.1

For more details on using modules see our modules help guide at https://opus.nci.org.au/display/Help/Environment+Modules.

A binary executable of your CUDA-enabled or MPI and CUDA-enabled application can then be built with the compilers these modules provide: nvcc from the cuda module, and the MPI compiler wrappers from the openmpi module.
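As a hedged sketch, not an NCI-provided example, the pattern below shows how an MPI and CUDA-enabled program typically binds each MPI rank to one GPU, matching the one-process-per-GPU launch used in the job script further down. The file name mpi_cuda.cu and the build line are assumptions:

// mpi_cuda.cu -- illustrative sketch of an MPI+CUDA program in which
// each MPI rank drives one GPU. Build with, for example:
//   nvcc -ccbin mpicxx -o mpi_cuda mpi_cuda.cu
#include <cstdio>
#include <mpi.h>
#include <cuda_runtime.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank = 0;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    // Simple scheme: cycle ranks over the GPUs visible on the node.
    // Production codes often compute a node-local rank instead.
    int ndevices = 0;
    cudaGetDeviceCount(&ndevices);
    int device = rank % ndevices;
    cudaSetDevice(device);

    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, device);
    printf("rank %d -> GPU %d (%s)\n", rank, device, prop.name);

    // ... per-rank kernel launches and MPI communication go here ...

    MPI_Finalize();
    return 0;
}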

An example PBS job submission script named cuda_job.sh is provided below. It requests 48 CPU cores, 4 GPUs, 350 GiB of memory, and 400 GiB of local disk on a compute node in the gpuvolta queue on Gadi, for exclusive access for 30 minutes, charged against the project a00. It also asks PBS to enter the working directory once the job starts. Save this script in the working directory from which the analysis will be done. To change the number of CPU cores, memory, or jobfs required, simply modify the corresponding PBS resource requests at the top of the job script according to the information available at https://opus.nci.org.au/display/Help/Queue+Structure. Note that if your application does not run in parallel, set the number of CPU cores to 1 and adjust the memory and jobfs accordingly to avoid wasting compute resources.

#!/bin/bash

#PBS -P a00
#PBS -q gpuvolta
#PBS -l ncpus=48
#PBS -l ngpus=4
#PBS -l mem=350GB
#PBS -l jobfs=400GB
#PBS -l walltime=00:30:00
#PBS -l wd

# Load modules, always specify version number.
module load cuda/11.4.1
module load openmpi/4.1.0

# Must include `#PBS -l storage=scratch/ab12+gdata/yz98` if the job
# needs access to `/scratch/ab12/` and `/g/data/yz98/`. Details on:
# https://opus.nci.org.au/display/Help/PBS+Directives+Explained

# Run application
# The following will run 1 MPI process per GPU and there are 4
# GPUs in each GPU node.
mpirun -np $PBS_NGPUS --map-by ppr:1:numa <your CUDA exe>
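
Here $PBS_NGPUS is set by PBS to the total number of GPUs the job requested, and the Open MPI option --map-by ppr:1:numa places one process per NUMA domain, keeping each MPI process close to the GPU it drives.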

To submit the job, use the PBS command:

$ qsub cuda_job.sh
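
On success, qsub prints the ID assigned to the job; you can pass that ID to qstat to monitor the job's status.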