

CUDA is a parallel computing platform and application programming interface (API) model created by NVIDIA. It allows software developers to use a CUDA-enabled graphics processing unit (GPU) for general-purpose processing, an approach termed GPGPU. The CUDA platform is a software layer that gives direct access to the GPU's virtual instruction set and parallel computational elements for the execution of compute kernels.


How to use 

You can check which versions are installed on Gadi with a module query:

$ module avail cuda

We normally recommend using the latest version available, and we always recommend specifying the version number with the module command:

$ module load cuda/11.4.1
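
To find the latest version without reading through the full listing, a small shell function like the one below can pick the highest-numbered cuda module. This is a minimal sketch, not an NCI-provided tool: the function name latest_cuda_module is hypothetical, and it assumes the terse listing (module -t avail cuda 2>&1) prints one module name per line, possibly with directory header lines and a "(default)"-style tag, both of which it strips.

```shell
# Hypothetical helper: print the highest-numbered cuda/<version> module
# from a module listing read on stdin.
# Assumption: one module name per line, as in the terse listing
#   module -t avail cuda 2>&1
# Directory header lines are skipped, and a trailing "(default)"-style
# tag is removed before version-sorting.
latest_cuda_module() {
  grep -E '^cuda/[0-9]' | sed 's/(.*)$//' | sort -V | tail -n 1
}
```

Use it interactively, for example module -t avail cuda 2>&1 | latest_cuda_module, then write the printed version explicitly into your module load line so the job does not change behaviour when a newer CUDA module is installed.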

For more details on using modules see our software applications guide.

Follow the links below for information on how to build an executable from your CUDA-enabled, or MPI- and CUDA-enabled, application:

An example PBS job submission script is provided below.

It requests 48 CPU cores, 4 GPUs, 350 GiB of memory, and 400 GiB of local disk (jobfs) on one compute node in Gadi's gpuvolta queue, for 30 minutes, charged to project a00. It also asks PBS to start the job in the directory from which it was submitted (the -l wd option). Save this script in the working directory from which the analysis will be done.

To change the number of CPU cores, memory, or jobfs required, modify the corresponding PBS resource requests at the top of the job script according to the information in our queue structure guide.

Note that if your application does not run in parallel, you should set the number of CPU cores to 1 and reduce the memory and jobfs requests accordingly, to avoid wasting compute resources.
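
After changing the resource requests, it can help to confirm what a job actually received. The lines below are a sketch that could be placed after the #PBS directives of a job script. They assume that PBS on Gadi sets PBS_NCPUS, PBS_NGPUS and PBS_JOBFS inside the job; the function name report_resources is hypothetical, and any variable that is not set is reported as "unset".

```shell
# Hypothetical helper: print the resources PBS granted to this job.
# Assumption: PBS_NCPUS, PBS_NGPUS and PBS_JOBFS are set by PBS on Gadi;
# any that are missing are shown as "unset".
report_resources() {
  echo "ncpus=${PBS_NCPUS:-unset} ngpus=${PBS_NGPUS:-unset} jobfs_dir=${PBS_JOBFS:-unset}"
}

report_resources

# List the GPUs visible to the job, if the NVIDIA driver tools are present.
if command -v nvidia-smi >/dev/null 2>&1; then
  nvidia-smi -L
fi
```

The output lands in the job's standard output file, so a mismatch between what you requested and what the job saw is easy to spot after the run.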

#!/bin/bash
#PBS -P a00
#PBS -q gpuvolta
#PBS -l ncpus=48
#PBS -l ngpus=4
#PBS -l mem=350GB
#PBS -l jobfs=400GB
#PBS -l walltime=00:30:00
#PBS -l wd
# Must include `#PBS -l storage=scratch/ab12+gdata/yz98` if the job
# needs access to `/scratch/ab12/` and `/g/data/yz98/`. Like the other
# #PBS directives, it must appear before the first command in the script.

# Load modules, always specify version number.
module load cuda/11.4.1
module load openmpi/4.1.0

# Run application.
# Start one MPI process per GPU ($PBS_NGPUS in total), placing one
# process on each NUMA domain; each GPU node has 4 GPUs.
mpirun -np $PBS_NGPUS --map-by ppr:1:numa <your CUDA exe>
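
With the mpirun line above, each MPI process can typically still see every GPU allocated to the job on its node, so the application itself must choose a device. If your application does not do this, one common approach is a small wrapper script that sets CUDA_VISIBLE_DEVICES from the process's node-local rank. This is a sketch under stated assumptions, not an NCI-supplied tool: the file name gpu_bind.sh and the GPUS_PER_NODE override are hypothetical, and it relies on Open MPI exporting OMPI_COMM_WORLD_LOCAL_RANK to each process it launches.

```shell
#!/bin/bash
# gpu_bind.sh (hypothetical helper): give each MPI rank on a node its own GPU
# by setting CUDA_VISIBLE_DEVICES from the rank's node-local index.
# Assumption: launched by Open MPI's mpirun, which exports
# OMPI_COMM_WORLD_LOCAL_RANK to every process it starts.

bind_gpu() {
  # Node-local rank; falls back to 0 if not started by Open MPI.
  local local_rank=${OMPI_COMM_WORLD_LOCAL_RANK:-0}
  # GPUS_PER_NODE is a hypothetical override; 4 matches one GPU node here.
  local gpus_per_node=${GPUS_PER_NODE:-4}
  export CUDA_VISIBLE_DEVICES=$(( local_rank % gpus_per_node ))
}

# When executed as a wrapper (not sourced) with a command, bind, then run it.
if [[ "${BASH_SOURCE[0]}" == "$0" && $# -gt 0 ]]; then
  bind_gpu
  exec "$@"
fi
```

After making it executable (chmod +x gpu_bind.sh), the wrapper would go in front of the program: mpirun -np $PBS_NGPUS --map-by ppr:1:numa ./gpu_bind.sh <your CUDA exe>. Because --map-by ppr:1:numa places one process per NUMA domain, whether local rank i actually sits on the NUMA domain closest to GPU i depends on the node's topology, which you can inspect with nvidia-smi topo -m.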

To submit the job, use the PBS command:

$ qsub

Authors: Mohsin Ali