PAPI stands for Performance Application Programming Interface.

It provides the tool designer and application engineer with a consistent interface and methodology for use of the performance counter hardware found in most major microprocessors. It enables software engineers to see, in near real time, the relation between software performance and processor events.

In addition, Component PAPI provides access to a collection of components that expose performance measurement opportunities across the hardware and software stack.

More information:

How to use 

You can check the versions installed on Gadi with a module query:

$ module avail papi

We normally recommend using the latest version available, and always recommend specifying the version number with the module command:

$ module load papi/5.7.0

For more details on using modules see our software applications guide.

Instrumentation of Program

PAPI requires the user to instrument the program: the appropriate include file and PAPI function calls must be inserted into the subroutines that are to be measured.

  • For C, please include the file papi.h

  • For Fortran 77, please include the file f77papi.h

  • For Fortran 90, please include the file f90papi.h

  • If you intend to preprocess your Fortran code, you may use the file fpapi.h

Then build and link the objects using PAPI:

# Load modules, always specify version number.
$ module load openmpi/4.0.2
$ module load papi/5.7.0
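# Compile only (-c). The papi module is assumed to add the PAPI headers to the compiler search path (CPATH).
$ mpicc <Compiler Options> -c -o mpi_program.o mpi_program.c
# Link against the PAPI library.
$ mpicc <Linkage Options> -o mpi_program mpi_program.o -lpapi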
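
As an illustration of the instrumentation described in this section, the shell snippet below writes a minimal instrumented C program and compiles it when PAPI is available. It is a sketch under assumptions, not a recommended template: the file name papi_sketch.c is hypothetical, the two preset events (PAPI_TOT_CYC, PAPI_TOT_INS) may not be available on every CPU, and the compile line assumes the papi module exports the include and library search paths. The program is serial, so it is built with cc rather than mpicc.

```shell
# Write a minimal instrumented C program (hypothetical file name).
cat > papi_sketch.c <<'EOF'
#include <stdio.h>
#include <stdlib.h>
#include <papi.h>

/* Abort with a readable message if a PAPI call fails. */
static void check(int retval, const char *what)
{
    if (retval != PAPI_OK) {
        fprintf(stderr, "%s failed: %s\n", what, PAPI_strerror(retval));
        exit(EXIT_FAILURE);
    }
}

int main(void)
{
    int event_set = PAPI_NULL;
    long long counts[2];
    volatile double sum = 0.0;

    /* PAPI_library_init returns the library version on success. */
    if (PAPI_library_init(PAPI_VER_CURRENT) != PAPI_VER_CURRENT) {
        fprintf(stderr, "PAPI_library_init failed\n");
        return EXIT_FAILURE;
    }

    check(PAPI_create_eventset(&event_set), "PAPI_create_eventset");
    /* Preset events; availability depends on the CPU. */
    check(PAPI_add_event(event_set, PAPI_TOT_CYC), "PAPI_add_event(PAPI_TOT_CYC)");
    check(PAPI_add_event(event_set, PAPI_TOT_INS), "PAPI_add_event(PAPI_TOT_INS)");

    check(PAPI_start(event_set), "PAPI_start");
    for (int i = 0; i < 1000000; i++)   /* region being measured */
        sum += i * 0.5;
    check(PAPI_stop(event_set, counts), "PAPI_stop");

    printf("cycles=%lld instructions=%lld\n", counts[0], counts[1]);
    return EXIT_SUCCESS;
}
EOF

# Build and run only when a compiler is present and the papi module is loaded
# (assumption: the module sets PAPI_BASE and the header/library search paths).
if command -v cc >/dev/null 2>&1 && [ -n "${PAPI_BASE:-}" ]; then
    cc -o papi_sketch papi_sketch.c -lpapi && ./papi_sketch
fi
```

The same pattern (initialise the library, create an event set, add events, then wrap the region of interest in PAPI_start and PAPI_stop) carries over to an MPI program, where each rank would report its own counts.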

An example PBS job submission script is provided below. It requests 48 CPU cores, 128 GiB of memory, and 400 GiB of local disk on a compute node on Gadi from the normal queue for 30 minutes against the project a00. It also asks the system to enter the working directory once the job has started. Save this script in the working directory from which the analysis will be run.

To change the number of CPU cores, memory, or jobfs required, simply modify the appropriate PBS resource requests at the top of the job script files according to the information available in our queue structure guide.

Note that if your application does not run in parallel, you should set the number of CPU cores to 1 and adjust the memory and jobfs requests accordingly to avoid wasting compute resources.

#!/bin/bash
#PBS -P a00
#PBS -q normal
#PBS -l ncpus=48
#PBS -l mem=128GB
#PBS -l jobfs=400GB
#PBS -l walltime=00:30:00
#PBS -l wd
# Load modules, always specify version number.
module load openmpi/4.0.2
module load papi/5.7.0
# Must include `#PBS -l storage=scratch/ab12+gdata/yz98` if the job
# needs access to `/scratch/ab12/` and `/g/data/yz98/`
# Run application
mpirun -np $PBS_NCPUS ./mpi_program
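
For a serial application, the resource note above might translate into directives like the following. This is only a sketch: the memory and jobfs values are illustrative assumptions, not recommendations, and should be chosen from the queue structure guide. The rest of the script would stay the same, except that the executable is run directly rather than through mpirun.

```shell
#PBS -P a00
#PBS -q normal
#PBS -l ncpus=1
# Illustrative values only; size these to what the serial run actually needs.
#PBS -l mem=4GB
#PBS -l jobfs=10GB
#PBS -l walltime=00:30:00
#PBS -l wd
```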

To run the job you would use the PBS command:

$ qsub

The instrumentation results, if any, are printed to stdout, i.e., to the job output file (whose name ends in <jobID>) when the job completes.

PAPI Utilities

Load the PAPI module:

# Load module, always specify version number.
$ module load papi/5.7.0

Then list the contents of the $PAPI_BASE/bin directory and consult the corresponding man pages or the PAPI documentation for details.
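
A minimal sketch of this step, assuming the papi module sets PAPI_BASE as the path above implies. The helper name list_papi_utils is hypothetical. papi_avail is one of the standard PAPI utilities, and its -a option restricts the listing to the preset events available on the current CPU.

```shell
# Hypothetical helper: list the PAPI utilities shipped with the loaded module.
list_papi_utils() {
    if [ -z "${PAPI_BASE:-}" ] || [ ! -d "$PAPI_BASE/bin" ]; then
        echo "PAPI_BASE/bin not found; run 'module load papi/5.7.0' first" >&2
        return 1
    fi
    ls "$PAPI_BASE/bin"
}

if list_papi_utils; then
    # Show the first preset events that this CPU supports.
    "$PAPI_BASE/bin/papi_avail" -a | head -n 40
fi
```

Each utility listed this way should have a man page, for example `man papi_avail`.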

PAPI documentation

PAPI Wiki:

How to Use PAPI:

PAPI User Guide:

Authors: Mohsin Ali