HPCToolkit is an integrated suite of tools for measurement and analysis of program performance on computers ranging from multicore desktop systems to large-scale supercomputers.
It provides accurate measurements of a program’s work, resource consumption, and inefficiency, correlates these metrics with the program’s source code, works with multilingual, fully optimised binaries, has very low measurement overhead, and scales to large parallel systems.
HPCToolkit's measurements provide support for analysing a program's execution cost, inefficiency, and scaling characteristics both within and across the nodes of a parallel system.
More information: http://hpctoolkit.org/
You can check the versions installed on Gadi with a module query:
$ module avail hpctoolkit
$ module avail hpcviewer    # For visualisation
We normally recommend using the latest version available, and always recommend specifying the version number with the module command:
$ module load hpctoolkit/2021.05.15
$ module load hpcviewer/2021.05.15    # For visualisation
For more details on using modules see our software applications guide.
Measurement of application performance takes two different forms depending on whether your application is dynamically or statically linked. To monitor a dynamically linked application, simply use hpcrun to launch the application. To monitor a statically linked application, link your application using hpclink.
To monitor a sequential or multithreaded application, use:
$ hpcrun [options] prog.exe [arguments]
To monitor an MPI application, use:
$ mpirun hpcrun [options] prog.exe [arguments]
To link hpcrun's monitoring code into prog.exe, use:
$ hpclink <linker> -o prog.exe <linker-arguments>
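For example, to statically link a hypothetical MPI program whose objects were compiled with mpicc (the linker command, object file, and library here are illustrative only):
$ hpclink mpicc -o prog.exe main.o -lm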
If no options are specified to hpcrun, walltime will be measured for prog.exe. Otherwise, specify the PAPI events to be measured. The list of PAPI events available on the system can be retrieved by running the following command:
$ hpcrun -L prog.exe
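For example, to sample total cycles with a period of 4000000 events (assuming PAPI_TOT_CYC is reported as available by hpcrun -L on the system):
$ hpcrun -e PAPI_TOT_CYC@4000000 prog.exe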
An example PBS job submission script named hpctoolkit_job.sh, with the measurements passed through an environment variable, is provided below. It requests 48 CPUs, 128 GiB memory, and 400 GiB local disk on a compute node on Gadi from the normal queue, with exclusive access for 30 minutes, against the project a00. It also requests the system to enter the working directory once the job has started.
This script should be saved in the working directory from which the analysis will be done. To change the number of CPU cores, memory, or jobfs required, simply modify the appropriate PBS resource requests at the top of the job script according to the information available in our queue structure guide.
Note that if your application does not run in parallel, set the number of CPU cores to 1 and reduce the memory and jobfs requests accordingly, to avoid wasting compute resources.
#!/bin/bash
#PBS -P a00
#PBS -q normal
#PBS -l ncpus=48
#PBS -l mem=128GB
#PBS -l jobfs=400GB
#PBS -l walltime=00:30:00
#PBS -l wd

# Load modules, always specify version number.
module load openmpi/4.1.1
module load hpctoolkit/2021.05.15

# Must include `#PBS -l storage=scratch/ab12+gdata/yz98` if the job
# needs access to `/scratch/ab12/` and `/g/data/yz98/`

# Set measurements
export HPCRUN_EVENT_LIST="WALLCLOCK@5000"

# Run application
mpirun -np $PBS_NCPUS hpcrun prog.exe
The same PBS job script, with the measurements passed as an option to hpcrun instead, looks like the following:
#!/bin/bash
#PBS -P a00
#PBS -q normal
#PBS -l ncpus=48
#PBS -l mem=128GB
#PBS -l jobfs=400GB
#PBS -l walltime=00:30:00
#PBS -l wd

# Load modules, always specify version number.
module load openmpi/4.1.1
module load hpctoolkit/2021.05.15

# Must include `#PBS -l storage=scratch/ab12+gdata/yz98` if the job
# needs access to `/scratch/ab12/` and `/g/data/yz98/`

# Run application
mpirun -np $PBS_NCPUS hpcrun -e WALLCLOCK@5000 prog.exe
To run the job you would use the PBS command:
$ qsub hpctoolkit_job.sh
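If your application is serial, a minimal variant of the script might look like the sketch below; the mem and jobfs values here are placeholders, so adjust them to what your program actually needs:
#!/bin/bash
#PBS -P a00
#PBS -q normal
#PBS -l ncpus=1
#PBS -l mem=4GB
#PBS -l jobfs=10GB
#PBS -l walltime=00:30:00
#PBS -l wd

# Load module, always specify version number.
module load hpctoolkit/2021.05.15

# Run the application under hpcrun directly, without mpirun.
hpcrun -e WALLCLOCK@5000 prog.exe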
In the scripts above, 5000 is the sampling period for the measurement: for WALLCLOCK it is the number of microseconds between samples, so a period of 5000 corresponds to about 200 samples per second per thread. The larger the period, the lower the sampling frequency and the lower the associated overhead of HPCToolkit. In general, the overhead of HPCToolkit is around 1% to 3%.
Some other useful measurements include:
WALLCLOCK: walltime spent in each function, or on outstanding instructions
PAPI_FP_INS: floating point instructions (x87)
PAPI_VEC_SP: single precision vector/SIMD instructions
PAPI_VEC_DP: double precision vector/SIMD instructions
PAPI_LD_INS: load instructions
PAPI_SR_INS: store instructions
PAPI_BR_INS: branch instructions
and more. Please refer to hpcrun -L prog.exe for a complete list of measurable events, or to the PAPI Preset Events list.
The available measurement events differ between systems. Please make sure an event is available and measurable with hpcrun -L prog.exe before using it.
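For example, to check whether PAPI_LD_INS is measurable on the current system, the event list can be filtered:
$ hpcrun -L prog.exe | grep PAPI_LD_INS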
To measure multiple events at once, the following formats of the event option or the environment variable can be used:
-e WALLCLOCK@5000 -e PAPI_LD_INS@4000001 -e PAPI_SR_INS@4000001
export HPCRUN_EVENT_LIST="WALLCLOCK@5000;PAPI_LD_INS@4000001;PAPI_SR_INS@4000001"
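Inside the job script from the earlier example, the option form would give a launch line like:
$ mpirun -np $PBS_NCPUS hpcrun -e WALLCLOCK@5000 -e PAPI_LD_INS@4000001 -e PAPI_SR_INS@4000001 prog.exe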
hpcrun will generate a directory named hpctoolkit-<prog.exe>-measurements-<jobid> in your job's directory.
Follow the sequence below to parse the raw measurements in hpctoolkit-<prog.exe>-measurements-<jobid>.
$ hpcstruct prog.exe
This will generate a file named prog.exe.hpcstruct, which contains the code structure of prog.exe.
For a serial program:
$ hpcprof -S prog.exe.hpcstruct -I <source code directory>/'*' hpctoolkit-<prog.exe>-measurements-<jobid>
For a parallel (MPI/OpenMP) program:
$ hpcprof --force-metric --metric=<metrics option> -S prog.exe.hpcstruct -I <source code directory>/'*' hpctoolkit-<prog.exe>-measurements-<jobid>
Options for --metric (or -M) include sum, stats, and thread; please refer to hpcprof --help for more details.
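As a hypothetical example, assuming the source tree sits under ./src and the job ID was 12345678.gadi-pbs (both illustrative values), producing summary statistics across threads might look like:
$ hpcprof --force-metric --metric=stats -S prog.exe.hpcstruct -I ./src/'*' hpctoolkit-prog.exe-measurements-12345678.gadi-pbs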
A database that can be presented graphically will be generated after hpcprof has executed. It is a directory with a name like:
hpctoolkit-<prog.exe>-database-<jobid>
To visualise the HPCToolkit profile data on Gadi, log in to Gadi with X11 (X-Windows) forwarding. On Linux/Mac/Unix, add the -Y option to your SSH command to request that SSH forward the X11 connection to your local computer. For Windows, we recommend using MobaXterm (http://mobaxterm.mobatek.net) as it automatically uses X11 forwarding.
For more information on MobaXterm and X-forwarding, please see our connecting to Gadi page.
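For example, from a Linux/Mac terminal (replace username with your own NCI username):
$ ssh -Y username@gadi.nci.org.au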
# Load module, always specify version number.
$ module load hpcviewer/2021.05.15
$ hpcviewer hpctoolkit-<prog.exe>-database-<jobid>
Two different metrics are presented: inclusive and exclusive, denoted by "I" and "E" respectively in the metric panel of hpcviewer. Inclusive costs cover a function together with all the functions it calls, while exclusive costs cover only the function itself.