
HPCToolkit is an integrated suite of tools for measuring and analysing program performance on computers ranging from multicore desktop systems to large-scale supercomputers. HPCToolkit provides accurate measurements of a program’s work, resource consumption, and inefficiency; correlates these metrics with the program’s source code; works with multilingual, fully optimized binaries; has very low measurement overhead; and scales to large parallel systems. HPCToolkit’s measurements support analysis of a program’s execution cost, inefficiency, and scaling characteristics both within and across nodes of a parallel system.


Load HPCToolkit module

module load hpctoolkit

Collect Profile Measurements

Measurement of application performance takes two different forms depending on whether your application is dynamically or statically linked. To monitor a dynamically linked application, simply use hpcrun to launch the application. To monitor a statically linked application, link your application using hpclink.   

  • Dynamically linked binaries 
    • To monitor a sequential or multithreaded application, use:   

      hpcrun [options] prog.exe [arguments] 
    • To monitor an MPI application, use:   

      mpirun hpcrun [options] prog.exe [arguments] 
  • Statically linked binaries
    • To link hpcrun’s monitoring code into prog.exe, use:   

      hpclink <linker> -o prog.exe <linker-arguments>
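As a concrete sketch of the statically linked workflow (the compiler, object files, and event choice here are illustrative, not prescriptive): hpclink wraps the normal link step, and the instrumented binary is then launched directly, with events selected via the HPCRUN_EVENT_LIST environment variable rather than through hpcrun:

```shell
# Link-time instrumentation (compiler and object names illustrative)
hpclink mpif90 -o prog.exe main.o solver.o

# At run time, select events through the environment instead of hpcrun
export HPCRUN_EVENT_LIST="WALLCLOCK@5000"
mpirun -np 32 prog.exe
```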

If no event options are specified, hpcrun measures walltime for prog.exe. Otherwise, specify the PAPI events to be measured for prog.exe. The list of PAPI events available on the current system can be retrieved by running the following command:

hpcrun -L prog.exe

A sample PBS job script where the measurement events are passed to hpcrun through the HPCRUN_EVENT_LIST environment variable:

#PBS -q normal
#PBS -l ncpus=32
#PBS -l walltime=1:00:00
#PBS -l mem=16GB
#PBS -l wd

module load openmpi/1.6.5
module load hpctoolkit

export HPCRUN_EVENT_LIST="WALLCLOCK@5000"

mpirun -np 32 hpcrun prog.exe

A sample PBS job script where the measurement events are passed to hpcrun as command-line options:

#PBS -q normal
#PBS -l ncpus=32
#PBS -l walltime=1:00:00
#PBS -l mem=16GB
#PBS -l wd

module load openmpi/1.6.5
module load hpctoolkit

mpirun -np 32 hpcrun -e WALLCLOCK@5000 prog.exe

Sampling Frequency and Measurements

In the above example, 5000 is the sampling period for the event: for WALLCLOCK it is the time between samples in microseconds, and for PAPI events it is the number of event occurrences between samples. The larger the period, the lower the sampling frequency, and the lower the overhead HPCToolkit adds. In general, the overhead of HPCToolkit is around 1% to 3%.
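As a quick sanity check, a period can be converted into an approximate sampling frequency. This sketch assumes the WALLCLOCK period is given in microseconds, as in the example above:

```shell
# Convert a WALLCLOCK sampling period (microseconds between samples)
# into an approximate sampling frequency in samples per second.
period_us=5000
freq_hz=$((1000000 / period_us))
echo "$freq_hz samples per second"
```

A period of 5000 microseconds therefore corresponds to roughly 200 samples per second per thread.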

Some other useful measurements include:   

  • WALLCLOCK: walltime spent in each function  
  • PAPI_FP_INS: Floating point instructions (x87)  
  • PAPI_VEC_SP: Single precision vector/SIMD instructions  
  • PAPI_VEC_DP: Double precision vector/SIMD instructions  
  • PAPI_LD_INS: Load instructions  
  • PAPI_SR_INS: Store instructions  
  • PAPI_BR_INS: Branch instructions  
  • and more…; run hpcrun -L prog.exe for a complete list of measurable events, or refer to the PAPI Preset Events list  

Note: the available measurement events differ between systems. Please make sure an event is available and measurable using hpcrun -L prog.exe.  

To measure multiple events at once, use either repeated -e options or the HPCRUN_EVENT_LIST environment variable:  

  • -e WALLCLOCK@5000 -e PAPI_LD_INS@4000001 -e PAPI_SR_INS@4000001 

  • export HPCRUN_EVENT_LIST="WALLCLOCK@5000;PAPI_LD_INS@4000001;PAPI_SR_INS@4000001"   
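The two forms are equivalent: the environment variable simply packs the same event@period pairs into one semicolon-separated string. A minimal sketch that splits the string back into its pairs (using the same events and periods as the example above):

```shell
# Split the semicolon-separated event list back into event@period pairs.
HPCRUN_EVENT_LIST="WALLCLOCK@5000;PAPI_LD_INS@4000001;PAPI_SR_INS@4000001"
n=0
IFS=';'
for ev in $HPCRUN_EVENT_LIST; do
  echo "$ev"        # one event@period pair per line
  n=$((n + 1))
done
unset IFS
```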

Parse the Profile Data

hpcrun will generate a measurement directory named hpctoolkit-<prog.exe>-measurements-<jobid> in your job's working directory.

Please follow the sequence below to parse the raw measurements in hpctoolkit-<prog.exe>-measurements-<jobid>.  

Recovering Program Structure

hpcstruct prog.exe

This will generate a file named prog.exe.hpcstruct, which contains the code structure of prog.exe.  

Parse the Raw Measurements

For a serial program:

hpcprof -S prog.exe.hpcstruct -I <source code directory>/'*' hpctoolkit-<prog.exe>-measurements-<jobid>

For a parallel (MPI/OpenMP) program, use either:

hpcprof --force-metric --metric=<metrics option> -S prog.exe.hpcstruct -I <source code directory>/'*' hpctoolkit-<prog.exe>-measurements-<jobid>

Options for --metric (or -M) include:  

  • sum: show (only) sum over threads/processes metrics (default)
  • stats: show (only) sum, mean, standard dev, coef of var, min, and max over threads/processes metrics
  • thread: show only thread metrics

Please refer to hpcprof --help for more details.
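For instance, a concrete invocation that aggregates statistics across processes and threads might look like this (the source directory src/ and job id 12345 are illustrative placeholders):

```shell
hpcprof --force-metric --metric=stats \
        -S prog.exe.hpcstruct \
        -I src/'*' \
        hpctoolkit-prog.exe-measurements-12345
```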


or:

hpcprof-mpi -S prog.exe.hpcstruct -I <source code directory>/'*' hpctoolkit-<prog.exe>-measurements-<jobid>

Note that hpcprof-mpi does not compute the 'thread' metric.

After hpcprof or hpcprof-mpi completes, a database directory suitable for graphical viewing is generated, with a name like hpctoolkit-<prog.exe>-database-<jobid>.
Graphical Viewer

To visualise the HPCToolkit profile data on Raijin, you need to log in to Raijin with an X display, e.g. using ssh -Y. A sample sequence of commands on Raijin is listed below:

ssh -Y
module load hpctoolkit
hpcviewer hpctoolkit-<prog.exe>-database-<jobid>

Two different metrics are presented: inclusive and exclusive, denoted by “I” and “E” respectively in the metric panel of hpcviewer.

  • “I” indicates the inclusive measurement: the sum of all costs attributed to this call site and any of its descendants.
  • “E” indicates the exclusive measurement: the sum of all costs attributed strictly to this call site itself.
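As a toy illustration (all numbers hypothetical): if main spends 10 cost units in its own code and its only callee solve spends 40, then main's exclusive cost is 10 while its inclusive cost covers everything below it as well:

```shell
# Hypothetical costs: exclusive cost of main's own code and of its
# single callee, solve (which itself calls nothing).
E_main=10
E_solve=40
I_solve=$E_solve               # no descendants, so inclusive == exclusive
I_main=$((E_main + I_solve))   # inclusive = own cost + descendants' inclusive cost
echo "E(main)=$E_main I(main)=$I_main"
```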