Note: the NCI arm license allows users to run arm-reports on up to 2048 CPUs and arm-forge on up to 128 CPUs.
To produce a performance report like the ones below, launch your program with mpiexec and add perf-report in front of the command.
A sample batch script for running a parallel MPI job is as follows:
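A minimal sketch of such a job script, assuming a PBS scheduler, a module named arm-reports, and a hypothetical executable ./my_mpi_program. The queue name, resource values and file names are placeholders rather than NCI's recommended settings. The snippet writes the script to perf_job.sh so that it can be inspected before submission:

```shell
# Write a hypothetical PBS job script; all resource values are placeholders.
cat > perf_job.sh <<'EOF'
#!/bin/bash
#PBS -q normal
#PBS -l ncpus=32
#PBS -l walltime=00:30:00
#PBS -l mem=64GB
#PBS -l jobfs=1GB

# Run from the directory the job was submitted from.
cd "$PBS_O_WORKDIR"

# Assumed module name; check "module avail" for the installed versions.
# Also load the MPI module your program was built with.
module load arm-reports

# perf-report goes in front of the usual mpiexec command line.
perf-report mpiexec -n 32 ./my_mpi_program
EOF

echo "Wrote perf_job.sh; submit it with: qsub perf_job.sh"
```

The process count passed to mpiexec should match the ncpus request, and both must stay within the license limit for arm-reports.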
When the job completes, you will see two output files: a .txt file and an .html file.
Normally you run arm Performance Reports simply by putting perf-report in front of the command you wish to measure. Some programs, however, are wrappers: “bowtie2”, for example, is actually a Perl script that calls several different programs. In that case, before running the alignment, edit the “bowtie2” script and add perf-report to the command that it runs:
# Original line in the bowtie2 script:
my $cmd = "$align_prog$debug_str --wrapper basic-0 ".join(" ", @bt2_args);
# Modified line, with perf-report added in front:
my $cmd = "perf-report $align_prog$debug_str …
A more detailed Performance Reports user guide can be found here:
You can also see performance reports with examples on various application characterisations here:
With arm-forge, you can debug with arm DDT or profile with arm MAP.
You need X11 forwarding enabled when you log in to raijin in order to use arm DDT or MAP. Also make sure to submit an interactive job with the "-X" flag, which allows X11 forwarding to the compute node.
Make sure the job includes a jobfs disk request, as all the arm tools create a lot of output in your jobfs.
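The X11 and jobfs requirements can be combined in a single interactive submission. A minimal sketch, assuming PBS's qsub with -I (interactive) and -X (X11 forwarding); the queue name and resource values are placeholders, and the login step in the comment uses a placeholder host name:

```shell
# First log in with X11 forwarding, for example: ssh -X <username>@<login-host>
# Placeholder resources, including a jobfs request for the tools' output.
resources="ncpus=16,walltime=01:00:00,mem=32GB,jobfs=10GB"
interactive_cmd="qsub -I -X -q express -l $resources"

# Print the command for inspection, then run it on the login node.
echo "$interactive_cmd"
```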
To use DDT or MAP, your code must be compiled with the -g debug option. With the Intel compiler, also add the -O0 flag. We also recommend that you do not build with optimisation turned on, for example with flags such as -fast.
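As an illustration of these flags, a minimal sketch assuming an MPI compiler wrapper named mpicc and a hypothetical source file my_program.c. The compile step only runs when both are present; otherwise the command is printed:

```shell
SRC=my_program.c          # hypothetical source file
DEBUG_FLAGS="-g -O0"      # debugging symbols, optimisation off

if command -v mpicc >/dev/null 2>&1 && [ -f "$SRC" ]; then
    # DEBUG_FLAGS is left unquoted on purpose so it splits into two flags.
    mpicc $DEBUG_FLAGS -o my_program "$SRC"
else
    echo "would run: mpicc $DEBUG_FLAGS -o my_program $SRC"
fi
```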
Make sure to use mpiexec in the command line that runs your MPI program; otherwise you will see error messages like:
The target program encountered an error before it initialised the MPI environment. Thread -1 exited. Check arm MAP is using the correct MPI implementation for your system and can start programs without arm MAP. If you contact firstname.lastname@example.org we'll be happy to help you further.
DDT is a parallel debugger which can be run with up to 128 processors on raijin. It can be used to debug serial, OpenMP and MPI codes.
Totalview users will find DDT has very similar functionality and an intuitive user interface. All of the primary parallel debugging features from Totalview are available with DDT.
Launch the debugger with the ddt command followed by the name of the executable to debug:
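A minimal sketch of the launch, assuming the arm-forge module has put ddt on your PATH and using a hypothetical executable ./my_program:

```shell
EXE=./my_program   # hypothetical executable, compiled with -g

if command -v ddt >/dev/null 2>&1; then
    ddt "$EXE"     # opens the DDT GUI for this executable
else
    echo "ddt not found; run 'module load arm-forge' first"
fi
```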
To debug Python scripts, start the Python interpreter that will execute the script under DDT. To get line-level rather than function-level resolution, you must also insert %allinea_python_debug% before your script when passing arguments to Python. For example:
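A minimal sketch of such a launch, assuming ddt is on your PATH, python3 is the interpreter you want, and my_script.py is a hypothetical script. Because DDT does not search your PATH for executables, the interpreter's full path is resolved first:

```shell
SCRIPT=my_script.py                     # hypothetical script name
PYTHON=$(command -v python3 || true)    # full path to the interpreter, if found

if [ -n "$PYTHON" ] && command -v ddt >/dev/null 2>&1; then
    # %allinea_python_debug% goes before the script to get line-level resolution.
    ddt "$PYTHON" %allinea_python_debug% "$SCRIPT"
else
    echo "would run: ddt ${PYTHON:-/full/path/to/python3} %allinea_python_debug% $SCRIPT"
fi
```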
Open the 'Stacks' view and select a Python frame to see the Python local variables
Note: DDT does not search in your PATH when launching executables, so you must specify the full path to Python.
A more detailed DDT user guide can be found here: https://www.arm.com/products/development-tools/hpc-tools/cross-platform/forge/ddt
To collect performance data, MAP uses two small libraries: the MAP sampler (map-sampler) and the MPI wrapper (map-sampler-pmpi). These must be used with your program. There are fairly strict rules on the linking order of object files and these libraries (please read the User Guide for detailed information). However, if you follow the instructions printed by the MAP utility scripts, your code will very likely run with MAP.
Before you rebuild your program, load the arm-forge module. Then recompile with the -g option to keep debugging symbols, together with the optimisation flags that you would normally use. This builds a dynamically linked executable.
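A minimal sketch of the rebuild and a profiling run, assuming the arm-forge module is already loaded, an MPI compiler wrapper named mpicc, and a hypothetical source file my_program.c. The -O2 flag stands in for whatever optimisation flags you normally use. The commands only run when the tools and the source file are present; otherwise they are printed:

```shell
SRC=my_program.c                  # hypothetical source file
OPT_FLAGS="-O2"                   # stand-in for your usual optimisation flags
MAP_BUILD_FLAGS="-g $OPT_FLAGS"   # -g keeps the debugging symbols MAP needs

if command -v mpicc >/dev/null 2>&1 && command -v map >/dev/null 2>&1 && [ -f "$SRC" ]; then
    mpicc $MAP_BUILD_FLAGS -o my_program "$SRC"
    # --profile runs without the GUI and writes a .map file to open later.
    map --profile mpiexec -n 4 ./my_program
else
    echo "would run: mpicc $MAP_BUILD_FLAGS -o my_program $SRC"
    echo "would run: map --profile mpiexec -n 4 ./my_program"
fi
```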
A more detailed MAP user guide can be found here: https://www.arm.com/products/development-tools/hpc-tools/cross-platform/forge/map
For UM users