Debugging parallel MPI programs is very different from traditional memory debugging of serial programs, because parallel programs can behave non-deterministically. It is therefore essential to be able to examine both the message queues (to debug the MPI communication) and the stack traces (to examine memory) of the running processes.

On NCI NF compute systems, two debug/inspection tools are installed: Padb and TotalView. This document describes how to use these parallel program inspection and debugging tools. For further help with performance profilers and tracers, please email help@nci.org.au.



Padb

Padb (Parallel Application Debugger) is a job inspection tool for examining and debugging parallel programs. Its primary purpose is to simplify gathering stack traces on compute clusters, but it also supports a wide range of other functions. Padb supports a number of parallel environments and works out of the box on most clusters. It is an open-source, non-interactive, command-line, scriptable tool intended for programmers and system administrators alike.

The latest version currently installed is padb/3.


Type the following to load it:

```bash
module load padb/3.3
```

Usage

Common Usage

Show the currently active jobs under PBS:

```bash
padb --show-jobs
```

Target a specific job ID and report its process state:

```bash
padb <jobid> --proc-summary
```

Target a specific job ID and report its MPI message queues, stack traces, etc.:

```bash
padb --full-report=<jobid>
```
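Because Padb is non-interactive and scriptable, a full report can be saved to a file and compared with later snapshots. The following is a minimal sketch assuming bash; the function name `padb_save_report` and the file-naming scheme are hypothetical, and the only Padb option used is `--full-report=<jobid>` from above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: save Padb's full report for one job to a
# timestamped file so that successive snapshots can be compared later.
padb_save_report() {
    local jobid="$1"
    local outdir="${2:-.}"    # optional output directory, defaults to the current one
    local outfile
    outfile="${outdir}/padb_${jobid}_$(date +%Y%m%d_%H%M%S).txt"

    # --full-report=<jobid> gathers message queues, stack traces, etc.
    if padb --full-report="${jobid}" > "${outfile}" 2>&1; then
        echo "${outfile}"     # print the path of the saved report
    else
        echo "padb failed for job ${jobid}; see ${outfile}" >&2
        return 1
    fi
}
```

For example, after `module load padb/3.3`, running `padb_save_report 123456 "$HOME/debug"` (where 123456 stands in for a real PBS job ID) prints the name of the saved report.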

Stack Trace

Target a specific job ID and report the stack trace of a given MPI process (rank):

```bash
padb <jobid> --stack-trace --tree --rank <MPI rank id>
```

Target a specific job ID and report the stack trace of a given MPI process (rank), including information about parameters and local variables:

```bash
padb <jobid> --stack-trace --tree --rank <MPI rank id> \
     -O stack-shows-locals=1 -O stack-shows-params=1
```
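To compare several ranks, the per-rank command can be run in a loop. This is a minimal sketch assuming bash; the function `padb_stack_ranks` and the environment variable `PADB_SHOW_LOCALS` are hypothetical names, and the Padb options are exactly those shown above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the stack trace of each listed MPI rank of a job.
padb_stack_ranks() {
    local jobid="$1"
    shift                          # remaining arguments are MPI rank numbers
    local extra=()
    # PADB_SHOW_LOCALS=1 (a name invented for this sketch) adds the options
    # that also show parameters and local variables.
    if [[ "${PADB_SHOW_LOCALS:-0}" == 1 ]]; then
        extra=(-O stack-shows-locals=1 -O stack-shows-params=1)
    fi
    local rank
    for rank in "$@"; do
        echo "=== job ${jobid}, rank ${rank} ==="
        padb "${jobid}" --stack-trace --tree --rank "${rank}" "${extra[@]}" || return 1
    done
}
```

For example, `PADB_SHOW_LOCALS=1 padb_stack_ranks 123456 0 7` prints the traces of ranks 0 and 7 in turn, with 123456 standing in for a real job ID.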

MPI Message Queue

Target a specific job ID and report its MPI message queues:

```bash
padb <jobid> --mpi-queue
```
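A message queue that stays unchanged across several snapshots can hint that a job is stuck. The sketch below, assuming bash, saves a few queue snapshots at fixed intervals so they can be compared with `diff`; the function `padb_queue_snapshots`, its argument order and the file names are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: take COUNT snapshots of a job's MPI message queues,
# INTERVAL seconds apart, each saved to its own numbered file.
padb_queue_snapshots() {
    local jobid="$1" count="${2:-3}" interval="${3:-30}" outdir="${4:-.}"
    local i
    for (( i = 1; i <= count; i++ )); do
        padb "${jobid}" --mpi-queue > "${outdir}/mpiq_${jobid}_${i}.txt" 2>&1 || return 1
        # Pause between snapshots, but not after the last one
        if (( i < count )); then
            sleep "${interval}"
        fi
    done
}
```

For example, `padb_queue_snapshots 123456 3 60` followed by `diff mpiq_123456_1.txt mpiq_123456_3.txt` shows what changed over two minutes (123456 is a placeholder job ID).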

Process Progress Watch

Target a specific job ID and report the progress of its MPI processes over a period of time:

```bash
padb <jobid> --mpi-watch --watch -O watch-clears-screen=no
```
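If the watch should run unattended, it can be bounded in time and logged. This is a sketch assuming bash, GNU coreutils `timeout`, and that `--watch` keeps running until it is interrupted; the function `padb_watch_for` and the default log name are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: run Padb's progress watch for a fixed number of
# seconds and keep the output in a log file. watch-clears-screen=no keeps
# padb from clearing the screen between updates.
padb_watch_for() {
    local jobid="$1" seconds="${2:-60}" logfile="${3:-padb_watch_${1}.log}"
    local status=0
    # timeout stops the watch loop and exits with status 124 on expiry
    timeout "${seconds}" padb "${jobid}" --mpi-watch --watch \
        -O watch-clears-screen=no > "${logfile}" 2>&1 || status=$?
    if (( status == 124 )); then
        return 0               # stopped by the time limit, as intended
    fi
    return "${status}"
}
```

For example, `padb_watch_for 123456 300` records five minutes of progress for the (placeholder) job 123456 in `padb_watch_123456.log`.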

For more detailed usage, please refer to Padb’s “Mode of operation” web page, http://padb.pittman.org.uk/modes.html, or to Padb’s help information:

```bash
padb -h
```

TotalView

TotalView can be used to debug parallel MPI or OpenMP programs. Introductory information and user guides for TotalView are available from this site.

First, load the module to use TotalView:

```bash
module load totalview
```

Compile the code with the -g option. For example, for an MPI Fortran program:

```bash
mpif90 -g prog.f90
```

Start TotalView. For example, to debug an MPI program using 4 MPI processes:

```bash
mpirun --debug -np 4 ./a.out
```

Note that, to ensure TotalView can obtain information on all variables, you should compile without optimisation. This is the default if -g is used without a specific optimisation level.
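The compile and launch steps can be wrapped so the debug build is always made the same way. This is a minimal sketch assuming bash, with the `totalview` module loaded and, as in the steps above, an `mpirun` that starts TotalView when given `--debug`; the function names `build_for_debug` and `launch_in_totalview` are hypothetical. Passing `-O0` explicitly is a choice made here to state the no-optimisation setting rather than rely on the compiler default.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for the TotalView workflow described above.

# Compile an MPI Fortran source for debugging: -g adds debug symbols and
# -O0 states explicitly that no optimisation should be applied.
build_for_debug() {
    local src="$1"
    local exe="${2:-a.out}"     # same default name as in the example above
    mpif90 -g -O0 "${src}" -o "${exe}"
}

# Start the executable under TotalView via mpirun --debug with NP processes.
launch_in_totalview() {
    local np="${1:-4}"
    local exe="${2:-a.out}"
    mpirun --debug -np "${np}" "./${exe}"
}
```

For example: `build_for_debug prog.f90 && launch_in_totalview 4`.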

When TotalView first starts an MPI job, it shows the source code for mpirun, and a GUI like the following is displayed. Click GO and all the processes will start; you will then be asked whether you want to stop the parallel job. Click YES at this point if you want to insert breakpoints. The source code will be shown, and you can click on any line where you wish to stop.

[Figure: TotalView GUI when an MPI job is first started]

If your source code is in a different directory from the one where you started TotalView, you may need to add that path to the Search Path under the File menu. Right-clicking on any subroutine name will “dive” into the source code for that routine, and breakpoints can be set there.
