Overview

TotalView is an all-purpose debugger, particularly suitable for MPI, OpenMP and multithreaded code. It is a product of Rogue Wave Software (https://www.perforce.com/rogue-wave-software).

More information: https://totalview.io/products/totalview

Usage

TotalView can be started in several different ways, depending upon whether you want to:

  • debug an executable file
  • attach to a running process
  • debug a core file
  • recall a past debugging session

Log in to Gadi with X11 (X Window System) forwarding. On Linux, Mac or Unix, add the -Y option to your SSH command to request that SSH forward the X11 connection to your local computer. On Windows, we recommend MobaXterm (http://mobaxterm.mobatek.net), which uses X11 forwarding automatically.
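As a minimal sketch, the lines below show the login step and a quick check that forwarding is active. `abc123` is a placeholder for your NCI username, and the login address is assumed to be the usual gadi.nci.org.au. The check relies only on the standard `DISPLAY` variable, which in a remote SSH session is normally set only when X11 forwarding is working.

```shell
# On your local Linux/Mac/Unix machine (abc123 is a placeholder username):
#   ssh -Y abc123@gadi.nci.org.au
#
# Then, on Gadi, confirm that X11 forwarding is active. In a remote SSH
# session DISPLAY is normally set only when X11 forwarding succeeded.
if [ -n "${DISPLAY:-}" ]; then
    x11_status="X11 forwarding active (DISPLAY=$DISPLAY)"
else
    x11_status="DISPLAY is not set: log out and reconnect with ssh -Y"
fi
echo "$x11_status"
```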

You can check the versions installed on Gadi with a module query:

$ module avail totalview

We normally recommend using the latest version available, and always recommend specifying the version number with the module command:

$ module load totalview/2020.1.13

For more details on using modules see our modules help guide at https://opus.nci.org.au/display/Help/Environment+Modules.

Compile the application/program as normal, but with the -g option added. With the Intel compilers, also add the -O0 flag. We also recommend that you do not turn on optimisation, for example with flags such as -fast.

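# Load modules, always specify version number.
# The totalview module is loaded so that $TVLIB (used below) points to the TotalView libraries.
$ module load openmpi/4.0.2
$ module load totalview/2020.1.13

# -g adds debugging symbols; -ltvheap_64 links TotalView's heap library for memory debugging.
$ mpicc -g -o mpi_program mpi_program.c -L$TVLIB -ltvheap_64 -Wl,-rpath,$TVLIB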

Do not compile your program with optimisation flags while you are debugging it. Compiler optimisations can "rewrite" your program and produce machine code that does not necessarily match your source code.
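One way to confirm that a binary was actually built with debugging information is to look for DWARF `debug_info` sections with `readelf` (part of GNU binutils). The function below is a hypothetical convenience helper, not part of TotalView or the NCI modules:

```shell
# Hypothetical helper: succeed if the ELF binary contains DWARF debug
# information (i.e. it was compiled with -g and has not been stripped).
has_debug_info() {
    readelf -S -W "$1" 2>/dev/null | grep -q 'debug_info'
}

# Example usage after compiling:
#   has_debug_info ./mpi_program && echo "debug info present"
```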

Start an interactive PBS job on Gadi with the following command. It requests 4 CPU cores, 10 GiB of memory and 30 GiB of local disk (jobfs) on a compute node in the normal queue, for the job's exclusive use for 30 minutes, charged to project a00. The -X option forwards X11 to the job so that the TotalView window can be displayed, and the wd option starts the job in the directory from which qsub was run. To change the number of CPU cores, memory or jobfs, modify the corresponding PBS resource requests in the qsub command below, following the information at https://opus.nci.org.au/display/Help/Queue+Structure. If your application does not run in parallel, set the number of CPU cores to 1 and reduce the memory and jobfs accordingly, so that compute resources are not wasted.

Also note that you must add -l storage=scratch/ab12+gdata/yz98 to the qsub command below if the job needs access to /scratch/ab12/ and /g/data/yz98/. Details are available at https://opus.nci.org.au/display/Help/PBS+Directives+Explained.

$ qsub -I -X -P a00 -q normal -l ncpus=4,mem=10GB,jobfs=30GB,walltime=00:30:00,wd
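For example, a serial program that needs /scratch/ab12 and /g/data/yz98 could be started as sketched below. The project and storage codes are the placeholders used above, and the reduced memory and jobfs values are illustrative only, not a recommendation. The snippet assembles the qsub arguments and prints them so you can check them before submitting:

```shell
# Placeholders: replace a00, ab12 and yz98 with your own project codes.
PROJECT=a00
STORAGE=scratch/ab12+gdata/yz98
# Serial program: 1 CPU core, with memory and jobfs reduced to match
# (illustrative values).
RESOURCES=ncpus=1,mem=4GB,jobfs=10GB,walltime=00:30:00,wd

# -I: interactive job; -X: forward X11 so the TotalView window can display.
QSUB_ARGS="-I -X -P $PROJECT -q normal -l $RESOURCES -l storage=$STORAGE"
echo "qsub $QSUB_ARGS"

# On a Gadi login node, submit with (unquoted so the arguments split):
#   qsub $QSUB_ARGS
```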

The new UI (user interface) has some limitations. Although HPC functionality from the classic UI continues to be added to the new UI, the following major areas of classic UI functionality are not yet supported by the new UI:

  • Remote debugging
  • Some advanced data debugging such as array manipulation and visualization
  • Memory debugging with MemoryScape

For new TotalView users, the new UI is the default. To launch the classic UI instead, either:

  • Change the default Display preference under `File --> Preferences --> Display`, or
  • Add the -classicUI switch after the totalview command, for example:
    totalview -classicUI

When the interactive job starts on Gadi, run the following commands:

# Load modules, always specify version number.
$ module load openmpi/4.0.2
$ module load totalview/2020.1.13

$ totalview

A TotalView graphical window will open at this point.

From this window you can start a debugging session by restoring your last session, adding a new (serial) program, adding a new parallel program, attaching to a running program, adding a core file or replay recording file, or listening for a reverse connection.
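Instead of adding the program through this window, you can also name it on the totalview command line. The sketch below prints the command to use. The helper function is hypothetical, and the MPI form assumes TotalView's `-a` option (which passes all remaining arguments to the launched program) combined with Open MPI's mpirun, so please check the TotalView user guide for your version before relying on it:

```shell
# Hypothetical helper: print a TotalView launch command for PROGRAM.
# With NP > 1, TotalView starts mpirun, and everything after -a is
# passed to mpirun as its arguments.
tv_launch_cmd() {
    program=$1
    np=${2:-1}
    if [ "$np" -gt 1 ]; then
        echo "totalview mpirun -a -np $np $program"
    else
        echo "totalview $program"
    fi
}

tv_launch_cmd ./mpi_program                      # serial program
tv_launch_cmd ./mpi_program "${PBS_NCPUS:-4}"    # MPI program, one rank per core
```

Copy the printed command into the shell of your interactive job to start TotalView with the program already loaded.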

To create a debugging session for a new program (serial or parallel), only the executable binary is needed. To attach to a running program instead, you first need to start the program yourself, for example in the background so that the same shell remains available:

$ mpirun -np $PBS_NCPUS ./mpi_program &

TotalView will then show the running processes, and you can attach to them in the new debugging session. Once attached, click the `Go` button to let the program run for a while, then click the `Halt` button to stop it and inspect its state.
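If you want to look up the process IDs yourself, the sketch below uses the standard `pgrep` tool. `list_attach_candidates` is a hypothetical helper, and the commented `totalview -pid` line assumes TotalView's command-line form for attaching to a given PID; the TotalView window described above does the same thing interactively.

```shell
# Hypothetical helper: list the PIDs of your own processes whose full
# command line matches PATTERN. For an MPI program this includes the
# ranks and also the mpirun process that launched them.
list_attach_candidates() {
    pgrep -u "$(id -u)" -f "$1"
}

# Example usage inside the interactive job:
#   mpirun -np $PBS_NCPUS ./mpi_program &   # run in the background
#   list_attach_candidates mpi_program      # note a PID
#   totalview -pid <PID> ./mpi_program      # attach to that process
```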

Debugging program memory requires selecting `Enable memory debugging` in the `DEBUG OPTIONS` step when you create the debugging session. Once the session is created, click `Debug → Open MemoryScape` to open a new window. Click `Memory Debugging Options` and select the options you wish to investigate (from either `Basic Options` or `Advanced Options`). Then select one or more processes in the `Process Selection` window and run them, by clicking the play-shaped button, to the point where a memory error occurs.

Suppose, for example, that the error is a segmentation violation and the code stops. Click the `Memory Reports → Heap Status` tab and, in the `Process Selection` window in the left-hand column, highlight the process you want to examine. You can choose different kinds of views from the options on the left-hand side. To find the line where a memory error occurs, one method is to click `Backtrace Report` on the left-hand side. You may need to enlarge the window to see all the details.

If you want to use TotalView for memory debugging of an existing program, you need to relink your code with the following options:

-L$TVLIB -ltvheap_64 -Wl,-rpath,$TVLIB 
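Because `-Wl,-rpath,$TVLIB` records the library directory in the executable, you can check that the heap library was picked up with `ldd`. This assumes `libtvheap_64` is linked as a shared library, as the `-L`/`-l` form above suggests, and the helper name is hypothetical:

```shell
# Hypothetical helper: succeed if the dynamic loader resolves
# libtvheap_64 for the given executable.
links_tvheap() {
    ldd "$1" 2>/dev/null | grep -q 'libtvheap_64'
}

# Example usage after relinking:
#   links_tvheap ./mpi_program && echo "TotalView heap library linked"
```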

If a program is crashing and you want to use the debugger to get a traceback, compile with the -g flag and remove your shell's core file size limit, so that a core file is written when the program crashes. Do this before running the application/program:

# In bash
$ ulimit -c unlimited

# In csh/tcsh
$ limit coredumpsize unlimited
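A minimal sh/bash sketch of the same step is shown below. It falls back to the hard limit when `unlimited` is not permitted, and ends with the command we assume opens a core file in TotalView (executable first, then core file). Core file names depend on the system's core pattern, so `core.12345` is only an example:

```shell
# Try to remove the core file size limit; if the hard limit forbids
# "unlimited", raise the soft limit as far as the hard limit allows.
if ! ulimit -c unlimited 2>/dev/null; then
    ulimit -S -c "$(ulimit -H -c)"
fi
core_limit=$(ulimit -c)
echo "core file size limit: $core_limit"

# After the program crashes, load the executable and its core file:
#   totalview ./mpi_program core.12345
```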

This version of TotalView is licensed with 1024 tokens, so it can run jobs with up to approximately 1020 processors. Our licence also includes the Replay Engine, which provides reverse debugging.

For more detailed information, see the TotalView user guide available at https://help.totalview.io/previous_releases/2020.1/HTML/index.html#page/TotalView/totalviewlhug-title.html#.