  • Debuggers such as gdb can be used to pinpoint the line where a segmentation violation occurs, provided the code is compiled with the -g option. However, this may not be the point at which the error actually occurred: the difficulty with memory problems is that the error may be reported some time after the incorrect line of code has executed.
  • For Fortran code it is always worth compiling with the Intel compiler option -check bounds (the gfortran equivalent is -fcheck=bounds) to pinpoint, at runtime, any array subscript or substring references that are outside the declared bounds.
  • Electric Fence can be used to detect errors in C code where the boundaries of a malloc() allocation are overrun or where there is an attempt to touch memory that has already been freed. Code must be linked with the libefence.a library (-lefence) before being executed. Although the Electric Fence documentation says that it can be used to debug MPI codes, this does not work on the AC: it appears that the start-up procedure for an executable linked with -lefence performs some memory allocations that interfere with the SGI mpirun start-up.
  • Mudflap is part of gcc/g++ releases before 4.9 (it was removed in GCC 4.9 in favour of AddressSanitizer, -fsanitize=address). To use mudflap, compile and run with at least:

    $ gcc prog.c -fmudflap -lmudflap
    $ ./a.out

    The environment variable MUDFLAP_OPTIONS can be used to control the output from mudflap. See http://gcc.gnu.org/wiki/Mudflap_Pointer_Debugging for more details.

  • TotalView (https://opus.nci.org.au/display/Help/TotalView) is installed on Gadi and can be used to debug sequential, MPI or OpenMP programs written in C, C++ or Fortran. To use the memory debugging tools of TotalView, you need to log in to Gadi with X11 (X Window System) forwarding. On Linux/Mac/Unix, add the -Y option to your SSH command so that SSH forwards the X11 connection to your local computer. On Windows, we recommend using MobaXterm (http://mobaxterm.mobatek.net), as it uses X11 forwarding automatically. Then recompile your code as follows:

    # Load module, always specify version number.
    $ module load totalview/2020.1.13
    
    $ gfortran -g prog.f -L$TVLIB -ltvheap
    $ gcc -g prog.c -L$TVLIB -ltvheap

    Then start an interactive PBS job as described in https://opus.nci.org.au/display/Help/TotalView. When the interactive job starts, load the totalview/2020.1.13 module and start TotalView by executing the totalview -classicUI command. You may be asked to click the `Continue` button. Then click `Tools -> Open MemoryScape` to launch the TotalView MemoryScape window. Click the `Memory Debugging Options` tab, make sure that `Enable memory debugging` is selected, and select any of the options that you wish to investigate.

    Run the code to the point where a memory error occurs. Say, for example, that this is a segmentation violation and the code stops. Then click the `Memory Reports -> Heap Status` tab and highlight the process you want to look at in the left-hand column. You can choose to look at the Source View, Backtrace View, Graphical View or Corrupted Guard Blocks by making the relevant choice on the lower left-hand side and clicking `Generate View`. To find the line where a memory error is occurring, one method is to generate the Graphical View, then choose the `Backtrace/Source` tab and double-click on lines in the Backtrace window. You may need to enlarge the window to see all the details. For more information, please see the documentation at https://help.totalview.io/classicTV/current/PDFs/Debugging_Memory_Problems_with_MemoryScape.pdf.

  • Intel Inspector: https://software.intel.com/en-us/intel-inspector-xe
  • DrMemory: https://github.com/dynamorio/drmemory
  • GNU mtrace: http://www.gnu.org/software/libc/manual/html_node/Tracing-malloc.html