
At NCI, we provide researchers with several profilers so that they can improve the performance of their code with clear guidance. The table below summarises their capabilities.

Profiler              | CPU | GPU | MPI | Timeline | Level of detail | High-level performance summary
PProf.jl              | x   |     |     |          | source code     |
Intel VTune           | x   |     | x   |          | function        | x
NVIDIA Nsight Systems | x   | x   |     | x        | function        |
NVIDIA Nsight Compute | x   | x   |     |          | source code     | x

We tested four profiling tools; the corresponding examples are shown below.

PProf.jl

This tool displays the information collected by the native Julia profiler through a graphical user interface. One can navigate the profiling result via the statistics, flame graph, and call graph views, and use the search bar to find any piece of code of interest.

In your code running on Gadi, profile the function of interest as follows. This generates the file test.pb.gz in the working directory.

using Profile, PProf
...
Profile.init()                          # inspect the sampler settings; pass n=/delay= keywords to change them
@profile function_under_interest()      # collect samples while the function runs
pprof(web = false, out = "test.pb.gz")  # write the collected profile to test.pb.gz
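
For reference, here is a minimal self-contained sketch, with a hypothetical function work() standing in for your own code. Because the first call to a Julia function includes JIT compilation, it is worth running the function once before profiling it.

using Profile, PProf

# hypothetical workload standing in for your own function
work(n) = sum(sqrt(abs(sin(i))) for i in 1:n)

work(10)                                # warm-up call so compilation time is not profiled
Profile.clear()                         # discard any previously collected samples
@profile work(10^8)                     # sample the real run
pprof(web = false, out = "test.pb.gz")  # write test.pb.gz to the working directory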

To visualise the profiling result, download test.pb.gz to your local computer and install the PProf package [https://github.com/JuliaPerf/PProf.jl], then run the following in Julia.

using PProf
PProf.refresh(webhost = "localhost", webport = 34512, file = "test.pb.gz", ui_relative_percentages = true)

Open a browser and the result should be available at "http://localhost:34512".  

Intel VTune

This tool provides a high-level summary of application performance. It works with Julia code running on multiple CPU cores and can report on MPI communication patterns if MPI.jl is used in the code. The example below generates an Application Performance Snapshot [https://www.intel.com/content/www/us/en/docs/vtune-profiler/get-started-application-snapshot/2021-3/overview.html] with the default name aps_report_yyyymmdd_hhmmss.html and a corresponding result directory aps_result_yyyymmdd, where yyyymmdd and hhmmss are timestamps.

module load intel-vtune/2023.0.0
export ENABLE_JITPROFILING=1
aps julia intelvtune.jl

If your Julia code is launched through mpirun, try

mpirun -np $PBS_NCPUS aps julia intelvtune_mpi.jl

After the job completes, you can generate the report file by passing the result directory to the aps command, e.g. "aps --report aps_result_yyyymmdd".
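
The content of intelvtune_mpi.jl is simply your own application. As a purely hypothetical illustration, a minimal MPI.jl program that does some local work and some communication (so that APS has something to report) could look like the following.

using MPI

MPI.Init()
comm  = MPI.COMM_WORLD
rank  = MPI.Comm_rank(comm)
nproc = MPI.Comm_size(comm)

# local computation on every rank
local_sum = sum(sqrt(abs(sin(i + rank))) for i in 1:10^7)

# communication pattern that shows up in the APS MPI statistics
total = MPI.Allreduce(local_sum, +, comm)

rank == 0 && println("total over $nproc ranks = $total")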

NVIDIA Nsight Systems

When diagnosing performance issues such as bottlenecks, Nsight Systems can be very useful. It traces Julia code running on multiple CPU cores and GPUs across multiple nodes. The following example profiles the Julia code and generates a report file named after the -o option in the working directory; download the file and open it in the Nsight Systems GUI on your local computer.

module load nvidia-hpc-sdk/22.11
module load cuda/11.7.0
nsys_cmd="nsys profile -w true -t cuda,osrt,nvtx,cudnn,cublas -s cpu -f true --cudabacktrace=all -x true"
$nsys_cmd -o nsys.$PBS_JOBID.out julia prof_test.jl

If your Julia code is launched through mpirun, try

nsys_cmd="nsys profile -w true -t cuda,osrt,nvtx,cudnn,cublas -s cpu -f true --cudabacktrace=all -x true"
$nsys_cmd -o nsys.$PBS_JOBID.out mpirun -np $PBS_NCPUS julia prof_test.jl
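
The script prof_test.jl can be any Julia program. As a hypothetical sketch, annotating it with NVTX ranges (which the nvtx trace option above picks up, assuming the NVTX.jl package is installed) makes the different phases easy to find on the timeline.

using CUDA, NVTX

# GPU workload standing in for your own code
function step!(y, x)
    NVTX.@range "axpy" begin
        y .+= 2f0 .* x          # broadcast runs as a CUDA kernel
    end
    return nothing
end

x = CUDA.rand(Float32, 2^20)
y = CUDA.zeros(Float32, 2^20)

step!(y, x)                     # warm-up: the first call includes compilation
NVTX.@range "measured loop" for _ in 1:100
    step!(y, x)
end
CUDA.synchronize()              # ensure all GPU work is captured before exit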


NVIDIA Nsight Compute

When improving kernel performance on GPUs, Nsight Compute can be very useful.

In the code below, we assume Julia is installed under the directory $JULIA_HOME, i.e. the julia binary is $JULIA_HOME/bin/julia. It generates the result file ncureport.ncu-rep in the job working directory $PBS_O_WORKDIR. Once you have the result file, download it and open it in the NVIDIA Nsight Compute application on your local computer to go through the profiling result.

module load nvidia-hpc-sdk/22.11
module load cuda/11.7.0

ofile=$PBS_O_WORKDIR/ncureport
export LD_LIBRARY_PATH=$JULIA_HOME/lib/julia/:$LD_LIBRARY_PATH
ncu --set full -k regex:kernel --target-processes all -o $ofile julia prof_test.jl
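
Note that "-k regex:kernel" only profiles kernels whose names match the regular expression "kernel"; adjust it to match your own kernels. As a hypothetical example, prof_test.jl could contain a hand-written CUDA.jl kernel like the one below; the Julia function name appears in the compiled kernel name, so a filter such as "-k regex:saxpy" should select it.

using CUDA

# hand-written CUDA.jl kernel; its name shows up in the Nsight Compute report
function saxpy_kernel!(y, a, x)
    i = (blockIdx().x - 1) * blockDim().x + threadIdx().x
    if i <= length(y)
        @inbounds y[i] = a * x[i] + y[i]
    end
    return nothing
end

n = 2^20
x = CUDA.rand(Float32, n)
y = CUDA.zeros(Float32, n)

threads = 256
blocks  = cld(n, threads)
@cuda threads=threads blocks=blocks saxpy_kernel!(y, 2f0, x)
CUDA.synchronize()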






