Introduction

PyTorch 1.8 introduces an enhanced profiler API that can record both CPU-side operations and the CUDA kernels launched on the GPU. The profiler enables performance analysis and provides insight into performance bottlenecks. The recorded information can be visualised with the TensorBoard Plugin, allowing detailed analysis and optimisation of PyTorch models.

This tutorial demonstrates how to use the TensorBoard plugin for a project on NCI's Gadi. The Open OnDemand architecture currently does not support viewing the TensorBoard dashboard directly in an NCI ARE JupyterLab session. Instead, there are two ways to view the dashboard: download the trace files to your local machine, or use SSH port forwarding. Both are explained in detail below.

Setup

Install Dependencies

Make sure you have PyTorch 1.8 or higher installed. To install `torch` and `torchvision`, use the following command:

pip install torch torchvision

To install the PyTorch Profiler TensorBoard Plugin, use the following command:

pip install torch_tb_profiler

Import Dependencies

Import the necessary PyTorch libraries.

import torch
import torch.profiler
# Other dependencies

Model Training

Set up the data and train the machine learning model as you normally would.

def train(data):
	# Model Training Steps
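
The tutorial leaves the training step abstract; as a minimal sketch, one step might look like the following. The model, loss, and optimiser below are illustrative placeholders, not part of the original tutorial:

```python
import torch
import torch.nn as nn

# Hypothetical model, loss, and optimiser -- stand-ins for your own setup.
model = nn.Linear(10, 2)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

def train(data):
    inputs, labels = data              # one mini-batch from the DataLoader
    optimizer.zero_grad()              # clear gradients from the previous step
    outputs = model(inputs)            # forward pass
    loss = criterion(outputs, labels)  # compute the loss
    loss.backward()                    # backward pass
    optimizer.step()                   # parameter update
    return loss.item()
```

Any function with this shape works with the profiling loops below; the profiler only needs `prof.step()` to be called once per iteration.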

Profile events

We enable the profiler through the context manager. The profiler accepts several parameters, of which some of the most useful are:

  • schedule : A callable that takes a step number (int) and returns the profiler action for that step. The torch.profiler.schedule helper builds one: wait gives the number of initial steps to skip, during the warmup steps the profiler starts tracing but discards the results, during the active steps the profiler records events, and repeat gives the number of times to repeat this cycle.
  • on_trace_ready : A callable called at the end of each cycle to handle the profiling results and generate result files for TensorBoard. We may specify the output location in the trace handler.
  • record_shapes : Option to record the shapes of operator inputs.
  • profile_memory : Option to track tensor memory allocation and deallocation (disable this on older PyTorch versions if profiling becomes slow).
  • with_stack : Option to record source information (file and line number) for the operations. Can be used with TensorBoard in VS Code for easier code navigation.
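
To make the schedule parameters concrete, the sketch below queries torch.profiler.schedule directly. With wait=1, warmup=1, active=3, repeat=2 (the values used in the examples below), the profiler idles at step 0, warms up at step 1, records steps 2-4 (saving the trace at step 4), and then repeats the cycle once more before going idle:

```python
from torch.profiler import schedule, ProfilerAction

# One cycle is wait (1 step) + warmup (1 step) + active (3 steps),
# and the cycle runs twice because repeat=2.
sched = schedule(wait=1, warmup=1, active=3, repeat=2)

# Ask the schedule what it would do at each of the first 11 steps.
actions = [sched(step) for step in range(11)]
for step, action in enumerate(actions):
    print(step, action)
```

After both cycles complete (from step 10 onwards), the schedule returns to the idle action, so further steps incur no profiling overhead.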

Please note that the example provided below uses `torch.profiler.tensorboard_trace_handler` to generate result files for TensorBoard. The results will be saved in the `./log/model` directory, which can be specified as the `logdir` parameter in TensorBoard for analysis.

with torch.profiler.profile(
        schedule=torch.profiler.schedule(wait=1, warmup=1, active=3, repeat=2),
        on_trace_ready=torch.profiler.tensorboard_trace_handler('./log/model'),
        record_shapes=True,
        profile_memory=True,
        with_stack=True
) as prof:
    for step, batch_data in enumerate(train_loader):
        train(batch_data)
        prof.step()  # Call at the end of each step to mark the step boundary for the profiler.

Alternatively, you can control the profiler without a context manager by calling its start()  and stop()  methods explicitly:

prof = torch.profiler.profile(
        schedule=torch.profiler.schedule(wait=1, warmup=1, active=3, repeat=2),
        on_trace_ready=torch.profiler.tensorboard_trace_handler('./log/model'),
        record_shapes=True,
        profile_memory=True,
        with_stack=True)

prof.start()
for step, batch_data in enumerate(train_loader):
    train(batch_data)
    prof.step()
prof.stop()

Use TensorBoard to Visualise Results

As mentioned above, the Open OnDemand architecture does not support viewing the TensorBoard dashboard directly in an NCI ARE JupyterLab session. There are two ways to view the dashboard: download the trace files to your local machine, or use SSH port forwarding.

Download Results Locally

If you download the log directory to your local machine, you can launch TensorBoard directly using the following command from the directory containing the log folder:

tensorboard --logdir './log'

Use SSH port forwarding (ARE Session)

To view TensorBoard through the NCI ARE server, set up SSH port forwarding so the results can be viewed remotely.

  1. Open terminal in ARE session

  2. Run the following command to start TensorBoard, binding to all network interfaces:

    tensorboard --logdir './log' --bind_all

    You should see a response similar to this:

    NOTE: Using experimental fast data loading logic. To disable, pass
        "--load_fast=false" and report issues on GitHub. More details:
        https://github.com/tensorflow/tensorboard/issues/4784
    TensorBoard 2.8.0 at http://gadi-gpu-v100-0112.gadi.nci.org.au:6006/ (Press CTRL+C to quit)
  3. Copy the host address and port from the output (here http://gadi-gpu-v100-0112.gadi.nci.org.au:6006/). Open a terminal on your local machine and run the following command:

    ssh -N -L 6006:gadi-gpu-v100-0112.gadi.nci.org.au:6006 abc000@gadi.nci.org.au

    Replace abc000 with your NCI user ID, and the host name with the one shown in your own TensorBoard output.

Open TensorBoard URL

Open the TensorBoard profiler URL in Google Chrome or Microsoft Edge using the localhost address:

http://localhost:6006/#pytorch_profiler

You should see the Profiler plugin Overview page:


A brief description of some of the views is presented below:

  • Overview: Shows a high level summary of the model performance
  • Operator View: Shows the performance of every PyTorch operator that is executed either on the host or device
  • GPU Kernel View: Shows the time spent by all the kernels on the GPU
  • Trace View:  Shows timeline of profiled operators and GPU kernels
  • Memory View: Shows all memory allocation/release events and allocator’s internal state during profiling
