In an NCI ARE JupyterLab session, viewing TensorBoard directly inside the notebook is not supported because of the current Open OnDemand architecture. However, you can work around this by accessing TensorBoard via SSH port forwarding.
To demonstrate this, set up an MNIST workflow in a Jupyter notebook as described below:
# Load the TensorBoard notebook extension.
%load_ext tensorboard

import tensorflow as tf
import datetime
import os

# Change to your working directory.
os.chdir("/g/data/z00/rxy900/PROJECT/NCI-MLENV/tensorboard_test")

# Load the MNIST dataset from the /g/data/wb00 file system.
mnist = tf.keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data(path="/g/data/wb00/admin/staging/MNIST/npz/mnist.npz")
x_train, x_test = x_train / 255.0, x_test / 255.0

# Define the model.
def create_model():
    return tf.keras.models.Sequential([
        tf.keras.layers.Flatten(input_shape=(28, 28)),
        tf.keras.layers.Dense(512, activation='relu'),
        tf.keras.layers.Dropout(0.2),
        tf.keras.layers.Dense(10, activation='softmax')
    ])

# Create the datasets for training and testing.
train_dataset = tf.data.Dataset.from_tensor_slices((x_train, y_train))
test_dataset = tf.data.Dataset.from_tensor_slices((x_test, y_test))
train_dataset = train_dataset.shuffle(60000).batch(64)
test_dataset = test_dataset.batch(64)

loss_object = tf.keras.losses.SparseCategoricalCrossentropy()
optimizer = tf.keras.optimizers.Adam()

# Define our metrics.
train_loss = tf.keras.metrics.Mean('train_loss', dtype=tf.float32)
train_accuracy = tf.keras.metrics.SparseCategoricalAccuracy('train_accuracy')
test_loss = tf.keras.metrics.Mean('test_loss', dtype=tf.float32)
test_accuracy = tf.keras.metrics.SparseCategoricalAccuracy('test_accuracy')

def train_step(model, optimizer, x_train, y_train):
    with tf.GradientTape() as tape:
        predictions = model(x_train, training=True)
        loss = loss_object(y_train, predictions)
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    train_loss(loss)
    train_accuracy(y_train, predictions)

def test_step(model, x_test, y_test):
    predictions = model(x_test)
    loss = loss_object(y_test, predictions)
    test_loss(loss)
    test_accuracy(y_test, predictions)

# Set up summary writers for TensorBoard.
current_time = datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
train_log_dir = 'logs/gradient_tape/' + current_time + '/train'
test_log_dir = 'logs/gradient_tape/' + current_time + '/test'
train_summary_writer = tf.summary.create_file_writer(train_log_dir)
test_summary_writer = tf.summary.create_file_writer(test_log_dir)

model = create_model()  # reset our model

EPOCHS = 5

for epoch in range(EPOCHS):
    for (x_train, y_train) in train_dataset:
        train_step(model, optimizer, x_train, y_train)
    with train_summary_writer.as_default():
        tf.summary.scalar('loss', train_loss.result(), step=epoch)
        tf.summary.scalar('accuracy', train_accuracy.result(), step=epoch)

    for (x_test, y_test) in test_dataset:
        test_step(model, x_test, y_test)
    with test_summary_writer.as_default():
        tf.summary.scalar('loss', test_loss.result(), step=epoch)
        tf.summary.scalar('accuracy', test_accuracy.result(), step=epoch)

    template = 'Epoch {}, Loss: {}, Accuracy: {}, Test Loss: {}, Test Accuracy: {}'
    print(template.format(epoch + 1,
                          train_loss.result(),
                          train_accuracy.result() * 100,
                          test_loss.result(),
                          test_accuracy.result() * 100))

    # Reset metrics every epoch.
    train_loss.reset_states()
    test_loss.reset_states()
    train_accuracy.reset_states()
    test_accuracy.reset_states()
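If you prefer the higher-level Keras API, equivalent TensorBoard logs can be produced with model.fit and the built-in TensorBoard callback. The sketch below is an alternative to, not part of, the workflow above; it assumes the script above has already run, so tf, datetime, the MNIST arrays and create_model are in scope, and it writes to a hypothetical logs/fit directory:

# A minimal sketch using the Keras TensorBoard callback instead of tf.GradientTape.
# Assumes tf, datetime, x_train/y_train/x_test/y_test and create_model() from the script above.
model = create_model()
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

# Write logs under logs/fit/<timestamp> (an illustrative layout, not the one used above).
log_dir = 'logs/fit/' + datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
tensorboard_callback = tf.keras.callbacks.TensorBoard(log_dir=log_dir, histogram_freq=1)

model.fit(x_train, y_train,
          epochs=5,
          validation_data=(x_test, y_test),
          callbacks=[tensorboard_callback])

Either approach produces event files that TensorBoard can read; the GradientTape version above simply gives you explicit control over what is logged and when.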
Running the above script gives output like the following:
Epoch 1, Loss: 0.2446601688861847, Accuracy: 92.78666687011719, Test Loss: 0.11248774081468582, Test Accuracy: 96.54000091552734
Next, you can run the following magic command in a notebook cell to open the TensorBoard interface.
%tensorboard --logdir logs/gradient_tape
However, in an ARE JupyterLab session you cannot view the TensorBoard interface this way: the port that TensorBoard listens on is blocked due to the architecture design of Open OnDemand.
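You can still confirm from within the notebook that a TensorBoard server has started, for example with the tensorboard.notebook helper. This is a minimal sketch; the exact output depends on your session:

# List the TensorBoard instances started in this notebook session.
from tensorboard import notebook
notebook.list()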
Alternatively, you can manually set up SSH port forwarding to view the TensorBoard dashboard, as described below.
First, open a terminal in JupyterLab by clicking the "+" button and then the Terminal button.
Then change to your working directory and start TensorBoard as below:
NCI-ai-ml_22.08 > tensorboard --logdir 'logs/gradient_tape' --host `hostname`
NOTE: Using experimental fast data loading logic. To disable, pass "--load_fast=false" and report issues on GitHub. More details: https://github.com/tensorflow/tensorboard/issues/4784
TensorBoard 2.8.0 at http://gadi-gpu-v100-0112.gadi.nci.org.au:6006/ (Press CTRL+C to quit)
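TensorBoard listens on port 6006 by default. If that port is unavailable on the node, you can choose another one explicitly with the --port flag; the value below is only an example:

NCI-ai-ml_22.08 > tensorboard --logdir 'logs/gradient_tape' --host `hostname` --port 6006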
Copy the Gadi node hostname and port, i.e. "gadi-gpu-v100-0112.gadi.nci.org.au:6006", from the output above.
Open a terminal on your local desktop and run the command below (with the Gadi worker node hostname and port you just copied):
client_desktop:$ ssh -N -L 6006:gadi-gpu-v100-0112.gadi.nci.org.au:6006 rxy900@gadi.nci.org.au
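Here -N tells ssh not to run a remote command and -L forwards local port 6006 to port 6006 on the worker node, tunnelled through the Gadi login node. If port 6006 is already in use on your desktop, you can bind a different local port instead, for example:

client_desktop:$ ssh -N -L 16006:gadi-gpu-v100-0112.gadi.nci.org.au:6006 rxy900@gadi.nci.org.au

and then browse to http://localhost:16006 instead of the address below.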
Now open your web browser and go to the following address:
http://localhost:6006
You should now be able to see the TensorBoard interface.