OpenFWI

This page introduces the training and inference of an InversionNet model with the FlatVel-A dataset and shows how to visualise the seismic data and velocity maps. It has been adapted from the official OpenFWI documentation.

Create your working space

You can copy the ready-to-use archive from /g/data/up99/sandbox/openFWI/repo to your working space as shown below.

$ cd YOUR_WORKING_DIRECTORY
$ cp -r /g/data/up99/sandbox/openFWI/repo openFWI
$ cd openFWI
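
To confirm the copy, list the directory contents; you should see train.py, test.py, the PBS job scripts and the split_files folder (exact contents may vary):

$ ls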

OpenFWI training in a batch job

In your working space, you can submit a PBS job with GPU resources to run the OpenFWI training process.

Within the job script, you need to load the following module:

$ module use /g/data/up99/modulefiles 
$ module load openfwi/23.11

Then you can run the training process via train.py with the appropriate configuration flags. For example, you could run:

python train.py -ds flatvel-a -n nci_v100_eb-40 -j 12 --print-freq 20 -eb 40 -nb 1  -m InversionNet -t flatvel_a_train.txt -v flatvel_a_val.txt

The flags used in the command above are listed below:

'-ds' : dataset name
'-n' : folder name for this experiment
'-j' : number of data loading workers (default: 16)
'--print-freq' : print frequency
'-eb' : number of epochs in a saved block
'-nb' : number of saved blocks
'-m' : inverse model name
'-t' : name of the training annotation file (under split_files)
'-v' : name of the validation annotation file (under split_files)

The above command runs the training process with the following configuration settings; any setting not supplied on the command line takes its default value:

device : cuda
dataset : flatvel-a
file_size : None
anno_path : split_files
train_anno : split_files/flatvel_a_train.txt
val_anno : split_files/flatvel_a_val.txt
output_path : Invnet_models/nci_v100_eb-40/
log_path : Invnet_models/nci_v100_eb-40/
save_name : nci_v100_eb-40
suffix : None
model : InversionNet
up_mode : None
sample_spatial : 1.0
sample_temporal : 1
batch_size : 256
lr : 0.0001
lr_milestones : []
momentum : 0.9
weight_decay : 0.0001
lr_gamma : 0.1
lr_warmup_epochs : 0
epoch_block : 40
num_block : 1
workers : 12
k : 1
print_freq : 20
resume : None
start_epoch : 0
lambda_g1v : 1.0
lambda_g2v : 1.0
sync_bn : False
world_size : 1
dist_url : env://
tensorboard : False
epochs : 40
distributed : False
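
As an aside, the dump above is essentially the set of argparse values after parsing. Below is a minimal Python sketch (not OpenFWI's actual parser; the flag set and defaults shown here are abbreviated and illustrative) of how unspecified settings fall back to their defaults:

# Minimal sketch, not OpenFWI's actual parser: argparse fills any
# setting not given on the command line with its default value.
import argparse

parser = argparse.ArgumentParser()
parser.add_argument('-ds', '--dataset', default='flatvel-a')
parser.add_argument('-m', '--model', default='InversionNet')
parser.add_argument('-j', '--workers', type=int, default=16)
parser.add_argument('-eb', '--epoch-block', dest='epoch_block', type=int, default=40)

# Only -ds and -j supplied; the rest take their defaults.
args = parser.parse_args(['-ds', 'flatvel-a', '-j', '12'])
for key, value in vars(args).items():
    print(key, ':', value)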

An example OpenFWI training PBS job script (training.sh) is given below:

PBS job script
#!/bin/bash
 
#PBS -q gpuvolta
#PBS -P <insert_compute_project_code>
#PBS -l ngpus=1
#PBS -l ncpus=12
#PBS -l mem=190GB
#PBS -l jobfs=100GB
#PBS -l walltime=01:00:00
#PBS -l storage=gdata/up99+<insert_gdata_or_scratch_project_code_that_contains_the_OpenFWI_scripts>
#PBS -l wd
#PBS -N OpenFWI_training
 
module use /g/data/up99/modulefiles 
module load openfwi/23.11

cd $PBS_O_WORKDIR   ### or cd to where the OpenFWI repo is located (the directory containing train.py, test.py, training.sh, etc.)
 
python train.py -ds flatvel-a -n nci_v100_eb-40 -j 12 --print-freq 20 -eb 40 -nb 1  -m InversionNet -t flatvel_a_train.txt -v flatvel_a_val.txt

To submit training.sh:

$ qsub training.sh
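
You can monitor the job while it is queued or running with standard PBS commands, for example:

$ qstat -u $USER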

After the training process finishes, you will see a checkpoint file and a model file under the directory "Invnet_models/nci_v100_eb-40", the output directory specified above. You are now ready to run the testing process.
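
Before moving on, you can optionally inspect the checkpoint in Python. A minimal sketch, assuming the checkpoint is a standard PyTorch dictionary (the exact keys depend on how train.py saves the file):

# Minimal sketch: peek inside the saved checkpoint. The exact keys are
# an assumption and depend on how train.py writes the file.
import torch

ckpt = torch.load('Invnet_models/nci_v100_eb-40/checkpoint.pth', map_location='cpu')
if isinstance(ckpt, dict):
    print(list(ckpt.keys()))  # e.g. model weights, optimiser state, epoch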

OpenFWI testing in a batch job

After the training process has produced the trained model, you can run the testing process to check how the trained network performs on the validation data and compare its predictions with the ground truth.

You can put the following command into a PBS job with GPU resources:

python test.py -ds flatvel-a -j 12 -n nci_v100_eb-40 -m InversionNet -v flatvel_a_val.txt -r checkpoint.pth --vis -vb 2 -vsa 3

The flags have the same meaning as those in train.py. Make sure the experiment folder name (-n) is consistent between the training and testing processes. The three new flags are listed below:

'--vis' : visualisation option
'-vb' : number of batches to visualise
'-vsa' : number of samples per batch to visualise

With -vb 2 -vsa 3, the testing job visualises 3 samples from each of 2 batches, i.e. 6 samples in total.

An example OpenFWI testing PBS job script (testing.sh) is provided below:

PBS job script
#!/bin/bash
 
#PBS -q gpuvolta
#PBS -P <insert_compute_project_code>
#PBS -l ngpus=1
#PBS -l ncpus=12
#PBS -l mem=50GB
#PBS -l jobfs=100GB
#PBS -l walltime=00:02:00
#PBS -l storage=gdata/up99+<insert_gdata_or_scratch_project_code_that_contains_the_OpenFWI_scripts>
#PBS -l wd
#PBS -N OpenFWI_testing
 
module use /g/data/up99/modulefiles 
module load openfwi/23.11 
cd $PBS_O_WORKDIR
 
python test.py -ds flatvel-a -j 12 -n nci_v100_eb-40 -m InversionNet -v flatvel_a_val.txt -r checkpoint.pth --vis -vb 2 -vsa 3


To submit testing.sh:

$ qsub testing.sh

After running the testing process, you will find several figure files under the experiment output folder "Invnet_models/nci_v100_eb-40".
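
For example, listing the experiment folder (the figures may sit in a sub-folder, depending on the visualisation settings):

$ ls Invnet_models/nci_v100_eb-40/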

OpenFWI inference in JupyterLab

If you set up your working space by copying the ready-to-use archive from /g/data/up99/sandbox/openFWI/repo on Gadi, you will find a Jupyter notebook named inference.ipynb. This notebook, prepared by NCI, is designed for visualising prediction results.

The archive already includes pretrained checkpoint files under the Invnet_models/nci_v100_eb-40 directory, so you can run the inference tasks directly. Alternatively, you can rerun the training process, as demonstrated above, to update the provided checkpoint files.

To run this notebook, launch a JupyterLab session on the Australian Research Environment (ARE) available at https://are.nci.org.au. For further details about ARE, please refer to the ARE User Guide.

To run the inference notebook, request the following in an ARE JupyterLab session:

Walltime (hours): 2
Queue: gpuvolta
Compute Size: 1gpu
Project: <insert a compute project code that you are a member of>
Storage: gdata/up99+<insert gdata or scratch project code that contains the inference.ipynb notebook>

Under the "Advanced options…" tab, add the following module settings:

Module directories: /g/data/up99/modulefiles
Modules: openfwi/23.11


Now click "Launch" and your JupyterLab job will join the queue.

Once the session has started, open it via the "Open JupyterLab" tab.

In the left filesystem pane of your JupyterLab session, navigate to the directory containing the inference.ipynb notebook. For instance, if you copied it to your home directory, go to the "home" folder.

Open inference.ipynb to launch and run the OpenFWI inference notebook.

In this notebook, you can run on a clean test dataset by setting the args to "-ds flatvel-a -j 12 -n nci_v100_eb-40 -m InversionNet -v flatvel_a_val.txt -r checkpoint.pth --vis -vb 2 -vsa 3".
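
If you want to adapt the args string yourself, note that a flat string like this is typically split into an argv list before being handed to an argparse-based parser such as the one in test.py. A minimal illustration (the notebook's actual mechanism may differ):

# Minimal sketch: split the flat args string into an argv list for an
# argparse-based parser. Illustration only; the notebook's actual
# parsing may differ.
import shlex

args_string = ('-ds flatvel-a -j 12 -n nci_v100_eb-40 -m InversionNet '
               '-v flatvel_a_val.txt -r checkpoint.pth --vis -vb 2 -vsa 3')
print(shlex.split(args_string))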

The notebook will plot figures comparing the "Prediction" and "Ground Truth" velocity maps.
You can also add Gaussian noise to the test datasets by using the "--missing" and "--std" flags in args. For example:

"-ds flatvel-a -j 12 -n nci_v100_eb-40 -m InversionNet -v flatvel_a_val.txt -r checkpoint.pth --missing 1 --std 0.001 --vis -vb 2 -vsa 3"


In this case, the notebook will plot 5 channels of the seismic data with:

  1. the Gaussian noise added in the "Prediction".
  2. the original "Clean" seismic datasets.
  3. their difference, i.e. the noise itself.
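
For intuition, the --std flag sets the standard deviation of the additive noise. A minimal NumPy sketch of the idea (shapes are illustrative; this is not the notebook's exact code):

# Minimal sketch of additive Gaussian noise with std 0.001, as set by
# the --std flag. Shapes are illustrative; not the notebook's exact code.
import numpy as np

rng = np.random.default_rng(seed=0)
clean = rng.standard_normal((5, 1000, 70)).astype(np.float32)  # 5 seismic channels
noisy = clean + rng.normal(0.0, 0.001, size=clean.shape).astype(np.float32)
print((noisy - clean).std())  # ~0.001: the injected noise level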


The notebook will also produce the "Prediction" and "Ground Truth" velocity models. The "Prediction" velocity model is created by feeding the network the test data with additive Gaussian noise (i.e., the noisy 5 channels described above).
