You can use the NCI-data-analysis module to manipulate NetCDF files in parallel.

Gadi

Example job script:

#!/bin/bash
 
#PBS -l ncpus=4
#PBS -l mem=16GB
#PBS -l jobfs=20GB
#PBS -q normal
#PBS -P a00
#PBS -l walltime=02:00:00
#PBS -l storage=gdata/dk92+gdata/a00+scratch/a00
#PBS -l wd
  
module use /g/data/dk92/apps/Modules/modulefiles
module load NCI-data-analysis/2022.06

mpirun python3 par_nc4_test.py >& output.log
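
Parallel writes only work when the netCDF4 Python package is built against an MPI-enabled HDF5/netCDF library. The following is a minimal sketch of a check you could run after loading the module. It assumes the `__has_parallel4_support__` attribute exposed by recent netCDF4-python releases, and the helper name `parallel_netcdf_available` is hypothetical, not part of any library.

```python
def parallel_netcdf_available():
    """Return True if netCDF4 is importable and reports parallel HDF5 support.

    Assumption: the installed netCDF4-python exposes the module attribute
    __has_parallel4_support__; if it is missing, we conservatively say False.
    """
    try:
        import netCDF4
    except ImportError:
        # netCDF4 is not installed in this environment at all.
        return False
    return bool(getattr(netCDF4, "__has_parallel4_support__", False))


if __name__ == "__main__":
    if parallel_netcdf_available():
        print("netCDF4 reports parallel I/O support")
    else:
        print("netCDF4 is missing or was built without parallel I/O support")
```

If the check reports no support, opening a file with parallel=True is expected to fail, so it is worth confirming that the intended module is loaded first.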

If you run the following par_nc4_test.py script with the job script above,

from mpi4py import MPI
import numpy as np
from netCDF4 import Dataset

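comm = MPI.COMM_WORLD       # Use the world communicator
mpi_rank = comm.Get_rank()  # ID (rank) of this process
mpi_size = comm.Get_size()  # Total number of ranks

# Open one shared file for parallel access; every rank runs this block.
with Dataset('output.nc', 'w', parallel=True) as f:
    # Dimensions and variables must be created by all ranks.
    d = f.createDimension('dim', mpi_size)
    v = f.createVariable('var', np.int64, 'dim')
    # Each rank writes its own rank number into its own element.
    v[mpi_rank] = mpi_rank

# Wait until every rank has closed the file before reporting.
comm.Barrier()

if mpi_rank == 0:
    print(mpi_size, 'MPI ranks have finished writing!')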

you will get a NetCDF file named "output.nc" containing the values written by all 4 MPI ranks:

$ ncdump output.nc

netcdf output {
dimensions:
        dim = 4 ;
variables:
        int64 var(dim) ;
data:

 var = 0, 1, 2, 3 ;
}
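
The example above writes exactly one element per rank. When the dimension is longer than the number of ranks, each rank would typically write one contiguous slice instead. The sketch below shows one way to compute those slices; the helper name `rank_slice` is hypothetical and the block decomposition is an assumption, not something the page prescribes.

```python
def rank_slice(n_items, n_ranks, rank):
    """Return (start, stop) of the contiguous block owned by `rank`.

    Items are spread as evenly as possible: the first `n_items % n_ranks`
    ranks each receive one extra item.
    """
    base, extra = divmod(n_items, n_ranks)
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return start, stop


if __name__ == "__main__":
    # Show how 10 items would be divided among 4 ranks.
    for rank in range(4):
        print(rank, rank_slice(10, 4, rank))
    # 0 (0, 3)
    # 1 (3, 6)
    # 2 (6, 8)
    # 3 (8, 10)
```

Inside the `with Dataset(...)` block of par_nc4_test.py, a rank could then call `start, stop = rank_slice(n_items, mpi_size, mpi_rank)` and assign to `v[start:stop]` rather than to `v[mpi_rank]`, provided the dimension was created with length `n_items`.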
