
You can use the NCI-data-analysis module to read and write HDF5 files in parallel with h5py and mpi4py.

Gadi

Example job script:

#!/bin/bash
 
#PBS -l ncpus=4
#PBS -l mem=16GB
#PBS -l jobfs=20GB
#PBS -q normal
#PBS -P a00
#PBS -l walltime=02:00:00
#PBS -l storage=gdata/dk92+gdata/a00+scratch/a00
#PBS -l wd
  
module use /g/data/fp0/apps/Modules/modulefiles
module load NCI-data-analysis/2022.06

mpirun python3 par_h5py_test.py >& output.log

If your par_h5py_test.py script is written as below:

from mpi4py import MPI
import h5py

comm = MPI.COMM_WORLD # Use the world communicator
mpi_rank = comm.Get_rank() # The process ID
mpi_size = comm.Get_size() # Total number of ranks

with h5py.File('output.h5', 'w', driver='mpio', comm=comm) as f:
    dset = f.create_dataset('test', (mpi_size,), dtype='i')
    dset[mpi_rank] = mpi_rank # Each rank writes its own element

comm.Barrier()

if mpi_rank == 0:
    print(mpi_size, 'MPI ranks have finished writing!')
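
Here each rank writes a single element; for larger datasets the same pattern extends to each rank writing its own contiguous slice. A minimal sketch of the offset arithmetic (pure Python, no MPI required; the helper name `rank_slice` is hypothetical, not part of h5py or mpi4py):

```python
def rank_slice(total, size, rank):
    """Return the (start, stop) range a given MPI rank should write,
    splitting `total` rows as evenly as possible across `size` ranks."""
    base, extra = divmod(total, size)
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return start, stop

# With 10 rows across 4 ranks, ranks 0-1 get 3 rows, ranks 2-3 get 2 rows
print([rank_slice(10, 4, r) for r in range(4)])
# → [(0, 3), (3, 6), (6, 8), (8, 10)]
```

In the MPI script above, rank `mpi_rank` would then assign `dset[start:stop] = ...` with `start, stop = rank_slice(total_rows, mpi_size, mpi_rank)`, so no two ranks write overlapping regions.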

You will get an HDF5 file named "output.h5" containing one value written by each of the 4 MPI ranks.

$ h5dump output.h5
HDF5 "output.h5" {
GROUP "/" {
   DATASET "test" {
      DATATYPE  H5T_STD_I32LE
      DATASPACE  SIMPLE { ( 4 ) / ( 4 ) }
      DATA {
      (0): 0, 1, 2, 3
      }
   }
}
}
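
The file can also be inspected from Python without MPI, since only the write needed the 'mpio' driver. The sketch below is self-contained for illustration: the first block serially creates a file with the same layout as the parallel run would produce, and the second block is the read-back you would use on the real output.h5:

```python
import numpy as np
import h5py

# Stand-in for the parallel job: create a file with the same layout
# (in the real run, output.h5 is written collectively by the MPI ranks)
with h5py.File('output.h5', 'w') as f:
    f.create_dataset('test', data=np.arange(4, dtype='i4'))

# Serial read-back: no MPI driver is needed just to inspect the file
with h5py.File('output.h5', 'r') as f:
    data = f['test'][:]

print(data.tolist())  # → [0, 1, 2, 3]
```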