You can use the NCI-data-analysis module to manipulate HDF5 files in parallel.
Gadi
Example job script:
#!/bin/bash
#PBS -l ncpus=4
#PBS -l mem=16GB
#PBS -l jobfs=20GB
#PBS -q normal
#PBS -P a00
#PBS -l walltime=02:00:00
#PBS -l storage=gdata/dk92+gdata/a00+scratch/a00
#PBS -l wd

module use /g/data/fp0/apps/Modules/modulefiles
module load NCI-data-analysis/2022.06

mpirun python3 par_h5py_test.py >& output.log
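Before submitting, you can check that the h5py build shipped with the module was compiled against a parallel HDF5 library. The snippet below is a minimal sketch using h5py's get_config() API; it is not part of the original example.

from mpi4py import MPI
import h5py

# h5py.get_config().mpi is True only if h5py was built against a
# parallel (MPI-enabled) HDF5 library.
assert h5py.get_config().mpi, 'this h5py build lacks MPI support'
print('h5py has MPI support;', MPI.COMM_WORLD.Get_size(), 'rank(s) active')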
If your par_h5py_test.py is written as below:
from mpi4py import MPI
import h5py

comm = MPI.COMM_WORLD       # the world communicator
mpi_rank = comm.Get_rank()  # rank ID of this process
mpi_size = comm.Get_size()  # total number of ranks

# Open the file collectively with the MPI-IO driver. All ranks must
# take part in creating the dataset; each rank then writes its own element.
with h5py.File('output.h5', 'w', driver='mpio', comm=comm) as f:
    dset = f.create_dataset('test', (mpi_size,), dtype='i')
    dset[mpi_rank] = mpi_rank

comm.Barrier()
if mpi_rank == 0:
    print(mpi_size, 'MPI ranks have finished writing!')
You will get an HDF5 file named "output.h5" containing the outputs from all 4 MPI ranks, which you can inspect with h5dump:
$ h5dump output.h5
HDF5 "output.h5" {
GROUP "/" {
   DATASET "test" {
      DATATYPE  H5T_STD_I32LE
      DATASPACE  SIMPLE { ( 4 ) / ( 4 ) }
      DATA {
      (0): 0, 1, 2, 3
      }
   }
}
}
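You can also read the file back in parallel: each rank opens it with the same mpio driver and reads its own element. The sketch below assumes the output.h5 file produced above; the script name par_h5py_read.py is our own illustration, not part of the original example.

from mpi4py import MPI
import h5py

comm = MPI.COMM_WORLD
mpi_rank = comm.Get_rank()

# All ranks open the existing file collectively, read-only, via MPI-IO.
with h5py.File('output.h5', 'r', driver='mpio', comm=comm) as f:
    # Each rank reads back the element it wrote earlier.
    value = f['test'][mpi_rank]
    print('rank', mpi_rank, 'read back', value)

Launch it the same way as the write script, e.g. mpirun python3 par_h5py_read.py.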