
Rmpi is an R package providing an interface to MPI (Message-Passing Interface) API calls with interactive R slave functionalities.

The NCI-data-analysis module enables Rmpi scripts to run across multiple worker nodes on Gadi. The current version of Rmpi in the NCI-data-analysis/2022.06 module is v0.6-9.2.


You can submit a PBS job to run Rmpi on Gadi. An example job script is shown below (make sure "gdata/dk92" is included in your storage request).

#PBS -l ncpus=4
#PBS -l mem=16GB
#PBS -l jobfs=20GB
#PBS -q normal
#PBS -P a00
#PBS -l walltime=00:30:00
#PBS -l storage=gdata/dk92+gdata/a00+scratch/a00
#PBS -l wd

module use /g/data/dk92/apps/Modules/modulefiles
module load NCI-data-analysis/2022.06

mpirun -np 1 Rscript Rmpi_test.R >& output.log

Note: always specify "-np 1" in the above script. Rmpi uses that single process as the master, which then spawns mpi.universe.size() - 1 child processes.

The example Rmpi script, Rmpi_test.R, is given below.

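# Load Rmpi library
library(Rmpi)

# Spawn one slave per remaining MPI slot (one slot is used by the master)
ns <- mpi.universe.size() - 1
mpi.spawn.Rslaves(nslaves = ns)

# Tell all slaves to identify themselves
mpi.remote.exec(paste("I am",mpi.comm.rank(),"of",mpi.comm.size()))

# Test computations: each slave draws 5 normal random numbers
x <- 5
x <- mpi.remote.exec(rnorm, x)
length(x)
x

# Tell all slaves to close down, and exit the program
mpi.close.Rslaves(dellog = FALSE)
mpi.quit()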

The above script first initialises MPI. It fetches the total number of MPI ranks via the function mpi.universe.size(). It then calls mpi.spawn.Rslaves() to spawn mpi.universe.size() - 1 slave processes, so that the master and the slaves together use every available MPI slot.
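The slave count described above can be sketched in plain R, without MPI. The helper name `slaves_from_universe` is hypothetical and not part of Rmpi. It only mirrors the `mpi.universe.size() - 1` arithmetic and adds a guard for the case where no slot is left for a slave.

```r
# Hypothetical helper (not part of Rmpi): number of slaves to spawn
# for a given universe size, keeping one slot for the master.
slaves_from_universe <- function(universe_size) {
  stopifnot(is.numeric(universe_size), length(universe_size) == 1)
  ns <- as.integer(universe_size) - 1L
  if (ns < 1L) {
    stop("universe size ", universe_size, " leaves no slot for a slave")
  }
  ns
}

# A universe of size 4 (as in the example output) gives three slaves.
ns <- slaves_from_universe(4)
print(ns)
```

In a real script the argument would come from mpi.universe.size(). A guard like this may help when a script is started outside mpirun or with a single CPU.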

Next, the master process tells each slave process to identify itself with its rank and the total MPI size.

After that, the master process has each slave generate a vector of 5 normally distributed random numbers, using the Rmpi function 'mpi.remote.exec' and the built-in function 'rnorm'. The returned data.frame, of size 5 x (mpi.universe.size() - 1), is printed.
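The shape of that result can be illustrated with a serial stand-in that needs no MPI. This is only a sketch. It assumes that mpi.remote.exec combines equal-length results into a data.frame with one column per slave, as the example output suggests. The names `n_slaves`, `n_draws`, `draws` and `result` are illustrative.

```r
# Serial stand-in for mpi.remote.exec(rnorm, 5): no MPI is used here.
n_slaves <- 3  # assumed: a universe of 4 (1 master + 3 slaves)
n_draws <- 5   # number of random values drawn by each slave

set.seed(1)  # only to make this sketch reproducible

# One vector of draws per (simulated) slave
draws <- lapply(seq_len(n_slaves), function(i) rnorm(n_draws))
names(draws) <- paste0("X", seq_len(n_slaves))

# Combine the equal-length vectors column-wise, one column per slave
result <- as.data.frame(draws)
print(dim(result))     # rows = draws per slave, columns = slaves
print(length(result))  # number of columns, i.e. number of slaves
```

Here length() counts columns, which is why the example output reports 3 for a result that holds 15 values.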

Finally, it closes all slave processes and quits MPI.

The outputs from the above job script and example R script are shown below.

3 slaves are spawned successfully. 0 failed.
master (rank 0, comm 1) of size 4 is running on: gadi-cpu-clx-0740
slave1 (rank 1, comm 1) of size 4 is running on: gadi-cpu-clx-0740
slave2 (rank 2, comm 1) of size 4 is running on: gadi-cpu-clx-0740
slave3 (rank 3, comm 1) of size 4 is running on: gadi-cpu-clx-0740
[1] "I am 1 of 4"

[1] "I am 2 of 4"

[1] "I am 3 of 4"
[1] 3
          X1          X2         X3
1 -1.1505491  0.83828520 -0.8191266
2 -0.2950300 -0.35869354 -0.8578121
3  2.0825411  0.68093567  1.0914607
4  0.4254675  0.93799590 -1.3301995
5 -0.9015812 -0.01677674 -1.2268423
[1] 1


You can also run the above example script directly from a command line within an ARE JupyterLab session:

mpirun -np 1 Rscript Rmpi_test.R >& output.log

However, this can only use resources within a single node. To run large-scale computations across multiple nodes, submit a batch job on Gadi as described above.
