
Overview

R is a language and environment for statistical computing and graphics. It is a GNU project which is similar to the award-winning S system which was developed at Bell Laboratories (formerly AT&T, now Lucent Technologies) by John Chambers and colleagues. It can be considered as a different implementation of S. There are some important differences, but much code written for S runs unaltered under R. It provides a wide variety of statistical and graphical techniques (linear and nonlinear modelling, statistical tests, time series analysis, classification, clustering, ...).

R is designed as a true computer language with control-flow constructions for iteration and alternation, and it allows users to add additional functionality by defining new functions. For computationally intensive tasks, C, C++ and Fortran code can be linked and called at run time.

There is an Australian AARNet mirror of the main R web site.

More information: https://www.r-project.org/about.html

Usage

You can check the versions installed on Gadi with a module query:

$ module avail R

We normally recommend using the latest available version, and always recommend specifying the version number in the module command:

$ module load R/4.1.0

For more details on using modules see our modules help guide at https://opus.nci.org.au/display/Help/Environment+Modules.

An example PBS job submission script named r_job.sh is provided below. It requests 1 CPU core, 2 GiB of memory, and 8 GiB of local disk on a Gadi compute node from the normal queue, for exclusive use for 30 minutes, charged against the project a00. It also asks the system to enter the working directory once the job has started. Save this script in the working directory from which the analysis will be run.

#!/bin/bash

#PBS -P a00
#PBS -q normal
#PBS -l ncpus=1
#PBS -l mem=2GB
#PBS -l jobfs=8GB
#PBS -l walltime=00:30:00
#PBS -l wd

# Load module, always specify version number.
module load R/4.1.0

# Must include `#PBS -l storage=scratch/ab12+gdata/yz98` if the job
# needs access to `/scratch/ab12/` and `/g/data/yz98/`. Details on
# https://opus.nci.org.au/display/Help/PBS+Directives+Explained.

# Run R application
R --vanilla < input.r > output

For more information about the R command's options, see https://cran.r-project.org/doc/manuals/r-release/R-intro.html

To run the job you would use the PBS command:

$ qsub r_job.sh

This starts R and executes the instructions in input.r. The output that you would expect to see on screen during interactive execution is written to the file output. Check the files r_job.sh.e**** and r_job.sh.o**** for any errors and to see the time consumed. Note the request for local scratch space through jobfs, as R uses TMPDIR.
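The error check described above can be scripted. The following is a minimal sketch, assuming R's usual convention of starting error messages with the word Error; the helper name check_r_errors is hypothetical, and the suffix of the .e file differs for every job.

```shell
#!/bin/bash
# Hypothetical helper: print lines in the given log files that begin with
# "Error", the usual prefix of R error messages.
# Returns 0 if no such line is found, 1 otherwise.
check_r_errors() {
    local found=0
    local f
    for f in "$@"; do
        [ -f "$f" ] || continue        # skip unmatched globs and missing files
        if grep -Hn '^Error' "$f"; then
            found=1
        fi
    done
    return "$found"
}

# Example: check every error file written by r_job.sh in the current directory.
if check_r_errors r_job.sh.e*; then
    echo "No R errors found"
fi
```

Each match is printed with its file name and line number, so the failing statement can be traced back to input.r.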

Executing R commands in an interactive way is also possible. Please see the details at https://opus.nci.org.au/display/Help/0.+Welcome+to+Gadi#id-0.WelcometoGadi-InteractiveJobs.

This version of R has been built with the Intel MKL library, which provides the BLAS and LAPACK routines for dense linear algebra. If your algorithm depends heavily on LAPACK routines, you may benefit from running in parallel. An example job script with 2 CPU cores is provided below. Note that if your application does not run in parallel, you must set the number of CPU cores to 1 and adjust the memory and jobfs requests according to the information at https://opus.nci.org.au/display/Help/Queue+Structure to avoid wasting compute resources.

#!/bin/bash

#PBS -q normal
#PBS -l ncpus=2
#PBS -l mem=4GB
#PBS -l jobfs=16GB
#PBS -l walltime=00:15:00
#PBS -l wd

# Load module, always specify version number.
module load R/4.1.0

# Set number of OMP threads
export OMP_NUM_THREADS=$PBS_NCPUS

# Must include `#PBS -l storage=scratch/ab12+gdata/yz98` if the job
# needs access to `/scratch/ab12/` and `/g/data/yz98/`. Details on
# https://opus.nci.org.au/display/Help/PBS+Directives+Explained.

# Run R application
R --vanilla -f input.r > output

To see if it is worth using multiple CPU cores, you should run some timing tests with 1, 2, 4, and so on, up to no more than 16 CPU cores, and compare the walltime used. Your problem needs to be fairly large to benefit from parallelism.
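One way to organise such timing tests is to generate one submission command per core count. The sketch below only prints the commands so they can be reviewed before use. It assumes the parallel script above is saved as r_parallel_job.sh (a hypothetical name) and that memory and jobfs scale at 2GB and 8GB per core, as in the examples on this page. PBS normally lets options given on the qsub command line take precedence over the #PBS directives in the script, but it is worth confirming the granted resources with qstat.

```shell
#!/bin/bash
# Print (do not run) the qsub command for a scaling test on N cores.
# Assumptions: script name r_parallel_job.sh is hypothetical; memory and
# jobfs are scaled at 2GB and 8GB per core to match the examples.
scaling_cmd() {
    local n=$1
    echo "qsub -N r_scale_${n} -l ncpus=${n},mem=$((2 * n))GB,jobfs=$((8 * n))GB r_parallel_job.sh"
}

# One command per core count, stopping at 16 cores.
for n in 1 2 4 8 16; do
    scaling_cmd "$n"
done
```

After the jobs finish, compare the walltime reported in each job's .o file; if doubling the cores does not noticeably reduce it, the extra cores are not worth requesting.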

If you wish to add extra packages such as randomForest, you need to load the appropriate Intel modules. We recommend using the same Intel compiler version that was used to build R.

The list of modules that were loaded during the R build is in the /apps/R/<version>/README.nci file. For example, for R/4.1.0, the file is /apps/R/4.1.0/README.nci. There you can see that intel-compiler/2021.2.0 was used, so this is the version to load, as shown below:

# Unload modules
$ module unload R intel-compiler

# Load modules, always specify version number.
$ module load R/4.1.0
$ module load intel-compiler/2021.2.0

$ R
....
> install.packages("randomForest",repos="https://mirror.aarnet.edu.au/pub/CRAN/")
Warning in install.packages("randomForest") :
  'lib = "/apps/R/4.1.0/lib64/R/library"' is not writeable
Would you like to create a personal library
'~/R/x86_64-unknown-linux-gnu-library/4.1.0'
to install packages into?  (y/n) y
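The compiler version recorded in README.nci can also be looked up from the command line before loading modules. This is a minimal sketch, assuming the file mentions the module in the form intel-compiler/<version>; the exact layout of README.nci is an assumption, and the helper name intel_compiler_modules is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: list the distinct intel-compiler module versions
# mentioned in a README.nci file. It only searches for strings of the form
# intel-compiler/<version> and makes no other assumption about the layout.
intel_compiler_modules() {
    grep -o 'intel-compiler/[0-9][0-9.]*' "$1" | sort -u
}

# Example for R/4.1.0; does nothing if the file is not readable here.
readme=/apps/R/4.1.0/README.nci
if [ -r "$readme" ]; then
    intel_compiler_modules "$readme"
fi
```

If more than one version is listed, read the file itself to see which one was used for the build.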

If you wish to install packages in a directory other than the default ~/R/x86_64-unknown-linux-gnu-library/4.1.0, set the environment variable R_LIBS to the new directory. In bash, you can set it with the following command:

$ export R_LIBS=/path/to/your/new/directory:$R_LIBS

This will also need to be set every time you use R.
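Because the variable has to be set in every session, it can help to wrap the step in a small shell function, for example in ~/.bashrc. The sketch below uses a hypothetical function name, prepend_r_libs. It creates the directory first, since R ignores library directories that do not exist, and it adds the ":" separator only when R_LIBS already has a value.

```shell
#!/bin/bash
# Hypothetical helper: create a personal library directory if needed and put
# it at the front of R_LIBS.
# Usage: prepend_r_libs /path/to/your/new/directory
prepend_r_libs() {
    local dir=$1
    # R skips library directories that do not exist, so create it first.
    mkdir -p "$dir" || return 1
    # ${R_LIBS:+:$R_LIBS} expands to ":<old value>" only when R_LIBS is set
    # and non-empty, which avoids a stray trailing colon.
    export R_LIBS="${dir}${R_LIBS:+:$R_LIBS}"
}
```

After calling the function, running .libPaths() inside R should list the new directory first.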

Note that some packages cannot be built with the Intel compilers. The problem usually occurs when a package uses complex variables. In such cases, you need to switch to the GNU compilers by editing the ~/.R/Makevars file. Putting the following lines in this file:

CXX=g++
CXX11=g++
CXX14=g++
CC=gcc

will force R to use gcc/g++ instead of icc. Do not forget to comment out these lines (i.e. add a # symbol at the start of each line) after installing the problematic package.
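Switching the override on and off can be scripted so that the lines are not left active by accident. This is a minimal sketch with two hypothetical helper functions; both take the path of the Makevars file as an argument, which is normally ~/.R/Makevars.

```shell
#!/bin/bash
# Hypothetical helpers for toggling the GNU compiler override.

# Append the four override lines from the text to the given Makevars file,
# creating the parent directory if it does not exist yet.
enable_gnu_makevars() {
    mkdir -p "$(dirname "$1")" || return 1
    printf '%s\n' 'CXX=g++' 'CXX11=g++' 'CXX14=g++' 'CC=gcc' >> "$1"
}

# Comment out those lines again by prefixing each one with "#".
# A temporary file is used instead of "sed -i" for portability.
disable_gnu_makevars() {
    sed -E 's/^(CXX|CXX11|CXX14|CC)=(g\+\+|gcc)$/#&/' "$1" > "$1.tmp" &&
        mv "$1.tmp" "$1"
}
```

A typical sequence would be enable_gnu_makevars ~/.R/Makevars, then installing the problematic package in R, then disable_gnu_makevars ~/.R/Makevars. Other lines in the file are left untouched.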