R is 'GNU S', a language and environment for statistical computing and graphics. R is similar to the award-winning S system, which was developed at Bell Laboratories by John Chambers et al. It provides a wide variety of statistical and graphical techniques (linear and nonlinear modelling, statistical tests, time series analysis, classification, clustering, ...).

R is designed as a true computer language with control-flow constructions for iteration and alternation, and it allows users to add functionality by defining new functions. For computationally intensive tasks, C, C++ and Fortran code can be linked and called at run time.

There is an Australian AARNet mirror of the main R web site.

Usage

First you need to decide on the version of the software you want to use. Use

$ module avail R

to check what versions are available. We normally recommend using the latest version available. For example, to load the 4.0.0 version of R use

$ module load R/4.0.0

For more details on using modules see our modules help guide.

The following procedure will run R under the PBS queueing system. Assume the usual interactive procedure is to start R and input a file called `input.r` containing the sequence of R commands that you wish to execute.

  • Create a batch job script called `r_job.sh` similar to the following example: 

    #!/bin/bash
    
    #PBS -q normal
    #PBS -l walltime=00:02:00,mem=250MB,jobfs=500MB
    #PBS -l wd
    
    # Load modules
    module load R/4.0.0
    
    # Must include `#PBS -l storage=scratch/ab12+gdata/yz98` if the job
    # needs access to `/scratch/ab12/` and `/g/data/yz98/`. Details on
    # https://opus.nci.org.au/display/Help/PBS+Directives+Explained.
    
    # Run R applications
    R --vanilla < input.r > output
  • Make sure this job script is executable and the walltime and mem limits are correct.
  • Submit the job by issuing the following on the command line

    $ qsub r_job.sh
  • This starts R and executes the instructions in `input.r`. The output that you would see on screen in an interactive session appears in the file `output`.
  • Check the files `r_job.sh.e****` and `r_job.sh.o****` for any errors and to see the time consumed.
  • Note the `jobfs` request for scratch space in the job script; it is needed because R writes temporary files to `TMPDIR`.
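
Once the job has finished, the log check in the steps above could be scripted along the following lines. This is only a sketch: the function name `show_job_logs` is made up for illustration, and it assumes the PBS log files are in the current directory and follow the `r_job.sh.e****` / `r_job.sh.o****` naming described above.

```shell
# Hypothetical helper (not part of PBS or R): print the stderr and stdout
# logs that PBS wrote for a job script, flagging non-empty error logs.
show_job_logs() {
    script_name="${1:-r_job.sh}"
    found=0
    for log in "$script_name".e* "$script_name".o*; do
        # Skip glob patterns that matched nothing
        [ -e "$log" ] || continue
        found=1
        case "$log" in
            "$script_name".e*)
                if [ -s "$log" ]; then
                    echo "== $log (stderr, non-empty: check for errors) =="
                else
                    echo "== $log (stderr, empty) =="
                fi
                ;;
            *)
                echo "== $log (stdout) =="
                ;;
        esac
        cat "$log"
    done
    if [ "$found" -eq 0 ]; then
        echo "No log files found for $script_name"
        return 1
    fi
}
```

Calling `show_job_logs r_job.sh` from the directory the job was submitted from prints each log under a short header.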

This version of R has been built with the Intel MKL library, which provides the dense linear algebra routines BLAS and LAPACK. If your algorithm depends heavily on LAPACK routines, you may benefit from running in parallel. An example job script follows:

#!/bin/bash

#PBS -q normal
#PBS -l walltime=00:20:00,mem=4GB,ncpus=2
#PBS -l wd

# Load modules
module load R/4.0.0

# Set number of OMP threads
export OMP_NUM_THREADS=$PBS_NCPUS

# Must include `#PBS -l storage=scratch/ab12+gdata/yz98` if the job
# needs access to `/scratch/ab12/` and `/g/data/yz98/`. Details on
# https://opus.nci.org.au/display/Help/PBS+Directives+Explained.

# Run R applications
R --vanilla -f input.r > output

To see whether it is worth using multiple CPUs, run some timing tests with 1, 2, 4 and so on, up to no more than 16 CPUs, and check the walltime used. Your problem needs to be fairly large to benefit from parallelism.
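
The timing tests could be set up with a small loop such as the one below. Treat it as a sketch under a few assumptions: the helper name `submit_timing_tests` and the job names `r_timing_<n>` are invented here, the CPU counts are simply powers of two up to 16, and it relies on the `qsub` command-line option `-l ncpus=<n>` taking precedence over the `ncpus` value in the job script. By default it only prints the commands; set `RUN=1` to actually submit.

```shell
# Hypothetical helper: print (or, with RUN=1, execute) one qsub command per
# CPU count so that the walltimes can be compared afterwards.
submit_timing_tests() {
    job_script="${1:-r_job.sh}"
    for n in 1 2 4 8 16; do
        # -l ncpus=N is assumed to override the ncpus value in the script;
        # -N gives each job its own name so the log files are easy to tell apart.
        cmd="qsub -l ncpus=$n -N r_timing_$n $job_script"
        if [ "${RUN:-0}" = "1" ]; then
            $cmd
        else
            echo "$cmd"
        fi
    done
}

# Dry run: show the commands that would be submitted
submit_timing_tests r_job.sh
```

Because the example job script always writes to the same `output` file, the runs will overwrite each other's R output; the walltime for each run should still be visible in the per-job PBS log files.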

If you wish to add extra packages such as `randomForest`, you need to load the appropriate `Intel` modules. We recommend using the same `Intel` compiler version that was used to build R.

The list of modules that were loaded during the R build is in the `/apps/R/<version>/README.nci` file. For example, for R/4.0.0, the file is `/apps/R/4.0.0/README.nci`. There you can see that `intel-compiler/2019.5.281` was used. Therefore this is the version that needs to be loaded, as shown below:

$ module load R/4.0.0
$ module load intel-compiler/2019.5.281

$ R
....
>install.packages("randomForest",repos="https://mirror.aarnet.edu.au/pub/CRAN/")
Warning in install.packages("randomForest") :
  'lib = "/apps/R/4.0.0/lib64/R/library"' is not writeable
Would you like to create a personal library
'~/R/x86_64-unknown-linux-gnu-library/4.0.0'
to install packages into?  (y/n) y
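
Looking up the compiler version in `README.nci` before installing a package can be done by eye, or with a small helper like the following sketch. The function name `build_compiler_modules` is hypothetical, and the search pattern assumes the README mentions the module in the form `intel-compiler/<version>`; the actual layout of the file may differ.

```shell
# Hypothetical helper: list the intel-compiler module versions mentioned in
# an R build README. Assumes the module is written as "intel-compiler/<version>".
build_compiler_modules() {
    readme="${1:-/apps/R/4.0.0/README.nci}"
    grep -o 'intel-compiler/[0-9][0-9.]*' "$readme" | sort -u
}
```

For example, `build_compiler_modules /apps/R/4.0.0/README.nci` would be expected to print the module name to pass to `module load`.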

If you wish to install packages in a different directory from the default `~/R/x86_64-unknown-linux-gnu-library/4.0.0`, you need to set the environment variable `R_LIBS` to the new directory. This will also need to be set every time you use R.
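
One possible way to handle this is a small shell function that creates the directory and exports `R_LIBS` in one step. This is a sketch only: `use_r_library` is a made-up name, and the directory you pass to it is your own choice.

```shell
# Hypothetical helper: create a package directory if needed and point
# R_LIBS at it for the current shell session.
use_r_library() {
    lib_dir="$1"
    if [ -z "$lib_dir" ]; then
        echo "usage: use_r_library <directory>" >&2
        return 1
    fi
    mkdir -p "$lib_dir" || return 1
    export R_LIBS="$lib_dir"
}
```

For example, `use_r_library "$HOME/my_r_packages"` (an illustrative path) sets `R_LIBS` for the current session. Since the variable has to be set every time R is used, the same `export R_LIBS=...` line would also need to appear in your PBS job scripts or shell start-up file.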

Note that some packages cannot be built with the `Intel` compilers. The problem usually occurs when a package uses complex variables. In such cases, you need to switch to the `GNU` compilers. This is done by modifying the `.R/Makevars` file in your home directory. Putting the following lines in this file:

CXX=g++
CXX11=g++
CXX14=g++
CC=gcc

will force R to use gcc/g++ instead of icc. Do not forget to comment out these lines (add a `#` symbol in front of each line) after installing the problematic package.
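
Switching the overrides on and off by hand is easy to forget, so the two steps could be wrapped in helpers like the ones sketched below. The function names and the `MAKEVARS` variable are invented for this example; the sketch assumes the file only needs the four lines shown above and that your `sed` accepts `-i.bak` and `-E`.

```shell
# Hypothetical helpers for switching the GNU compiler overrides on and off.
# MAKEVARS defaults to the per-user Makevars file in the home directory.
MAKEVARS="${MAKEVARS:-$HOME/.R/Makevars}"

enable_gnu_compilers() {
    mkdir -p "$(dirname "$MAKEVARS")"
    # Appends the override lines; running this twice will duplicate them
    printf '%s\n' 'CXX=g++' 'CXX11=g++' 'CXX14=g++' 'CC=gcc' >> "$MAKEVARS"
}

disable_gnu_compilers() {
    # Prefix each override line with '#'; a backup is kept as Makevars.bak
    sed -i.bak -E 's/^(CXX|CXX11|CXX14|CC)=/#&/' "$MAKEVARS"
}
```

A typical sequence would be to run `enable_gnu_compilers`, install the problematic package from within R, and then run `disable_gnu_compilers`.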