R is a language and environment for statistical computing and graphics. It is a GNU project which is similar to the award-winning S system which was developed at Bell Laboratories (formerly AT&T, now Lucent Technologies) by John Chambers and colleagues. R can be considered a different implementation of S. There are some important differences, but much code written for S runs unaltered under R. R provides a wide variety of statistical and graphical techniques (linear and nonlinear modelling, statistical tests, time series analysis, classification, clustering, ...).
R is designed as a true computer language with control-flow constructions for iteration and alternation, and it allows users to add functionality by defining new functions. For computationally intensive tasks, C, C++ and Fortran code can be linked and called at run time.
More information: https://www.r-project.org/about.html
You can check the versions installed on Gadi with a module query, for example module avail R.
We normally recommend using the latest version available, and always recommend specifying the version number with the module load command, for example module load R/4.1.0.
For more details on using modules see our modules help guide at https://opus.nci.org.au/display/Help/Environment+Modules.
An example PBS job submission script named r_job.sh requests 1 CPU core, 2 GiB memory, and 8 GiB local disk on a compute node on Gadi from the normal queue, for exclusive access for 30 minutes, charged against the project a00. It also asks the system to enter the working directory once the job has started. This script should be saved in the working directory from which the analysis will be done.
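The r_job.sh script described above might look like the following. This is a minimal sketch rather than the official NCI script: it assumes the Gadi PBS directive forms (-P, -q, and -l with ncpus, mem, jobfs, walltime and wd) and the R/4.1.0 module mentioned later on this page. It is written through a heredoc so that it can be pasted into a terminal in the working directory.

```shell
# Create r_job.sh in the current (working) directory.
# The quoted 'EOF' stops the shell from expanding anything inside the script.
cat > r_job.sh <<'EOF'
#!/bin/bash
#PBS -P a00
#PBS -q normal
#PBS -l ncpus=1
#PBS -l mem=2GB
#PBS -l jobfs=8GB
#PBS -l walltime=00:30:00
#PBS -l wd

# Load R, always with an explicit version number (assumed here: 4.1.0).
module load R/4.1.0

# Run the commands in input.r and collect what R prints in the file "output".
R --vanilla -f input.r > output
EOF

# Syntax-check the generated script without running it.
bash -n r_job.sh
```

The project code a00 and the file names input.r and output are the ones used in the text; replace them with your own.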
For more information about the R command's options, see https://cran.r-project.org/doc/manuals/r-release/R-intro.html
To run the job, submit the script with the PBS command qsub r_job.sh.
This will execute the instructions in input.r after starting up R, and the output that you would expect to see on screen during interactive execution will appear in the file output. Check the r_job.sh.o**** files for any errors and to see the time consumed. Note the request for local scratch space through jobfs, as R uses it for temporary files.
Executing R commands in an interactive way is also possible. Please see the details at https://opus.nci.org.au/display/Help/0.+Welcome+to+Gadi#id-0.WelcometoGadi-InteractiveJobs.
This version of R has been built with the Intel MKL library for dense linear algebra (LAPACK). If your algorithm depends heavily on LAPACK routines, you may benefit from running in parallel, for example with a job script that requests 2 CPU cores. Note that if your application does not work in parallel, you should set the number of CPU cores to 1 and change the memory and jobfs requests according to the information available at https://opus.nci.org.au/display/Help/Queue+Structure, to avoid wasting compute resources.
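A 2-core job script could be sketched as below. The file name r_job_parallel.sh and the mem and jobfs figures are placeholders, not NCI recommendations. Setting OMP_NUM_THREADS from PBS_NCPUS is also an assumption: it presumes that MKL's threaded routines follow the OpenMP thread count and that PBS_NCPUS is defined inside Gadi jobs.

```shell
# Create a hypothetical 2-core variant of the job script.
cat > r_job_parallel.sh <<'EOF'
#!/bin/bash
#PBS -P a00
#PBS -q normal
#PBS -l ncpus=2
#PBS -l mem=4GB
#PBS -l jobfs=8GB
#PBS -l walltime=00:30:00
#PBS -l wd

module load R/4.1.0

# Let the threaded linear algebra routines use every core the job requested.
export OMP_NUM_THREADS=$PBS_NCPUS

R --vanilla -f input.r > output
EOF

# Syntax-check the generated script without running it.
bash -n r_job_parallel.sh
```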
To see if it is worth using multiple CPU cores, you should run some timing tests with 1, 2, 4 up to no more than 16 CPU cores and check the walltime used. Your problems need to be fairly large to benefit from parallelism.
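The timing tests can be organised as a small loop over core counts. The sketch below is a dry run that only prints the qsub commands. It assumes that command-line options to qsub take precedence over the #PBS lines in r_job.sh, and the r_scale_N job names are hypothetical. Memory would normally need scaling with the core count too, which is not shown.

```shell
# Dry run: print one qsub command per core count, doubling from 1 up to 16.
# Remove "echo" to actually submit the jobs.
max_cores=16
core_counts=""
n=1
while [ "$n" -le "$max_cores" ]; do
    core_counts="${core_counts:+$core_counts }$n"
    echo qsub -l ncpus="$n" -N "r_scale_$n" r_job.sh
    n=$((n * 2))
done
echo "Core counts to test: $core_counts"
```

Because every run of r_job.sh writes to the same output file, submit the runs one after another, or change the output file name per run, before comparing the walltime reported for each job.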
If you wish to add extra packages such as randomForest, you need to load the appropriate Intel modules. We recommend using the same Intel compiler version that was used to build R.
The list of modules that were loaded during the R build is in the /apps/R/<version>/README.nci file. For example, for R/4.1.0, the file is /apps/R/4.1.0/README.nci. There you can see that intel-compiler/2021.2.0 was used, so this is the version to load, for example with module load intel-compiler/2021.2.0, before installing packages.
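Looking up the compiler version can be scripted. The helper below is hypothetical and assumes that README.nci names the module in the usual name/version form (for example intel-compiler/2021.2.0); the exact layout of that file is not described here, so treat this as a sketch.

```shell
# Print the intel-compiler module(s) named in an R build README.
# $1 is the path to the README, e.g. /apps/R/4.1.0/README.nci
find_intel_compiler() {
    grep -o 'intel-compiler/[0-9][0-9.]*' "$1" | sort -u
}

# On Gadi this might be used as follows (not run here; assumes a single match):
#   module load R/4.1.0
#   module load "$(find_intel_compiler /apps/R/4.1.0/README.nci)"
#   Rscript -e 'install.packages("randomForest", repos = "https://cloud.r-project.org")'
```

The repos argument is given explicitly because a non-interactive R session has no CRAN mirror selected by default.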
If you wish to install packages in a directory other than the default ~/R/x86_64-unknown-linux-gnu-library/4.1.0, you need to set the environment variable R_LIBS to the new directory. In bash, you can set it with the export command, for example export R_LIBS=/path/to/library. This also needs to be set every time you use R.
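As a minimal sketch, setting R_LIBS in bash could look like this. The directory name is hypothetical; any writable location would do.

```shell
# Hypothetical library location for user-installed packages.
export R_LIBS="$HOME/R/custom-library"

# Create it up front: R skips library paths that do not exist.
mkdir -p "$R_LIBS"
```

Since the variable has to be set every time R is used, the export line could also go into the job script before R is called, or into ~/.bashrc.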
Note that some packages cannot be built with the Intel compilers. The problem usually occurs when a package uses complex variables. In such cases, you need to switch to the GNU compilers by modifying the ~/.R/Makevars file in your $HOME directory. Setting the compiler variables in this file to the GNU compilers (for example CC=gcc and CXX=g++) will force R to use gcc/g++ instead of icc. Do not forget to comment out these lines (i.e. add a # symbol in front of each line) after installing the problematic package.
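The switch to GNU compilers and back can be sketched with two small helpers. The function names and the MAKEVARS variable are hypothetical, and the CC and CXX settings are an assumption based on the gcc/g++ description above. Packages that request a specific C++ standard may also need the corresponding CXX11, CXX14 or CXX17 variables.

```shell
# File R reads for user-level build settings; override MAKEVARS to try this elsewhere.
MAKEVARS="${MAKEVARS:-$HOME/.R/Makevars}"

# Append GNU compiler overrides so that package builds use gcc/g++.
use_gnu_compilers() {
    mkdir -p "$(dirname "$MAKEVARS")"
    printf 'CC=gcc\nCXX=g++\n' >> "$MAKEVARS"
}

# Comment the overrides out again once the problematic package is installed.
restore_default_compilers() {
    sed -e 's/^CC=gcc$/#CC=gcc/' -e 's/^CXX=g++$/#CXX=g++/' "$MAKEVARS" > "$MAKEVARS.tmp" &&
        mv "$MAKEVARS.tmp" "$MAKEVARS"
}
```

A typical sequence would be use_gnu_compilers, then the package installation in R, then restore_default_compilers.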