Purpose

This tool checks a given Singularity container image (.sif file) for issues that may prevent it from running as expected on Gadi.

Usage

$ /opt/nci/bin/singularity-check-container -h
usage: singularity-check-container [-h] -f SIF_FILENAME [-l] [-v]

optional arguments:
  -h, --help            show this help message and exit
  -f SIF_FILENAME, --sif_filename SIF_FILENAME
                        Path to singularity file
  -l, --list            Unsquash and generate file listing only
  -v, --verbose         Verbosity level -v/-vv/-vvv
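
For example, to only extract the image and generate a file listing, without running the checks (using the alpine.sif image from the examples below):

$ singularity-check-container -f alpine.sif -l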

Description

This tool scans the container's filesystem for potential issues. Identified issues are categorized into the sections below (the second column in the examples that follow):

  • OS - The container's OS version details
  • FS Mount - Mount points in the container's filesystem that could be overridden by the host mounts because they share a name with NCI filesystem mounts
  • GPU Driver - Any GPU driver libraries present in the container
  • MPI Library - Any MPI-related libraries present in the container
  • BLAS Library - Any BLAS libraries present in the container

Each of the identified issues is classified based on its severity level. The possible severity levels are:

  • ERROR - The identified issue could prevent the container from executing
  • WARNING - The identified issue could cause the container application to not work as intended and/or degrade its performance
  • INFO - Information about selected files inside the container (only displayed if verbose >= 1)
  • PASS - The library (or filesystem object) can be used as intended in the container (only displayed if verbose >= 2)

Potential Issues

Singularity can be useful if there are system-level dependencies that can't otherwise be installed. However, note that a Singularity container isn't actually enough to guarantee "reproducibility" – just because all the software versions are the same doesn't necessarily mean you'll get the same results. The applications are still dependent on a number of things from the host – for example, any host libraries that are brought into the container on launch (e.g. the GPU driver libraries), the host's kernel, and even the hardware itself. All of these can change the behaviour of, and results generated by, a container – even if the image is identical.

Similarly, containers can become very fragile as soon as you want to use the "high performance" features of the cluster – for example, the high-performance InfiniBand interconnect (for inter-node communication), or the new instruction sets available on the CPUs (where most of Gadi's raw performance comes from).

Performance impacts

In general, we recommend using newer versions of software (tools, libraries, or even newer OS versions) wherever possible, as they usually bring significant performance (and security) improvements.

On modern Intel processors, the majority of the performance comes from additional instruction sets that are only used if you explicitly ask for them at compile time. The default architecture option for both GCC and the Intel compilers is to use only SSE2, and this is what is used for OS-provided packages (i.e. if you just apt-get / yum install a package). This generates code that will work on any machine made in the last 20 years. However, on Gadi almost all of the performance comes from the very wide vector instructions that are only available in much newer instruction sets; using just SSE2 means a theoretical maximum of about 1/8th of what the system is capable of. To use the newer instruction sets, additional compiler flags need to be added (e.g. -xCASCADELAKE for the Intel compilers). The impact depends on the application but can amount to a massive difference in performance.
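
As a hedged illustration (the source file and binary names are hypothetical), this is what targeting Gadi's Cascade Lake CPUs looks like at compile time:

$ icc -O2 -xCASCADELAKE -o myapp myapp.c        # Intel compiler, enables AVX-512 for Cascade Lake
$ gcc -O2 -march=cascadelake -o myapp myapp.c   # GCC equivalent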

NCI Filesystem Mounts

When launching a container on Gadi, the following filesystems will be mounted automatically, along with the user's home directory:

  • /apps
  • /g
  • /opt/nci
  • /scratch
  • /jobfs

This means that these NCI-provided filesystem mounts will override anything at the same locations inside the image. Care must be taken to avoid these mount points; if needed, use the --bind option to bind such locations to another path, as sketched below.
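
A minimal sketch, assuming your container expects input data under a path that is shadowed by a host mount (the project code ab12, the image name, and the paths are hypothetical):

$ singularity exec --bind /g/data/ab12/inputs:/container_inputs image.sif ls /container_inputs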

MPI Library

The canonical way for processes to communicate in an HPC environment is via MPI. However, to get good performance, an MPI library needs to know how to use the high-performance interconnect, something the MPI libraries provided in pre-built containers generally lack. Such libraries often fail to support the interconnect correctly and silently fall back to inefficient communication techniques, so they appear to work but deliver much worse performance than they otherwise could. This is especially relevant on Gadi, whose new HDR interconnect is not widely supported by older libraries. In addition, there is no common MPI library ABI, so an application compiled with, for example, Intel MPI will not run with Open MPI.

The recommended approach is to bind in the host MPI libraries, which are already optimized for Gadi, and use them to run the application (this happens automatically on Gadi since /apps is always bound into containers). However, the MPI application needs to be built against a matching MPI version. Note that you may also need to bind in other dependencies from the host if they are not present in the container image.
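
As a minimal sketch (the module version, image name, and binary path are hypothetical), running a containerised MPI application against the host's Open MPI:

$ module load openmpi/4.1.4
$ mpirun -np 48 singularity exec app.sif /opt/app/bin/solver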

BLAS Library

Newer versions of BLAS libraries such as MKL are recommended to enable the use of the extended instruction set of Gadi's CPUs.
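
As one hypothetical way to check which BLAS backend an application actually links (assuming the image ships Python with NumPy; the image name is illustrative):

$ singularity exec image.sif python3 -c "import numpy; numpy.show_config()"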

GPU Driver

By default, the necessary GPU driver libraries are bound into the container when it is launched on Gadi. This includes the CUDA driver libraries matching the version of the loaded GPU driver. If a different version of the CUDA driver libraries is present in the container and used instead, this may result in unexpected behaviour and should be avoided.
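
As a hedged check (the binary path and image name are hypothetical; --nv is the generic Singularity flag, while on Gadi the driver libraries are bound in automatically), you can confirm which libcuda the application resolves at run time:

$ singularity exec --nv image.sif ldd /opt/app/bin/gpu_app | grep libcuda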

Examples

$ singularity-check-container -f alpine.sif   
Checking singularity container: alpine.sif
Extracting squashFS image
Checking container filesystem for potential issues

No issues detected

Please see https://opus.nci.org.au/x/YgDkBw for more information


$ singularity-check-container -f alpine.sif -v
Checking singularity container: alpine.sif
Extracting squashFS image
Checking container filesystem for potential issues

[INFO   ] [OS] Alpine Linux 3.8.1

No issues detected

Please see https://opus.nci.org.au/x/YgDkBw for more information


$ singularity-check-container -f ubuntu_20.04_test.sif -vv
Checking singularity container: ubuntu_20.04_test.sif
Extracting squashFS image
Checking container filesystem for potential issues

[WARNING] [OS] Ubuntu 14.04
[WARNING] [FS Mount] /g/*
[WARNING] [FS Mount] /jobfs/*
[WARNING] [MPI Library] /opt/mpi/intel-mpi/lib/libmpi.so -> /opt/mpi/intel-mpi/lib64/libmpi.so.12.0.0 (IntelMPI: 2021.2)
[WARNING] [FS Mount] /opt/nci/*
[WARNING] [FS Mount] /scratch/*
[WARNING] [GPU Driver] /usr/lib/x86_64-linux-gnu/libcuda.so -> /usr/lib/x86_64-linux-gnu/libcuda.so.470.63.01
[PASS   ] [GPU Driver] /usr/lib/x86_64-linux-gnu/libcudart.so -> /usr/lib/x86_64-linux-gnu/libcudart.so.10.1.243
[PASS   ] [BLAS Library] /usr/lib/x86_64-linux-gnu/libmkl_core.so (Version: 20191122)
[WARNING] [MPI Library] /usr/lib/x86_64-linux-gnu/libmpi.so -> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3 (OpenMPI: v4.0.3)

Please see https://opus.nci.org.au/x/YgDkBw for more information