To run jobs on Gadi, users submit them to a specific queue, which determines the type of node the job runs on. Which queue and node type you choose will depend on a variety of factors.
For example, any job that needs to run on GPUs must be submitted to the gpuvolta or dgxa100 queue, as these are the only queues with access to GPUs.
Any job that requires a large amount of memory should be submitted to the hugemem queue to take advantage of the persistent memory there.
If your job can run on the nodes in a normal queue, you should use those queues. The normal queues have more nodes available for your jobs, and using them gives users whose jobs genuinely require a specialised queue fair access to those resources.
The queue structure is split into two main levels of priority, express and normal, which correspond directly to the queue names.
Express queues are designed to support work that needs a faster turnaround, but are charged at a higher service unit (SU) rate.
Maximum turbo frequency is not always achievable. Please see http://www.intel.com/technology/turboboost/ for further information.
To create binary code that will run on these nodes, you may need to recompile your application so that it specifically targets them.
NCI recommends using the latest Intel LLVM compiler for these nodes, with options that build your code to run on all of Gadi's nodes.
To check which versions of Intel LLVM are available, you can run the command
$ module avail intel-compiler-llvm
NCI recommends using the latest version of these compilers, as older versions may not be optimised for newer architectures.
A build with runtime dispatch for all architectures on Gadi can be created with
$ module load intel-compiler-llvm/<version>
$ icx -O3 -march=broadwell -axSKYLAKE-AVX512,CASCADELAKE,SAPPHIRERAPIDS myCode.c -o myBinary
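The -ax option asks the compiler to generate extra code paths for the listed processors alongside the -march=broadwell baseline, and to select one at run time based on the CPU executing the binary. The C sketch below is a hypothetical, hand-written analogue of that idea, not what icx generates internally. It assumes a compiler that supports the GCC/Clang builtin __builtin_cpu_supports and the target function attribute, and the function names (axpy_baseline, axpy_avx512, axpy_dispatch) are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Portable baseline path: y[i] += a * x[i]. */
static void axpy_baseline(size_t n, double a, const double *x, double *y) {
    for (size_t i = 0; i < n; i++)
        y[i] += a * x[i];
}

#if defined(__x86_64__) || defined(__i386__)
/* Same loop, compiled with AVX-512 enabled so the compiler may emit
 * 512-bit instructions for it. Only called on CPUs that support AVX-512. */
__attribute__((target("avx512f")))
static void axpy_avx512(size_t n, double a, const double *x, double *y) {
    for (size_t i = 0; i < n; i++)
        y[i] += a * x[i];
}
#endif

/* Hypothetical dispatcher: checks the CPU running the binary and calls
 * the matching variant. Returns the name of the path that was taken. */
const char *axpy_dispatch(size_t n, double a, const double *x, double *y) {
    assert(x != NULL && y != NULL);
#if defined(__x86_64__) || defined(__i386__)
    __builtin_cpu_init();                      /* initialise CPU feature data */
    if (__builtin_cpu_supports("avx512f")) {
        axpy_avx512(n, a, x, y);
        return "avx512f";
    }
#endif
    axpy_baseline(n, a, x, y);                 /* fallback, e.g. Broadwell */
    return "baseline";
}
```

With this pattern a single binary runs on every node type: CPUs without AVX-512 (such as Broadwell) take the baseline path, while AVX-512-capable CPUs take the other. A build with -ax applies the same idea automatically across the whole program, so hand-written dispatch like this is rarely needed.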
Other options may help some codes. For example, try testing with -qopt-zmm-usage=high (the default is low, i.e. prefer 256-bit wide instructions over 512-bit ones).
Code that is especially floating-point heavy may benefit from this flag, but you should test with and without it to see whether your code benefits, as it can cause some code to slow down.
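One way to run that comparison is to time a representative kernel in two builds, one with the compile line above and one with -qopt-zmm-usage=high added, on the node type you intend to use. The C sketch below is a minimal, hypothetical harness: dot_repeat stands in for your own floating-point-heavy code, and time_kernel measures processor time with the standard clock() function. Both names are invented for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical stand-in for a floating-point-heavy kernel:
 * a dot product repeated `reps` times. */
static double dot_repeat(size_t n, const double *x, const double *y, int reps) {
    double total = 0.0;
    for (int r = 0; r < reps; r++) {
        double s = 0.0;
        for (size_t i = 0; i < n; i++)
            s += x[i] * y[i];
        total += s;
    }
    return total;
}

/* Times the kernel with clock() (processor time) and returns seconds,
 * or -1.0 if allocation fails. The kernel's result is stored in
 * *checksum so the compiler cannot discard the work being timed. */
double time_kernel(size_t n, int reps, double *checksum) {
    assert(n > 0 && reps > 0 && checksum != NULL);
    double *x = malloc(n * sizeof *x);
    double *y = malloc(n * sizeof *y);
    if (x == NULL || y == NULL) {
        free(x);
        free(y);
        return -1.0;
    }
    for (size_t i = 0; i < n; i++) {
        x[i] = 1.0;
        y[i] = 0.5;
    }
    clock_t start = clock();
    *checksum = dot_repeat(n, x, y, reps);
    clock_t end = clock();
    free(x);
    free(y);
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Call time_kernel from a short main in each build, for example with n = 10000000 and reps = 20, and check that the checksum matches between builds before comparing timings. Repeat each run a few times, since a single timing can vary.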