Panel
borderColor#21618C
bgColor#F6F7F7
titleColor#17202A
borderWidth1
titleBGColor#FFB96A
borderStyleridge
titleQueue structure

To run jobs on Gadi, users submit them to a specific queue, which determines the type of node the job runs on. Which queue you choose will depend on a variety of factors, chiefly the resources your job needs.

For example, any job that needs to run on GPUs must be submitted to the gpuvolta or dgxa100 queue, as these are the only queues with access to GPUs.

Jobs that require a large amount of memory should be submitted to the hugemem queue to take advantage of its persistent memory.

Note

If your job can run on the nodes in a normal queue, you should use that queue. The normal queues have more nodes available for your jobs, and keeping the specialised queues free allows the users and jobs that genuinely require them to get fair access to those resources.

The queue structure is split into two main priority levels, express and normal, which correspond directly to the queue names.

Express queues are designed to support work that needs a faster turnaround, but jobs in them are charged at a correspondingly higher service unit (SU) rate.
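For example, a job is directed to a particular queue with the PBS -q directive in its submission script. The sketch below is illustrative only: the project code, resource requests, walltime and program name are placeholders, not recommendations.

Code Block
themeFadeToGrey
#!/bin/bash
#PBS -q normal                 # or express for faster turnaround at a higher SU charge
#PBS -P ab12                   # placeholder project code
#PBS -l ncpus=48               # placeholder resource requests
#PBS -l mem=190GB
#PBS -l walltime=02:00:00
#PBS -l wd                     # start the job in the directory it was submitted from

./myProgram                    # placeholder executable

The script is then submitted with qsub <jobscript>.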


Panel
borderColor#21618C
bgColor#F6F7F7
titleColor#17202A
borderWidth1
titleBGColor#FFB96A
borderStyleridge
titleIntel Xeon Cascade Lake
Expand
titleexpress
  • Express priority queue for testing, debugging or other jobs that need quick turnaround
  • 2 x 24-core Intel Xeon Platinum 8274 (Cascade Lake) 3.2 GHz CPUs per node
  • 192 GiB RAM per node
  • 2 CPU sockets per node, each with 2 NUMA nodes 
    • 12 CPU cores per NUMA node
    • 48 GiB local RAM per NUMA node
  • 400 GiB local SSD disk per node
  • Max request of 3200 CPU cores
Expand
titlenormal
  • Normal priority queue for standard computationally intensive jobs
  • 2 x 24-core Intel Xeon Platinum 8274 (Cascade Lake) 3.2 GHz CPUs per node
  • 192 GiB RAM per node
  • 2 CPU sockets per node, each with 2 NUMA nodes 
    • 12 CPU cores per NUMA node
    • 48 GiB local RAM per NUMA node
  • 400 GiB local SSD disk per node
  • Max request of 20736 CPU cores, exceptions available on request
Expand
titlecopyq
  • Normal priority queue for data archive/transfer and other jobs that need network access, 6 nodes total
  • 2 x 24-core Intel Xeon Platinum 8268 (Cascade Lake) 2.9 GHz CPUs per node
  • 192 GiB RAM per node
  • 2 CPU sockets per node, each with 2 NUMA nodes
    • 12 CPU cores per NUMA node
    • 48 GiB local RAM per NUMA node
  • 800 GiB local SSD disk per node
  • External network access (not available in any other queue)
  • Access to the massdata tape filesystem (also not available in any other queue; jobs must explicitly request it with the PBS directive -l storage=massdata/<project_code>, as in the sketch below)
  • Max request of 1 CPU core
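A minimal sketch of a copyq job script, assuming a hypothetical project code ab12 and a placeholder file name; the -l storage directive is the part required by the bullets above, everything else is illustrative.

Code Block
themeFadeToGrey
#!/bin/bash
#PBS -q copyq
#PBS -P ab12                           # placeholder project code
#PBS -l ncpus=1                        # copyq accepts at most 1 CPU core
#PBS -l mem=4GB
#PBS -l walltime=01:00:00
#PBS -l storage=massdata/ab12          # required for access to the massdata tape filesystem
#PBS -l wd

# placeholder transfer command; copyq nodes are the only ones with external network access
mdss put results.tar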
Expand
titlehugemem
  • Normal priority queue for jobs that use a large amount of RAM, 50 nodes total
  • 2 x 24-core Intel Xeon Platinum 8268 (Cascade Lake) 2.9 GHz CPUs per node
  • 1.5 TiB of Intel Optane DC Persistent Memory with 384 GiB DRAM as a cache per node
  • 2 CPU sockets per node, each with 2 NUMA nodes
    • 12 CPU cores per NUMA node
    • 384 GiB Optane DC Persistent Memory per NUMA node
    • 96 GiB RAM per NUMA node as cache
  • 1.5 TiB local SSD disk per node
  • Max request of 192 CPU cores, exceptions available on request
Expand
titlemegamem
  • Normal priority queue for jobs that use a very large amount of RAM, 4 nodes total
  • 2 x 24-core Intel Xeon Platinum 8260L (Cascade Lake) 2.4 GHz CPUs per node
  • 3.0 TiB of Intel Optane DC Persistent Memory with 384 GiB DRAM as a cache per node
  • 2 CPU sockets per node, each with 2 NUMA nodes
    • 12 CPU cores per NUMA node
    • 768 GiB Optane DC Persistent Memory per NUMA node
    • 96 GiB RAM per NUMA node as cache
  • 1.5 TiB local SSD disk per node
  • Max request of 96 CPU cores, exceptions available on request
Expand
titlegpuvolta
  • Normal priority queue, nodes equipped with NVIDIA Volta GPUs, 160 nodes total
  • 2 x 24-core Intel Xeon Platinum 8268 (Cascade Lake) 2.9 GHz CPUs per node
  • 384 GiB RAM per node
  • 2 CPU sockets per node, each with 2 NUMA nodes
    • 12 CPU cores per NUMA node
    • 96 GiB local RAM per NUMA node
  • 4 x NVIDIA Tesla Volta V100-SXM2-32GB per node
  • 480 GiB local SSD disk per node 
  • Max request of 960 CPU cores (80 GPUs), i.e. 12 CPU cores per GPU; see the sketch after this list
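A minimal sketch of a gpuvolta job request, with a placeholder project code, memory request and walltime. GPUs are requested with -l ngpus, and since each node pairs 48 CPU cores with 4 GPUs, ncpus is set to 12 times ngpus here.

Code Block
themeFadeToGrey
#!/bin/bash
#PBS -q gpuvolta
#PBS -P ab12                   # placeholder project code
#PBS -l ngpus=4                # one node's worth of V100 GPUs
#PBS -l ncpus=48               # 12 CPU cores requested per GPU
#PBS -l mem=380GB              # placeholder, within a node's 384 GiB
#PBS -l walltime=02:00:00
#PBS -l wd

./myGpuProgram                 # placeholder executable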
Panel
borderColor#21618C
bgColor#F6F7F7
titleColor#17202A
borderWidth1
titleBGColor#FFB96A
borderStyleridge
titleIntel Xeon Sapphire Rapids
Expand
titlenormalsr
  • Normal priority queue for standard computationally intensive jobs
  • 2 x 52-core Intel Xeon Platinum 8470Q (Sapphire Rapids) 2.1 GHz CPUs per node
  • 512 GiB RAM per node
  • 2 CPU sockets per node, each with 4 NUMA nodes
    • 13 CPU cores per NUMA node
    • 64 GiB local RAM per NUMA node
  • 400 GiB local SSD disk per node
  • Max request of 10400 CPU cores, exceptions available on request
  • SU charge rate 2.0 SU per CPU core hour, 208 SU per node hour
Expand
titleexpresssr
  • Express priority queue for testing, debugging or other jobs that need quick turnaround
  • 2 x 52-core Intel Xeon Platinum 8470Q (Sapphire Rapids) 2.1 GHz CPUs per node
  • 512 GiB RAM per node
  • 2 CPU sockets per node, each with 4 NUMA nodes
    • 13 CPU cores per NUMA node
    • 64 GiB local RAM per NUMA node
  • 400 GiB local SSD disk per node
  • Max request of 10400 CPU cores, exceptions available on request
  • SU charge rate 6.0 SU per CPU core hour, 624 SU per node hour
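As a worked example using the per-core-hour rates above (and assuming the charge scales simply with cores and hours): a 104-core job that runs for 3 hours is charged 104 x 2.0 x 3 = 624 SU in normalsr, and 104 x 6.0 x 3 = 1872 SU in expresssr.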
NCI has installed an expansion to Gadi's compute capacity containing the latest generation Intel Sapphire Rapids processors. The expansion consists of 720 nodes, each containing two Intel Xeon Platinum 8470Q (Sapphire Rapids) processors with a base frequency of 2.1 GHz (turbo up to 3.8 GHz), 512 GiB of RAM, and 400 GiB of SSD available to jobs as jobfs.

Maximum turbo frequency is not always achievable. Please see http://www.intel.com/technology/turboboost/ for further information.

Panel
borderColor#21618C
bgColor#F6F7F7
titleColor#17202A
borderWidth1
titleBGColor#FFB96A
borderStyleridge
titleIntel Xeon Broadwell (ex-Raijin)
Expand
titleexpressbw
  • Express priority queue for testing, debugging or other jobs that need quick turnaround on the Broadwell nodes
  • 2 x 14-core Intel Xeon E5-2690v4 (Broadwell) 2.6 GHz CPUs per node
  • 128 or 256 GiB RAM per node
  • 2 CPU sockets per node, each with 1 NUMA node
    • 14 CPU cores per NUMA node
    • 64 or 128 GiB local RAM per NUMA node
  • 400 GiB local SSD disk
  • Max request of 1848 CPU cores
Expand
titlenormalbw
  • Normal priority queue for standard computationally intensive jobs on the Broadwell nodes
  • 2 x 14-core Intel Xeon E5-2690v4 (Broadwell) 2.6 GHz CPUs per node
  • 128 or 256 GiB RAM per node
  • 2 CPU sockets per node, each with 1 NUMA node
    • 14 CPU cores per NUMA node
    • 64 or 128 GiB local RAM per NUMA node
  • 400 GiB local SSD disk
  • Max request of 10080 CPU cores, exceptions available on request
Expand
titlehugemembw
  • Normal priority queue for jobs that use a large amount of RAM on the Broadwell nodes, 10 nodes total
  • 2 x 14-core Intel Xeon E5-2690v4 (Broadwell) 2.6 GHz CPUs per node
  • 1 TiB RAM per node
  • 2 CPU sockets per node, each with 1 NUMA node
    • 14 CPU cores per NUMA node
    • 512 GiB local RAM per NUMA node
  • 400 GiB local SSD disk
  • Minimum request is 7 cores and 256 GiB of memory
  • Max request of 140 CPU cores
Expand
titlemegamembw
  • Normal priority queue for jobs that use a large amount of RAM on the Broadwell nodes, 3 nodes total
  • 4 x 8-core Intel Xeon E7-4809v4 (Broadwell) 2.1 GHz CPUs per node
  • 3 TiB RAM per node
  • 4 CPU sockets per node, each with 1 NUMA node
    • 8 CPU cores per NUMA node
    • 768 GiB local RAM per NUMA node
  • 800 GiB local SSD disk
  • Minimum request is 32 cores and 1.5 TiB of memory
  • Max request of 32 CPU cores
Panel
borderColor#21618C
bgColor#F6F7F7
titleColor#17202A
borderWidth1
titleBGColor#FFB96A
borderStyleridge
titleIntel Xeon Skylake (ex-Raijin)
Expand
titlenormalsl
  • Normal priority queue for standard computationally intensive jobs on the Skylake nodes, 192 nodes in total
  • 2 x 16-core Intel Xeon Gold 6130 (Skylake) 2.1 GHz CPUs per node
  • 192 GiB RAM per node
  • 2 CPU sockets per node, each with 2 NUMA nodes
    • 8 CPU cores per NUMA node
    • 48 GiB local RAM per NUMA node
  • 400 GiB local SSD disk
  • Max request of 640 CPU cores, exceptions available on request
Panel
borderColor#21618C
bgColor#F6F7F7
titleColor#17202A
borderWidth1
titleBGColor#FFB96A
borderStyleridge
titleNVIDIA DGX A100 (Specialised GPU)
Expand
titledgxa100
  • Normal priority queue for specialised GPU work, 2 nodes in total
  • 2 x 64-core AMD EPYC 7742 2.25 GHz CPUs per node
  • 2 TiB RAM per node
  • 2 CPU sockets per node, each with 4 NUMA nodes
    • 16 CPU cores per NUMA node
    • 256 GiB local RAM per NUMA node
  • 8 x NVIDIA A100-SXM4-80GB per node
  • 27 TiB local SSD disk
  • Max request of 128 CPU cores, exceptions available on request
Note
titleRemember

If your job can run on the nodes in a normal queue, you should use that queue.

The normal queues have more nodes available for your jobs, and leaving the specialised queues free allows the users and jobs that require them to get fair access to those resources.



Building Applications

To create binary code that will run on these nodes, you may need to recompile your application so that it specifically targets them.

NCI recommends using the latest Intel LLVM compiler, with options that produce a single binary able to run on all of Gadi's node types.

To check for the latest version of Intel LLVM you can run the command

Code Block
themeFadeToGrey
$ module avail intel-compiler-llvm

NCI recommends using the latest version of these compilers, as older versions may not be optimised for the newer architectures.

Tip

A binary with runtime dispatch for all of the CPU architectures on Gadi can be built with:

Code Block
themeFadeToGrey
module load intel-compiler-llvm/<version>
icx -O3 -march=broadwell -axSKYLAKE-AVX512,CASCADELAKE,SAPPHIRERAPIDS myCode.c -o myBinary

There may be other options that help some codes. For example, try testing with -qopt-zmm-usage=high (the default is low, i.e. the compiler prefers 256-bit wide instructions over 512-bit ones).

Code that is especially floating-point heavy may benefit from this flag, but it slows some codes down, so test with and without it to see whether your code benefits.
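For example, the runtime-dispatch build from the tip above could be retested with the flag added; the output name here is just a placeholder.

Code Block
themeFadeToGrey
module load intel-compiler-llvm/<version>
icx -O3 -march=broadwell -axSKYLAKE-AVX512,CASCADELAKE,SAPPHIRERAPIDS -qopt-zmm-usage=high myCode.c -o myBinary_zmm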

Authors: Yue Sun, Andrew Wellington, Andrey Bliznyuk, Ben Menadue, Mohsin Ali, Andrew Johnston