Overview

On Gadi, users should submit jobs to a specific queue to run jobs on the corresponding type of node. For example, jobs that need to run on GPUs have to be submitted to the gpuvolta queue to get access to nodes with GPUs, while jobs requiring large amounts of memory may use the hugemem queue. If your job can run on the nodes in one of the normal queues, you should use those queues: the normal queues have more nodes available for your jobs, and this allows the users and jobs that do require the more specialised queues to get fair access to them.

The Gadi queue structure also has two main levels of priority, express and normal, which is reflected in the queue names. The express queues (express and expressbw) are designed to support work that needs rapid turnaround, but at a higher service unit charge.
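
As an illustration, a minimal PBS job script for the normal queue might look like the following sketch (the project code, resource amounts and program name are placeholders, not values prescribed on this page):

#!/bin/bash
# Request one full Cascade Lake node in the normal queue (48 cores, 192 GB RAM per node).
#PBS -q normal
#PBS -P <project_code>
#PBS -l ncpus=48
#PBS -l mem=190GB
#PBS -l walltime=02:00:00
#PBS -l jobfs=100GB

# Run from the directory the job was submitted from.
cd ${PBS_O_WORKDIR}
./my_program

Submit the script with qsub (e.g. qsub job.sh); changing the #PBS -q line, together with the matching core and memory request, is all that is needed to target a different queue from the lists below.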

Intel Xeon Cascade Lake

express

  • Express priority queue for testing, debugging or other jobs that need quick turnaround
  • 2 x 24-core Intel Xeon Platinum 8274 (Cascade Lake) 3.2 GHz CPUs per node
  • 192GB RAM per node
  • 2 CPU sockets per node, each with 2 NUMA nodes 
    • 12 CPU cores per NUMA node
    • 48 GB local RAM per NUMA node
  • 400 GB local SSD disk per node
  • Max request of 3200 CPU cores

normal

  • Normal priority queue for standard computationally intensive jobs
  • 2 x 24-core Intel Xeon Platinum 8274 (Cascade Lake) 3.2 GHz CPUs per node
  • 192GB RAM per node
  • 2 CPU sockets per node, each with 2 NUMA nodes 
    • 12 CPU cores per NUMA node
    • 48 GB local RAM per NUMA node
  • 400 GB local SSD disk per node
  • Max request of 20736 CPU cores, exceptions available on request

copyq

  • Normal priority queue for data archive/transfer and other jobs that need network access, 6 nodes total
  • 2 x 24-core Intel Xeon Platinum 8268 (Cascade Lake) 2.9 GHz CPUs per node
  • 192GB RAM per node
  • 2 CPU sockets per node, each with 2 NUMA nodes
    • 12 CPU cores per NUMA node
    • 48 GB local RAM per NUMA node
  • 800 GB local SSD disk per node
  • External network access (not available on nodes in any other queue)
  • Access to the tape filesystem massdata (not available on nodes in any other queue; the job needs to explicitly flag PBS with the directive -l storage=massdata/<project_code> to ensure access; see the example script after this list)
  • Max request of 1 CPU core
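
For example, a copyq job that pulls a file back from massdata could be sketched as follows (the project code and file name are placeholders, and the script assumes the mdss utility is used for tape access):

#!/bin/bash
# Single-core copyq job with access to the massdata tape filesystem.
#PBS -q copyq
#PBS -P <project_code>
#PBS -l ncpus=1
#PBS -l mem=4GB
#PBS -l walltime=01:00:00
#PBS -l storage=massdata/<project_code>

cd ${PBS_O_WORKDIR}
# Retrieve an archive from massdata (assumes the mdss client is available on the copyq nodes).
mdss get my_archive.tar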

hugemem

  • Normal priority queue for jobs that use large amounts of RAM, 50 nodes total
  • 2 x 24-core Intel Xeon Platinum 8268 (Cascade Lake) 2.9 GHz CPUs per node
  • 1.5 TB of Intel Optane DC Persistent Memory with 384 GB DRAM as a cache per node
  • 2 CPU sockets per node, each with 2 NUMA nodes
    • 12 CPU cores per NUMA node
    • 384GB Optane DC Persistent Memory per NUMA node
    • 96GB RAM per NUMA node as cache
  • 1.5 TB local SSD disk per node
  • Max request of 192 CPU cores, exceptions available on request

megamem

  • Normal priority queue for jobs that use very large amounts of RAM, 4 nodes total
  • 2 x 24-core Intel Xeon Platinum 8260L (Cascade Lake) 2.4 GHz CPUs per node
  • 3.0 TB of Intel Optane DC Persistent Memory with 384 GB DRAM as a cache per node
  • 2 CPU sockets per node, each with 2 NUMA nodes
    • 12 CPU cores per NUMA node
    • 768GB Optane DC Persistent Memory per NUMA node
    • 96GB RAM per NUMA node as cache
  • 1.5 TB local SSD disk per node
  • Max request of 96 CPU cores, exceptions available on request

gpuvolta

  • Normal priority queue, nodes equipped with NVIDIA Volta GPUs, 160 nodes total (see the example request after this list)
  • 2 x 24-core Intel Xeon Platinum 8268 (Cascade Lake) 2.9 GHz CPUs per node
  • 384 GB RAM per node
  • 2 CPU sockets per node, each with 2 NUMA nodes
    • 12 CPU cores per NUMA node
    • 96 GB local RAM per NUMA node
  • 4 x Nvidia Tesla Volta V100-SXM2-32GB per node
  • 480 GB local SSD disk per node 
  • Max request of 960 CPU cores (80 GPUs)
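
A sketch of a whole-node gpuvolta request, assuming CPU cores are requested in proportion to GPUs (12 cores per GPU, matching the 48-core / 4-GPU node layout above; memory, walltime and the program name are illustrative placeholders):

#!/bin/bash
# Request one full gpuvolta node: 4 V100 GPUs and 48 CPU cores.
#PBS -q gpuvolta
#PBS -P <project_code>
#PBS -l ngpus=4
#PBS -l ncpus=48
#PBS -l mem=380GB
#PBS -l walltime=02:00:00

cd ${PBS_O_WORKDIR}
# Placeholder GPU application; replace with your own executable.
./my_gpu_program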

Intel Xeon Broadwell (ex-Raijin)

expressbw

  • Express priority queue for testing, debugging or other jobs that need quick turnaround on the Broadwell nodes
  • 2 x 14-core Intel Xeon E5-2690v4 (Broadwell) 2.6GHz CPUs per node
  • 128 or 256GB RAM per node
  • 2 CPU sockets per node, each with 1 NUMA node
    • 14 CPU cores per NUMA node
    • 64 or 128 GB local RAM per NUMA node
  • 400GB local SSD disk
  • Max request of 1848 CPU cores

normalbw

  • Normal priority queue for standard computationally intensive jobs on the Broadwell nodes
  • 2 x 14-core Intel Xeon E5-2690v4 (Broadwell) 2.6GHz CPUs per node
  • 128 or 256GB RAM per node
  • 2 CPU sockets per node, each with 1 NUMA node
    • 14 CPU cores per NUMA node
    • 64 or 128 GB local RAM per NUMA node
  • 400GB local SSD disk
  • Max request of 10080 CPU cores, exceptions available on request

hugemembw

  • Normal priority queue for jobs that use large amounts of RAM on the Broadwell nodes, 10 nodes total
  • 2 x 14-core Intel Xeon E5-2690v4 (Broadwell) 2.6GHz CPUs per node
  • 1TB RAM per node
  • 2 CPU sockets per node, each with 1 NUMA node
    • 14 CPU cores per NUMA node
    • 512 GB local RAM per NUMA node
  • 400GB local SSD disk
  • Minimum request of 7 CPU cores and 256 GB of memory
  • Max request of 140 CPU cores

megamembw

  • Normal priority queue for jobs that use large amounts of RAM on the Broadwell nodes, 3 nodes total
  • 4 x 8-core Intel Xeon E7-4809v4 (Broadwell) 2.1 GHz CPUs per node
  • 3TB RAM per node
  • 4 CPU sockets per node, each with 1 NUMA node
    • 8 CPU cores per NUMA node
    • 768 GB local RAM per NUMA node
  • 800GB local SSD disk
  • Minimum request of 32 CPU cores and 1.5 TB of memory

Intel Xeon Skylake (ex-Raijin)

normalsl

  • Normal priority queue for standard computationally intensive jobs on the Skylake nodes, 192 nodes in total
  • 2 x 16-core Intel Xeon Gold 6130 (Skylake) 2.1GHz CPUs per node
  • 192GB RAM per node
  • 2 CPU sockets per node, each with 2 NUMA nodes
    • 8 CPU cores per NUMA node
    • 48 GB local RAM per NUMA node
  • 400 GB local SSD disk
  • Max request of 640 CPU cores, exceptions available on request

NVIDIA DGX A100 (Specialised GPU)

dgxa100

  • Normal priority queue for specialised GPU work, 2 nodes in total
  • 2 x 64-core AMD EPYC 7742 2.25 GHz CPUs per node
  • 2TB RAM per node
  • 2 CPU sockets per node, each with 4 NUMA nodes
    • 16 CPU cores per NUMA node
    • 256 GB local RAM per NUMA node
  • 8 x NVIDIA A100-SXM4-80GB per node
  • 27TB local SSD disk
  • Max request of 128 CPU cores, exceptions available on request

Intel Xeon Sapphire Rapids

Hardware Specifications

NCI has installed an expansion to Gadi's compute capacity based on the latest-generation Intel Sapphire Rapids processors. The expansion consists of 720 nodes, each containing two Intel Xeon Platinum 8470Q (Sapphire Rapids) processors with a base frequency of 2.1 GHz (turbo up to 3.8 GHz), 512 GiB of RAM, and 400 GiB of SSD available to jobs as jobfs.

The specifications of the queues for these nodes (a total of 74,880 additional cores) are:

normalsr

  • Normal priority queue for standard computationally intensive jobs
  • 2 x 52-core Intel Xeon Platinum 8470Q (Sapphire Rapids) 2.1 GHz CPUs per node
  • 512GiB RAM per node
  • 2 CPU sockets per node, each with 4 NUMA nodes
    • 13 CPU cores per NUMA node
    • 64 GB local RAM per NUMA node
  • 400 GB local SSD disk per node
  • Max request of 10400 CPU cores, exceptions available on request
  • SU charge rate 2.0 SU per CPU core hour, 208 SU per node hour

expresssr

  • Express priority queue for testing, debugging or other jobs that need quick turnaround
  • 2 x 52-core Intel Xeon Platinum 8470Q (Sapphire Rapids) 2.1 GHz CPUs per node
  • 512GiB RAM per node
  • 2 CPU sockets per node, each with 4 NUMA nodes
    • 13 CPU cores per NUMA node
    • 64 GB local RAM per NUMA node
  • 400 GB local SSD disk per node
  • Max request of 10400 CPU cores, exceptions available on request
  • SU charge rate 6.0 SU per CPU core hour, 624 SU per node hour

Maximum turbo frequency is not always achievable. Please see http://www.intel.com/technology/turboboost/ for further information.

Building Applications

To generate a binary designed to run on these nodes, you may need to recompile your application specifically targeting them. We recommend using the latest Intel LLVM compiler for these nodes (currently 2023.0.0; check for newer versions installed on Gadi with module avail intel-compiler-llvm), with options to build your code for use on all nodes in Gadi.

A binary with runtime dispatch for all architectures on Gadi can be built with:

module load intel-compiler-llvm/<version>
icx -O3 -march=broadwell -axSKYLAKE-AVX512,CASCADELAKE,SAPPHIRERAPIDS myCode.c -o myBinary

We always recommend using the latest version of the Intel compilers, as older versions may not be able to optimise for newer architectures.

There may be other options that will assist some codes. For example, it is worth testing with -qopt-zmm-usage=high (the default is low, i.e. prefer 256-bit wide instructions over 512-bit). Code that is especially floating-point heavy may benefit from this flag, but you should test with and without it, as it will cause some code to slow down.
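
For example, the compile line shown above could be extended with this flag when testing (a sketch only; benchmark both variants before adopting it):

module load intel-compiler-llvm/<version>
icx -O3 -march=broadwell -axSKYLAKE-AVX512,CASCADELAKE,SAPPHIRERAPIDS -qopt-zmm-usage=high myCode.c -o myBinary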

Running Jobs

To submit jobs to these nodes you should select the appropriate queue for your job.

Queue        Memory Available    Priority    Charge Rate
normalsr     512 GiB             Regular     2.0 SU / (resource*hour)
expresssr    512 GiB             High        6.0 SU / (resource*hour)

You should specify the queue you wish to use via the -q option to qsub, or with a #PBS -q directive in your PBS job script.

As with the normal and express queues, any job larger than one node must request CPUs in multiples of full nodes. This means you should review the number of CPUs your job requests to ensure it is appropriate for the 104-core nodes in these queues: in particular, where you might request 48 CPU cores for a normal queue job, you should now look at requesting 104 CPU cores.
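
For example, a job that previously requested 48 cores in the normal queue could be adapted to normalsr along these lines (memory, walltime and the MPI launch line are illustrative placeholders):

#!/bin/bash
# Request one full Sapphire Rapids node (2 x 52 cores) in the normalsr queue.
#PBS -q normalsr
#PBS -P <project_code>
#PBS -l ncpus=104
#PBS -l mem=500GB
#PBS -l walltime=04:00:00

cd ${PBS_O_WORKDIR}
# Placeholder MPI launch of the binary built in the section above.
mpirun -np 104 ./myBinary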