Intel Xeon Sapphire Rapids

Hardware Specifications

NCI has installed an expansion to Gadi's compute capacity containing the latest generation of Intel Sapphire Rapids processors. The expansion consists of 720 nodes, each containing two Intel Xeon Platinum 8470Q (Sapphire Rapids) processors (base frequency 2.1GHz, turbo up to 3.8GHz), 512GiB of RAM, and 400GiB of local SSD available to jobs as jobfs.

The specifications of the queues for these nodes (a total of 74,880 additional cores) are:

normalsr

Normal priority queue for standard computationally intensive jobs
2 x 52-core Intel Xeon Platinum 8470Q (Sapphire Rapids) 2.1 GHz CPUs per node
512GiB RAM per node
2 CPU sockets per node, each with 4 NUMA nodes
- 13 CPU cores per NUMA node
- 64 GiB local RAM per NUMA node
400 GiB local SSD disk per node
Max request of 10400 CPU cores, exceptions available on request
SU charge rate 2.0 SU per CPU core hour, 208 SU per node hour
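The per-node layout multiplies out consistently: 2 sockets x 4 NUMA nodes x 13 cores = 104 cores, and 8 NUMA nodes x 64 GiB = 512GiB. A minimal sketch of mapping a core index to its NUMA node and socket follows. It assumes cores are numbered contiguously (0-12 on NUMA node 0, 13-25 on NUMA node 1, and so on); that numbering is an assumption, not something stated above, so confirm the real mapping on a node with lscpu or numactl --hardware. The function names are hypothetical.

```bash
#!/usr/bin/env bash
# Node layout taken from the normalsr/expresssr specifications.
SOCKETS_PER_NODE=2
NUMA_PER_SOCKET=4
CORES_PER_NUMA=13
CORES_PER_NODE=$(( SOCKETS_PER_NODE * NUMA_PER_SOCKET * CORES_PER_NUMA ))  # 104

# ASSUMPTION: cores are numbered contiguously within each NUMA node, and
# NUMA nodes are numbered contiguously within each socket.
# Verify on a real node with `lscpu` or `numactl --hardware`.
numa_node_of_core() {
  local core=$1
  if (( core < 0 || core >= CORES_PER_NODE )); then
    echo "core $core out of range 0-$(( CORES_PER_NODE - 1 ))" >&2
    return 1
  fi
  echo $(( core / CORES_PER_NUMA ))
}

# Socket index derived from the NUMA node index under the same assumption.
socket_of_core() {
  local node
  node=$(numa_node_of_core "$1") || return 1
  echo $(( node / NUMA_PER_SOCKET ))
}
```

If a job places one process per NUMA node, numactl --cpunodebind=2 --membind=2 ./myBinary is one way to keep a process and its memory on NUMA node 2; check first that the assumed numbering matches what numactl --hardware reports.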

...

Express priority queue for testing, debugging or other jobs that need quick turnaround
2 x 52-core Intel Xeon Platinum 8470Q (Sapphire Rapids) 2.1 GHz CPUs per node
512GiB RAM per node
2 CPU sockets per node, each with 4 NUMA nodes
- 13 CPU cores per NUMA node
- 64 GiB local RAM per NUMA node
400 GiB local SSD disk per node
Max request of 10400 CPU cores, exceptions available on request
SU charge rate 6.0 SU per CPU core hour, 624 SU per node hour
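The node-hour figures follow from the per-core rates: 104 cores x 2.0 SU = 208 SU per node hour on normalsr, and 104 x 6.0 = 624 SU on expresssr. A small sketch for estimating a job's charge from its cores, walltime in hours, and queue is below. The function name su_estimate is hypothetical, and it simplifies by treating the requested CPU cores as the charged resource; it does not model any other factor the scheduler may use.

```bash
#!/usr/bin/env bash
# su_estimate QUEUE CORES HOURS
# Prints an estimated SU charge = CORES * HOURS * rate, using the
# per-CPU-core-hour rates from the queue specifications.
# Simplification: CPU cores are treated as the charged resource.
su_estimate() {
  local queue=$1 cores=$2 hours=$3 rate
  case "$queue" in
    normalsr)  rate=2.0 ;;
    expresssr) rate=6.0 ;;
    *) echo "unknown queue: $queue" >&2; return 1 ;;
  esac
  # awk does the floating-point multiplication that bash arithmetic cannot.
  awk -v c="$cores" -v h="$hours" -v r="$rate" 'BEGIN { printf "%.2f\n", c * h * r }'
}
```

For example, su_estimate normalsr 208 2.5 estimates a two-node job running for two and a half hours.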

Maximum turbo frequency is not always achievable. Please see http://www.intel.com/technology/turboboost/ for further information.

Building Applications

To take full advantage of these nodes, you may need to recompile your application to target them. We recommend the latest Intel LLVM compiler (currently 2023.0.0; check for newer versions installed on Gadi with module avail intel-compiler-llvm), with options that build your code to run on all node types in Gadi.

A build with runtime dispatch for all architectures on Gadi can be built with:

module load intel-compiler-llvm/<version>
icx -O3 -march=broadwell -axSKYLAKE-AVX512,CASCADELAKE,SAPPHIRERAPIDS myCode.c -o myBinary

We always recommend using the latest version of the Intel compilers, as older versions may not be able to optimise for newer architectures.
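With runtime dispatch (-ax), the code path a binary takes is chosen from the features the CPU reports. To see which of the relevant instruction-set features a node reports, you can inspect the flags line in /proc/cpuinfo. A minimal sketch follows; the helper name has_cpu_flag is hypothetical, and treating the amx_tile flag as a marker for Sapphire Rapids among Gadi's node types is an assumption (older Xeons such as Cascade Lake do not provide AMX).

```bash
#!/usr/bin/env bash
# has_cpu_flag FLAG [CPUINFO_FILE]
# Succeeds (exit status 0) if FLAG appears as a whole word on a "flags" line.
# CPUINFO_FILE defaults to /proc/cpuinfo; the parameter lets the function
# be tried against a saved copy of another node's cpuinfo.
has_cpu_flag() {
  local flag=$1 file=${2:-/proc/cpuinfo}
  grep -E '^flags[[:space:]]*:' "$file" 2>/dev/null | grep -qw -- "$flag"
}

# Report a few features related to the dispatch targets in the build line.
for f in avx2 avx512f amx_tile; do
  if has_cpu_flag "$f"; then echo "$f: yes"; else echo "$f: no"; fi
done
```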

Other options may help some codes. For example, try -qopt-zmm-usage=high (the default is low, i.e. the compiler prefers 256-bit wide instructions over 512-bit ones). Code that is especially floating-point heavy may benefit from this flag, but it can slow other code down, so test with and without it to see whether your code benefits.
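One way to compare builds with and without -qopt-zmm-usage=high is to build the same source twice and time both binaries on a Sapphire Rapids node. A minimal sketch follows; the binary names myBinary_default and myBinary_zmmhigh and the helper best_time_ms are hypothetical, and date +%s%N assumes GNU date (as on Linux).

```bash
#!/usr/bin/env bash
# Build both variants first (requires the intel-compiler-llvm module), e.g.:
#   icx -O3 -march=broadwell -axSKYLAKE-AVX512,CASCADELAKE,SAPPHIRERAPIDS \
#       myCode.c -o myBinary_default
#   icx -O3 -march=broadwell -axSKYLAKE-AVX512,CASCADELAKE,SAPPHIRERAPIDS \
#       -qopt-zmm-usage=high myCode.c -o myBinary_zmmhigh

# best_time_ms RUNS COMMAND [ARGS...]
# Runs COMMAND RUNS times and prints the fastest wall-clock time in ms.
# Taking the minimum reduces noise from one-off slow runs.
best_time_ms() {
  local runs=$1; shift
  local best=-1 start end elapsed i
  for (( i = 0; i < runs; i++ )); do
    start=$(date +%s%N)            # nanoseconds since the epoch (GNU date)
    "$@" > /dev/null
    end=$(date +%s%N)
    elapsed=$(( (end - start) / 1000000 ))
    if (( best < 0 || elapsed < best )); then
      best=$elapsed
    fi
  done
  echo "$best"
}
```

For example, run best_time_ms 5 ./myBinary_default and best_time_ms 5 ./myBinary_zmmhigh with the same inputs, and keep the flag only if the second time is consistently lower.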

Running Jobs

To submit jobs to these nodes you should select the appropriate queue for your job.

Queue	Memory per node	Priority	Charge Rate
normalsr	512GiB	Regular	2.0 SU / (resource*hour)
expresssr	512GiB	High	6.0 SU / (resource*hour)

You should specify the queue you wish to use via the -q option to qsub, or with a #PBS -q directive in your PBS job script.
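A minimal sketch that prints the queue and CPU directives for the top of a job script is below. The function name pbs_header is hypothetical, and directives other than -q, -l ncpus and -l walltime that your job may need (for example your project with -P, memory with -l mem, or jobfs with -l jobfs) are left out.

```bash
#!/usr/bin/env bash
# pbs_header QUEUE NCPUS WALLTIME
# Prints PBS directives selecting one of the Sapphire Rapids queues.
# The queue can instead be given on the command line:  qsub -q normalsr job.sh
pbs_header() {
  local queue=$1 ncpus=$2 walltime=$3
  case "$queue" in
    normalsr|expresssr) ;;
    *) echo "expected normalsr or expresssr, got: $queue" >&2; return 1 ;;
  esac
  printf '#!/bin/bash\n'
  printf '#PBS -q %s\n' "$queue"
  printf '#PBS -l ncpus=%d\n' "$ncpus"
  printf '#PBS -l walltime=%s\n' "$walltime"
}
```

For example, pbs_header normalsr 104 02:00:00 > job.sh writes the header; append your module load and run commands, then submit with qsub job.sh.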

As with the normal and express queues, any job larger than one node must request CPUs in multiples of full nodes, so check how many CPUs your job needs and make sure your request fits this rule. In particular, where you would request 48 CPU cores (one full node) in a normal queue job, you should now consider requesting 104 CPU cores (one full Sapphire Rapids node).
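The full-node rule can be expressed as a small check: a request of up to 104 cores fits on one node, and anything larger is rounded up to the next multiple of 104, subject to the 10400-core maximum listed in the queue specifications. A minimal sketch follows; the helper name round_ncpus is hypothetical and encodes only these two rules, not every check PBS performs.

```bash
#!/usr/bin/env bash
CORES_PER_NODE=104      # 2 x 52-core Sapphire Rapids CPUs per node
MAX_CORES=10400         # default maximum request for normalsr/expresssr

# round_ncpus REQUESTED
# Prints the ncpus value to request: unchanged if it fits on one node,
# otherwise rounded up to a whole number of nodes.
round_ncpus() {
  local req=$1 nodes
  if (( req < 1 )); then
    echo "request must be at least 1 core" >&2; return 1
  fi
  if (( req <= CORES_PER_NODE )); then
    echo "$req"; return 0
  fi
  nodes=$(( (req + CORES_PER_NODE - 1) / CORES_PER_NODE ))   # ceiling division
  if (( nodes * CORES_PER_NODE > MAX_CORES )); then
    echo "$(( nodes * CORES_PER_NODE )) cores exceeds the $MAX_CORES-core maximum" >&2
    return 1
  fi
  echo $(( nodes * CORES_PER_NODE ))
}
```

For example, a job that used two full normal queue nodes (96 cores) fits on a single Sapphire Rapids node, so round_ncpus 96 prints 96, while round_ncpus 300 prints 312 (three nodes).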
