
Gadi keeps nodes with different hardware in different queues. Users submit jobs to a specific queue to run them on the corresponding type of node. For example, jobs that need to run on GPUs have to be submitted to the gpuvolta queue to get access to nodes with GPUs.

The Gadi queue structure also has two main levels of priority, express and normal, which are reflected in the queue names. Queues whose names do not contain the string `express` have normal priority. They are designed to support mainstream work and therefore have higher core count and longer walltime limits than the corresponding express-priority queue on the same hardware. Express queues (currently only `express` and `expressbw`) are designed to support work that needs rapid turnaround, but at a higher service unit charge; jobs in them can request fewer processors and a shorter walltime than in the normal queues.

Gadi Job Queues
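The naming convention from the introduction (a queue has express priority only if its name contains the string `express`) can be sketched in a few lines of Python. The function name `queue_priority` is hypothetical and used only for illustration; it is not part of any Gadi or PBS tooling.

```python
def queue_priority(queue_name: str) -> str:
    """Return the priority level implied by a queue name.

    Assumption: the rule is exactly the one stated in the text, i.e. a
    queue is express priority if and only if its name contains "express".
    """
    return "express" if "express" in queue_name else "normal"


if __name__ == "__main__":
    # Queue names taken from this page.
    for name in ["express", "expressbw", "normal", "normalbw", "hugemem", "gpuvolta"]:
        print(f"{name}: {queue_priority(name)}")
```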

Intel Xeon Cascade Lake

express

- Express priority queue for testing, debugging or other jobs that need quick turnaround
- 2 x 24-core Intel Xeon Platinum 8274 (Cascade Lake) 3.2 GHz CPUs per node
- 192 GB RAM per node
- 2 CPU sockets per node, each with 2 NUMA nodes
- 400 GB local SSD disk per node
- Max request of 3200 CPU cores for 5 hours
normal
- Normal priority queue for standard computationally intensive jobs
- 2 x 24-core Intel Xeon Platinum 8274 (Cascade Lake) 3.2 GHz CPUs per node
- 192 GB RAM per node
- 2 CPU sockets per node, each with 2 NUMA nodes
- 400 GB local SSD disk per node
- Max request of 20736 CPU cores for 5 hours, exceptions available on request
  ...
- Normal priority queue for data archive/transfer and other jobs that need network access, 6 nodes total
- 2 x 24-core Intel Xeon Platinum 8268 (Cascade Lake) 2.9 GHz CPUs per node
- 192 GB RAM per node
- 2 CPU sockets per node, each with 2 NUMA nodes
- 800 GB local SSD disk per node
- External network access (not available on nodes in any other queue)
- Access to the tape filesystem massdata (not available on nodes in any other queue; the job must explicitly pass the PBS directive `-l storage=massdata/<project_code>` to ensure access)
- Max request of 1 CPU core for 10 hours, exceptions available on request
hugemem
- Normal priority queue for jobs that use a large amount of RAM, 50 nodes total
- 2 x 24-core Intel Xeon Platinum 8268 (Cascade Lake) 2.9 GHz CPUs per node
- 1.5 TB of Intel Optane DC Persistent Memory per node, with 384 GB DRAM as a cache
- 2 CPU sockets per node, each with 2 NUMA nodes
- 1.5 TB local SSD disk per node
- Max request of 192 CPU cores for 5 hours, exceptions available on request
gpuvolta
- Normal priority queue, nodes equipped with NVIDIA Volta GPUs, 160 nodes total
- 2 x 24-core Intel Xeon Platinum 8268 (Cascade Lake) 2.9 GHz CPUs per node
- 384 GB RAM per node
- 2 CPU sockets per node, each with 2 NUMA nodes
  - 12 CPU cores per NUMA node
  - 96 GB local RAM per NUMA node
- 4 x NVIDIA Tesla Volta V100-SXM2-32GB per node
- ? PCIe link bandwidth to quantify the data transfer between GPU and CPU?
- 480 GB local SSD disk per node
- Max request of 960 CPU cores (80 GPUs) for 5 hours
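The gpuvolta figures above are mutually consistent: 48 cores and 4 GPUs per node give 12 cores per GPU, and 80 GPUs then correspond to the 960-core limit. The short Python sketch below reproduces that arithmetic, along with the per-NUMA-node figures. It assumes that a job requests cores in the same 12:1 ratio as the hardware; the page implies this through "960 CPU cores (80 GPUs)" but does not state it as a scheduler rule. The helper name `cores_for_gpus` is hypothetical.

```python
# Hardware figures for a gpuvolta node, as listed on this page.
CORES_PER_NODE = 2 * 24      # 2 x 24-core CPUs
GPUS_PER_NODE = 4            # 4 x V100 GPUs
RAM_PER_NODE_GB = 384
NUMA_NODES_PER_NODE = 2 * 2  # 2 sockets, each with 2 NUMA nodes

CORES_PER_GPU = CORES_PER_NODE // GPUS_PER_NODE            # 12
CORES_PER_NUMA = CORES_PER_NODE // NUMA_NODES_PER_NODE     # 12
MEM_PER_NUMA_GB = RAM_PER_NODE_GB // NUMA_NODES_PER_NODE   # 96


def cores_for_gpus(ngpus: int) -> int:
    """CPU cores matching `ngpus` GPUs, assuming the node's 12:1 ratio."""
    if ngpus < 1:
        raise ValueError("ngpus must be at least 1")
    return ngpus * CORES_PER_GPU


if __name__ == "__main__":
    print(cores_for_gpus(80))  # 960, the queue's stated maximum
    print(CORES_PER_NUMA, MEM_PER_NUMA_GB)
```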

Intel Xeon Broadwell (ex-Raijin)

...
- Express priority queue for testing, debugging or other jobs that need quick turnaround on the Broadwell nodes
- 2 x 14-core Intel Xeon E5-2690v4 (Broadwell) 2.6 GHz CPUs per node
- 128 or 256 GB RAM per node
- 2 CPU sockets per node, each with 1 NUMA node
  - 14 CPU cores per NUMA node
  - 64 or 128 GB local RAM per NUMA node
- 400 GB local SSD disk
- Max request of 1848 CPU cores for 5 hours
normalbw
- Normal priority queue for standard computationally intensive jobs on the Broadwell nodes
- 2 x 14-core Intel Xeon E5-2690v4 (Broadwell) 2.6 GHz CPUs per node
- 128 or 256 GB RAM per node
- 2 CPU sockets per node, each with 1 NUMA node
  - 14 CPU cores per NUMA node
  - 64 or 128 GB local RAM per NUMA node
- 400 GB local SSD disk
- Max request of 10080 CPU cores for 5 hours, exceptions available on request
hugemembw
- Normal priority queue for jobs that use a large amount of RAM on the Broadwell nodes, 10 nodes total
- 2 x 14-core Intel Xeon E5-2690v4 (Broadwell) 2.6 GHz CPUs per node
- 1 TB RAM per node
- 2 CPU sockets per node, each with 1 NUMA node
  - 14 CPU cores per NUMA node
  - 512 GB local RAM per NUMA node
- 400 GB local SSD disk
- Minimum request is 7 CPU cores and 256 GB of memory
- Max request of 140 CPU cores for 12 hours
megamembw
- Normal priority queue for jobs that use a large amount of RAM on the Broadwell nodes, 3 nodes total
- 4 x 8-core Intel Xeon E7-4809v4 (Broadwell) 2.1 GHz CPUs per node
- 3 TB RAM per node
- 4 CPU sockets per node, each with 1 NUMA node
  - 8 CPU cores per NUMA node
  - 768 GB local RAM per NUMA node
- 800 GB local SSD disk
- Minimum request is 32 CPU cores and 1.5 TB of memory
- Max request of 32 CPU cores for 12 hours
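The minimum and maximum requests for the two large-memory Broadwell queues can be checked before submission with a small helper such as the sketch below. Several things here are assumptions rather than facts from this page: the function name `check_request` is hypothetical, both minimums (cores and memory) are taken to apply together, 1.5 TB is read as 1536 GB (consistent with 4 x 768 GB = 3 TB per node), and the megamembw core limit is taken as 32, i.e. one full node.

```python
# (min cores, min memory in GB) per queue, from the listings above.
# Assumption: 1.5 TB is read as 1536 GB.
MIN_REQUEST = {"hugemembw": (7, 256), "megamembw": (32, 1536)}

# Max cores per job. Assumption: 32 is the megamembw limit (one full node).
MAX_CORES = {"hugemembw": 140, "megamembw": 32}


def check_request(queue: str, ncpus: int, mem_gb: int) -> list:
    """Return a list of human-readable problems; empty if the request fits."""
    min_cpus, min_mem = MIN_REQUEST[queue]
    problems = []
    if ncpus < min_cpus:
        problems.append(f"{queue}: needs at least {min_cpus} cores, got {ncpus}")
    if mem_gb < min_mem:
        problems.append(f"{queue}: needs at least {min_mem} GB, got {mem_gb}")
    if ncpus > MAX_CORES[queue]:
        problems.append(f"{queue}: at most {MAX_CORES[queue]} cores, got {ncpus}")
    return problems


if __name__ == "__main__":
    print(check_request("hugemembw", 4, 128))     # two problems reported
    print(check_request("megamembw", 32, 1536))   # no problems
```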

Intel Xeon Skylake (ex-Raijin)

...
- Normal priority queue for standard computationally intensive jobs on the Skylake nodes, 192 nodes in total
- 2 x 16-core Intel Xeon Gold 6130 (Skylake) 2.1 GHz CPUs per node
- 192 GB RAM per node
- 2 CPU sockets per node, each with 1 NUMA node
  - 16 CPU cores per NUMA node
  - 96 GB local RAM per NUMA node
- 400 GB local SSD disk
- Max request of 640 CPU cores for 5 hours, exceptions available on request
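To relate the per-queue core limits to whole nodes, divide the maximum core count by the cores per node. The Python sketch below does this for the queues whose names and figures appear explicitly on this page; the dictionary layout and the function name `max_nodes` are illustrative choices, not part of any NCI tool.

```python
# (max cores per job, cores per node), taken from the queue listings above.
QUEUES = {
    "normal": (20736, 2 * 24),
    "hugemem": (192, 2 * 24),
    "gpuvolta": (960, 2 * 24),
    "normalbw": (10080, 2 * 14),
    "hugemembw": (140, 2 * 14),
}


def max_nodes(queue: str) -> int:
    """Number of nodes needed to hold the queue's maximum core request."""
    max_cores, cores_per_node = QUEUES[queue]
    # Ceiling division, in case a limit is not a whole number of nodes.
    return -(-max_cores // cores_per_node)


if __name__ == "__main__":
    for name in QUEUES:
        print(f"{name}: up to {max_nodes(name)} nodes")
```

For example, the 20736-core limit on the normal queue corresponds to 432 nodes of 48 cores each.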
