Page tree

Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

Gadi keeps nodes that have different hardware in different queues. Users submit jobs to a specific queue to run jobs on the corresponding type of node(s). For example, jobs need to run on GPUs have to be submitted to gpuvolta queue to get access to nodes with GPUs. 

Gadi queue structure also has two main levels of priority, express and normal, which is reflected in the queue names.  Job queues with name of no string `express` have normal priorities. They are designed to support the mainstream work, therefore, with higher core count and longer walltime limits comparing to the corresponding queue with the express priority and the same hardware. Queues with express priority, currently only `express` and `expressbw` queuesExpress queues (express and expressbw), are designed to support work needs rapid turnaround thus can request only less processors within a shorter walltime limit. 

Gadi Job Queues

, but at a higher service unit charge.

Intel Xeon Cascade Lake

express

  • Express priority queue for testing, debugging or other jobs need quick turnaround
  • 2 x 24-core Intel Xeon Platinum 8274 (Cascade Lake Platinum 8274) 3.2 GHz 2 GHz CPUs per node
  • 192GB RAM per node
  • 2 CPU sockets per node, each with 2 NUMA nodes 
      Each NUMA node has
      • 12 CPU cores
      and
      • per NUMA node
      • 48 GB local RAM per NUMA node
    • 400 GB local SSD disk per node
    • Max request of 3168 3200 CPU cores for 5 hours.

    normal

    • Normal priority queue for standard computational intensive jobs
    • 2 x 24-core Intel Xeon Platinum 8274 (Cascade Lake Platinum 8274) 3.22 GHz CPUs per node
    • 192GB RAM  GHz CPUs per node
    • 2 CPU sockets per node, each with 2 NUMA nodes 
        Each NUMA node has
        • 12 CPU cores
        and
        • per NUMA node
        • 48 GB local RAM per NUMA node
      • 400 GB local SSD disk per node
      • Max request of 20736 CPU cores for 5 hours, exceptions available on request

      ...

      • Normal priority queue for data archive/transfer and other jobs that need network access., 6 nodes total
      • 2 x 24-core Intel Xeon Platinum 8268 (Cascade Lake Platinum 8268 2) 2.9 GHz CPUs per node
      • 192GB RAM per node, 6 nodes in total
      • 2 CPU sockets per node, each with 2 NUMA nodes nodes
          Each NUMA node has
          • 12 CPU cores
          and
          • per NUMA node
          • 48 GB local RAM per NUMA node
        • 800 GB local SSD disk per node
        • External network access (not available on any other nodes in any other queues)
        • Access to the tape filesystem massdata (not available on any other nodes in any other queues, job needs to explicitly flag PBS with the directive `-lstoragel storage=mdssmassdata/<project_code>` to  to ensure the access)
        • Max request of 1 CPU cores for 10 hours, exceptions available on requestcore

        hugemem

        • Normal priority queue for jobs that use large amount of RAM, 50 nodes total
        • 2 x 24-core Intel Xeon Platinum 8268 (Cascade Lake Platinum 8268 2) 2.9 GHz CPUs per node50 nodes in total
        • 1.5 TB of Intel Optane DC Persistent Memory with 384 GB RAM DRAM as the a cache per node
        • 2 CPU sockets per node, each with 2 NUMA nodes nodes
            Each NUMA node has
            • 12 CPU cores
            and 384 GB Optane DC memory (each with 96 GB RAM as the cache)
            • per NUMA node
            • 384GB Optane DC Persistent Memory per NUMA node
            • 96GB RAM per NUMA node as cache
          • 1.5 TB local SSD disk per node
          • Max request of 192 CPU cores for 5 hours, exceptions available on request

          gpuvolta

          • Normal priority queue, nodes equipped with NVIDIA Volta GPUs, 160 nodes total
          • 2 x 24-core Intel Xeon Platinum 8268 (Cascade Lake Platinum 8268 2) 2.9 GHz CPUs per node
          • 384 GB RAM per node, 160 nodes in total
          • 2 CPU sockets per node, each with 2 NUMA nodes nodes
            Each NUMA node has
            • 12 CPU cores
            and
            • per NUMA node
            • 96 GB local RAM per NUMA node
          • 4 x Nvidia Tesla Volta V100-SXM2-32GB per node
          • ? PCIe link bandwidth to quantify the data transfer between GPU and CPU?
          • 480 GB local SSD disk per node 
          • Max request of 960 CPU cores (80 GPUs) for 5 hours

          Intel Xeon Broadwell (ex-Raijin)

          ...

          • Express priority queue for testing, debugging or other jobs need quick turnaround on the Broadwell nodes
          • 2 x 14-core core Intel Xeon E5-2690v4 (Broadwell) 2.6GHz CPUs per node
          • 128 or 256GB RAM per node
          • 2 CPU sockets per 2 NUMA node, each with 1 socket, NUMA node
            • 14 CPU cores
            and 64/
            • per numa NODE
            • 64 or 128 GB local
            RAM
            • RAM per NUMA node
          • 400GB local SSD disk
          • Max request of 1848 CPU cores for 5 hours

          normalbw

          • Normal priority queue for standard computational intensive jobs on the Broadwell nodes
          • 2 x 14-core Intel Xeon E5-2690v4 (Broadwell) 2.6GHz CPUs per node
          • 128 or 256GB RAM per node
          • 2 NUMA CPU sockets per node, each with 1 socket, NUMA node
            • 14 CPU cores
            and 64/
            • per numa NODE
            • 64 or 128 GB local
            RAM
            • RAM per NUMA node
          • 400GB local SSD disk
          • Max request of 10080 CPU cores for 5 hours, exceptions available on request

          hugemembw

          • Normal priority queue for jobs that use large amount of RAM on the Broadwell nodes, 10 nodes total
          • 2 x 14-core core Intel Xeon E5-2690v4 (Broadwell) 2.6GHz CPUs  per node, 10 nodes in totalper node
          • 1TB RAM per node
          • 2 CPU sockets per 2 NUMA node, each with 1 socket, NUMA node
            • 14 CPU cores
            and
            • per numa NODE
            • 512 GB local
            RAM
            • RAM per NUMA node
          • 400GB local SSD disk
          • Minimum memory request is 7 cores and 256 GB memory
          • Max request of 140 CPU cores for 12 hours

          megamembw

          • Normal priority queue for jobs that use large amount of RAM on the Broadwell nodes, 3 nodes total
          • 4 x 8-core Intel core Intel Xeon E7-4809v4 (Broadwell) 2 2.1 GHz CPUs per node
          • 3TB RAM per node, 3 nodes in total.
          • 4 NUMA CPU sockets per node, each with 1 socket, NUMA node
            • 8 CPU cores
            and
            • per numa NODE
            • 768 GB local
            RAM
            • RAM per NUMA node
          • 800GB local SSD disk
          • Minimum memory request is 32 cores and 1.5TB.
          • Max request of 140 32 CPU cores for 12 hours

          Intel Xeon Skylake (ex-Raijin)

          ...

          • Normal priority queue for standard computational intensive jobs on the Skylake nodes, 192 nodes in total
          • 2 x 16-core Intel Xeon Gold 6130 (Skylake) 2.1GHz CPUs per node, 192 nodes in total
          • 192GB RAM per node
          • 2 NUMA CPU sockets per node, each with 1 socket, 14 NUMA node
            • 16 CPU cores
            and
            • per numa NODE
            • 96 GB local
            RAM
            • RAM per NUMA node
          • 400 GB local SSD disk
          • Max request of 64 640 CPU cores for 5 hours, exceptions available on request