Gadi groups nodes with different hardware into different queues. Users submit a job to a specific queue to run it on the corresponding type of node. For example, jobs that need GPUs must be submitted to the gpuvolta queue to get access to them.

The Gadi queue structure also has two main priority levels, express and normal, and these are reflected in the queue names. Queues whose names do not contain the string `express` have normal priority. They are designed to support mainstream work, so they have higher core-count and longer walltime limits than the corresponding express queue on the same hardware. Queues with express priority, currently only `express` and `expressbw`, are designed to support work that needs rapid turnaround, so they accept smaller core requests within shorter walltime limits.
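
The naming rule above is simple enough to state as code. The Python sketch below only encodes the substring convention described in this section (a queue is express priority if its name contains `express`); the function name `queue_priority` is hypothetical, and nothing here queries PBS or any NCI service.

```python
def queue_priority(queue_name: str) -> str:
    """Return "express" if the queue name contains "express", else "normal".

    This encodes only the naming convention described on this page; it does
    not look anything up on the system.
    """
    return "express" if "express" in queue_name else "normal"


if __name__ == "__main__":
    for name in ["express", "expressbw", "normal", "normalbw", "gpuvolta", "copyq"]:
        print(f"{name:10} -> {queue_priority(name)}")
```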

Gadi Job Queues

Intel Xeon Cascade Lake

express

  • Express priority queue for testing, debugging or other jobs that need a quick turnaround
  • 2 x 24-core Intel Xeon Cascade Lake Platinum 8274 3.2 GHz CPUs per node
  • 2 sockets per node, each with 2 NUMA nodes 
  • Each NUMA node has 12 CPU cores and 48 GB local RAM
  • 400 GB local SSD disk per node
  • Max request of 3168 CPU cores for 5 hours.
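
The per-node figures follow directly from the bullets: two sockets, each with two NUMA nodes of 12 cores and 48 GB. A minimal Python sketch of that arithmetic follows; the `NodeSpec` class is a hypothetical helper, and the RAM it reports is the raw installed amount, which may be more than a job can actually request per node.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeSpec:
    """Per-node hardware figures, taken from the queue description bullets."""
    sockets: int
    cores_per_socket: int
    numa_per_socket: int
    ram_per_numa_gb: int

    @property
    def cores(self) -> int:
        return self.sockets * self.cores_per_socket

    @property
    def numa_nodes(self) -> int:
        return self.sockets * self.numa_per_socket

    @property
    def ram_gb(self) -> int:
        # Raw installed RAM; the amount a job may request could be lower.
        return self.numa_nodes * self.ram_per_numa_gb

    def nodes_for(self, ncpus: int) -> int:
        """Whole nodes needed to hold ncpus cores."""
        return math.ceil(ncpus / self.cores)


# Cascade Lake nodes used by the express and normal queues
CASCADE_LAKE = NodeSpec(sockets=2, cores_per_socket=24, numa_per_socket=2, ram_per_numa_gb=48)

if __name__ == "__main__":
    print(CASCADE_LAKE.cores, "cores,", CASCADE_LAKE.numa_nodes, "NUMA nodes,",
          CASCADE_LAKE.ram_gb, "GB RAM per node")
    print("express max request (3168 cores) spans", CASCADE_LAKE.nodes_for(3168), "nodes")
```

With these figures a node has 48 cores, 4 NUMA nodes and 192 GB of RAM, and the 3168-core express limit corresponds to 66 full nodes.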

normal

  • Normal priority queue for standard computationally intensive jobs
  • 2 x 24-core Intel Xeon Cascade Lake Platinum 8274 3.2 GHz CPUs per node
  • 2 sockets per node, each with 2 NUMA nodes 
  • Each NUMA node has 12 CPU cores and 48 GB local RAM
  • 400 GB local SSD disk per node
  • Max request of 20736 CPU cores for 5 hours, exceptions available on request
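
To make the express and normal limits concrete, here is a hedged Python sketch that checks a planned request against them before submission. The limits are copied from the bullets above; exceptions granted on request are not modelled, and `check_request` and `walltime_hours` are hypothetical helpers, not part of any NCI tool. Walltime is parsed in the `HH:MM:SS` form that PBS uses.

```python
from typing import Dict, List, Tuple

# Maximum CPU cores and walltime (hours), copied from the express and normal
# bullets above. Exceptions granted on request are not modelled.
LIMITS: Dict[str, Tuple[int, float]] = {
    "express": (3168, 5.0),
    "normal": (20736, 5.0),
}


def walltime_hours(walltime: str) -> float:
    """Convert an "HH:MM:SS" walltime string to hours."""
    hours, minutes, seconds = (int(part) for part in walltime.split(":"))
    return hours + minutes / 60 + seconds / 3600


def check_request(queue: str, ncpus: int, walltime: str) -> List[str]:
    """Return a list of limit violations (empty if the request fits)."""
    max_cpus, max_hours = LIMITS[queue]
    problems: List[str] = []
    if ncpus > max_cpus:
        problems.append(f"ncpus={ncpus} exceeds the {queue} limit of {max_cpus}")
    if walltime_hours(walltime) > max_hours:
        problems.append(f"walltime={walltime} exceeds the {queue} limit of {max_hours:g} h")
    return problems


if __name__ == "__main__":
    print(check_request("express", 96, "02:00:00"))    # fits: empty list
    print(check_request("express", 4800, "05:30:00"))  # too many cores and too long
```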

copyq

  • Normal priority queue for data archive/transfer and other jobs that need network access.
  • 2 x 24-core Intel Xeon Cascade Lake Platinum 8268 2.9 GHz CPUs per node, 6 nodes in total
  • 2 sockets per node, each with 2 NUMA nodes 
  • Each NUMA node has 12 CPU cores and 48 GB local RAM
  • 800 GB local SSD disk per node
  • External network access (not available on any other nodes in any other queues)
  • Access to the tape filesystem massdata (not available on any other nodes in any other queues; jobs must explicitly request it with the PBS directive `-lstorage=mdss/<project_code>` to ensure access)
  • Max request of 1 CPU core for 10 hours, exceptions available on request
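
A copyq job header combines the single-core limit with the massdata storage flag. The Python sketch below assembles such a header as text; it assumes standard PBS `#PBS -q` and `#PBS -l` directive lines, writes the flag in the spaced form `-l storage=...` (equivalent to `-lstorage=...`), and omits any other directives your project may require, such as a project code or memory request. The function `copyq_header` and the project code `xy11` are hypothetical.

```python
def copyq_header(project_code: str, walltime: str = "10:00:00") -> str:
    """Build PBS directive lines for a single-core copyq job that needs
    access to massdata. project_code stands in for your own project."""
    lines = [
        "#!/bin/bash",
        "#PBS -q copyq",
        "#PBS -l ncpus=1",
        f"#PBS -l walltime={walltime}",
        # The copyq notes require this flag to ensure massdata access
        f"#PBS -l storage=mdss/{project_code}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(copyq_header("xy11"), end="")  # "xy11" is a hypothetical project code
```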

hugemem

  • Normal priority queue for jobs that use a large amount of RAM
  • 2 x 24-core Intel Xeon Cascade Lake Platinum 8268 2.9 GHz CPUs per node, 50 nodes in total
  • 1.5 TB of Intel Optane DC Persistent Memory per node, with 384 GB of RAM as the cache
  • 2 sockets per node, each with 2 NUMA nodes 
  • Each NUMA node has 12 CPU cores and 384 GB Optane DC memory (each with 96 GB RAM as the cache)
  • 1.5 TB local SSD disk per node
  • Max request of 192 CPU cores for 5 hours, exceptions available on request
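
The hugemem per-node totals follow from the per-NUMA figures: four NUMA nodes of 384 GB Optane memory give 1536 GB (the 1.5 TB quoted above, taking 1 TB as 1024 GB), and four 96 GB caches give 384 GB of RAM. This Python sketch reproduces that arithmetic; `hugemem_node_figures` is a hypothetical helper, and the even per-core share it reports is only an average, not an allocation rule.

```python
SOCKETS = 2
CORES_PER_SOCKET = 24
NUMA_PER_SOCKET = 2
OPTANE_PER_NUMA_GB = 384      # Optane DC memory per NUMA node
DRAM_CACHE_PER_NUMA_GB = 96   # RAM acting as cache in front of it


def hugemem_node_figures() -> dict:
    """Derive per-node totals for a hugemem node from the bullets above."""
    numa_nodes = SOCKETS * NUMA_PER_SOCKET
    cores = SOCKETS * CORES_PER_SOCKET
    optane_gb = numa_nodes * OPTANE_PER_NUMA_GB
    return {
        "cores": cores,
        "optane_gb": optane_gb,                              # 1536 GB, i.e. 1.5 TB
        "dram_cache_gb": numa_nodes * DRAM_CACHE_PER_NUMA_GB,
        "gb_per_core": optane_gb / cores,                    # average share only
        "max_request_nodes": 192 // cores,                   # 192-core limit
    }


if __name__ == "__main__":
    for key, value in hugemem_node_figures().items():
        print(f"{key}: {value}")
```

The 192-core limit therefore covers 4 nodes, or about 6 TB of Optane memory.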

gpuvolta

  • 2 x 24-core Intel Xeon Cascade Lake Platinum 8268 2.9 GHz CPUs per node, 160 nodes in total
  • 2 sockets per node, each with 2 NUMA nodes 
  • Each NUMA node has 12 CPU cores and 96 GB local RAM
  • 4 x Nvidia Tesla Volta V100-SXM2-32GB per node
  • 480 GB local SSD disk per node 
  • Max request of 960 CPU cores (80 GPUs) for 5 hours
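
Each gpuvolta node pairs 48 CPU cores with 4 GPUs, which is 12 cores per GPU; the same ratio explains why the 960-core limit equals 80 GPUs. The Python sketch below assumes requests follow that fixed 12:1 ratio; the helper names are hypothetical.

```python
CORES_PER_NODE = 2 * 24   # two 24-core CPUs
GPUS_PER_NODE = 4         # four V100 GPUs
CORES_PER_GPU = CORES_PER_NODE // GPUS_PER_NODE  # 12


def cores_for_gpus(ngpus: int) -> int:
    """CPU cores that accompany ngpus GPUs at the per-node ratio (assumed fixed)."""
    return ngpus * CORES_PER_GPU


def nodes_for_gpus(ngpus: int) -> int:
    """Whole gpuvolta nodes spanned by ngpus GPUs."""
    return -(-ngpus // GPUS_PER_NODE)  # ceiling division


if __name__ == "__main__":
    for ngpus in (1, 4, 80):
        print(ngpus, "GPUs ->", cores_for_gpus(ngpus), "cores on", nodes_for_gpus(ngpus), "node(s)")
```

At the maximum, 80 GPUs use 960 cores across 20 of the 160 nodes.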

Intel Xeon Broadwell (ex-Raijin)

expressbw

  • Express priority queue for testing, debugging or other jobs that need a quick turnaround on the Broadwell nodes
  • 2 x 14-core Intel Xeon E5-2690v4 (Broadwell) 2.6 GHz CPUs per node
  • 2 NUMA nodes, each with 1 socket, 14 CPU cores and 64/128 GB local RAM
  • 400 GB local SSD disk
  • Max request of 1848 CPU cores for 5 hours

normalbw

  • Normal priority queue for standard computationally intensive jobs on the Broadwell nodes
  • 2 x 14-core Intel Xeon E5-2690v4 (Broadwell) 2.6 GHz CPUs per node
  • 2 NUMA nodes, each with 1 socket, 14 CPU cores and 64/128 GB local RAM
  • 400 GB local SSD disk
  • Max request of 10080 CPU cores for 5 hours, exceptions available on request
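
Dividing each maximum request by the cores per node (48 on Cascade Lake, 28 on Broadwell) shows that every limit is a whole number of nodes, and that both express queues cap out at the same 66 nodes. This Python sketch only uses figures listed on this page; `max_request_nodes` is a hypothetical helper.

```python
# Cores per node from the CPU bullets: 2 x 24 (Cascade Lake), 2 x 14 (Broadwell)
CORES_PER_NODE = {"cascadelake": 2 * 24, "broadwell": 2 * 14}

# Maximum CPU-core requests listed on this page
MAX_CORES = {
    "express": ("cascadelake", 3168),
    "normal": ("cascadelake", 20736),
    "expressbw": ("broadwell", 1848),
    "normalbw": ("broadwell", 10080),
}


def max_request_nodes(queue: str) -> float:
    """Maximum core request expressed as a number of nodes."""
    arch, cores = MAX_CORES[queue]
    return cores / CORES_PER_NODE[arch]


if __name__ == "__main__":
    for queue in MAX_CORES:
        print(f"{queue:10} {max_request_nodes(queue):g} nodes")
```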

hugemembw

  • Normal priority queue for jobs that use a large amount of RAM on the Broadwell nodes
  • 2 x 14-core Intel Xeon E5-2690v4 (Broadwell) 2.6 GHz CPUs per node, 10 nodes in total
  • 2 NUMA nodes, each with 1 socket, 14 CPU cores and 512 GB local RAM
  • 400 GB local SSD disk
  • Minimum request is 7 CPU cores and 256 GB of memory
  • Max request of 140 CPU cores for 12 hours

megamembw

  • Normal priority queue for jobs that use a large amount of RAM on the Broadwell nodes
  • 4 x 8-core Intel Xeon E7-4809v4 (Broadwell) 2.1 GHz CPUs per node, 3 nodes in total
  • 4 NUMA nodes, each with 1 socket, 8 CPU cores and 768 GB local RAM
  • 800 GB local SSD disk
  • Minimum request is 32 CPU cores and 1.5 TB of memory
  • Max request of 140 CPU cores for 12 hours
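
The minimum requests on the two large-memory Broadwell queues make more sense as fractions of a node: a hugemembw node has 28 cores and 2 x 512 GB, so 7 cores and 256 GB is a quarter of it, while a megamembw node has 32 cores and 4 x 768 GB, so the minimum is every core and half the memory. The Python sketch below reads each minimum as applying to cores and memory together and takes 1.5 TB as 1536 GB; both readings are assumptions, and the helper names are hypothetical.

```python
from typing import Dict, Tuple

# (minimum CPU cores, minimum memory in GB), from the bullets above.
# Assumption: 1.5 TB is taken as 1536 GB.
MINIMUMS: Dict[str, Tuple[int, int]] = {
    "hugemembw": (7, 256),
    "megamembw": (32, 1536),
}

# (cores, RAM in GB) per node: 2 x 14 cores with 2 x 512 GB; 4 x 8 cores with 4 x 768 GB
NODE: Dict[str, Tuple[int, int]] = {
    "hugemembw": (2 * 14, 2 * 512),
    "megamembw": (4 * 8, 4 * 768),
}


def meets_minimum(queue: str, ncpus: int, mem_gb: int) -> bool:
    """True if the request reaches both the core and the memory minimum."""
    min_cpus, min_mem = MINIMUMS[queue]
    return ncpus >= min_cpus and mem_gb >= min_mem


def minimum_as_node_fraction(queue: str) -> Tuple[float, float]:
    """Minimum request expressed as a fraction of one node's cores and RAM."""
    (min_cpus, min_mem), (cores, ram) = MINIMUMS[queue], NODE[queue]
    return min_cpus / cores, min_mem / ram


if __name__ == "__main__":
    for queue in MINIMUMS:
        cpu_frac, mem_frac = minimum_as_node_fraction(queue)
        print(f"{queue}: minimum is {cpu_frac:.0%} of a node's cores and {mem_frac:.0%} of its RAM")
    print(meets_minimum("hugemembw", 14, 512))
```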

Intel Xeon Skylake (ex-Raijin)

normalsl

  • Normal priority queue for standard computationally intensive jobs on the Skylake nodes
  • 2 x 16-core Intel Xeon Gold 6130 (Skylake) 2.1 GHz CPUs per node, 192 nodes in total
  • 2 NUMA nodes, each with 1 socket, 16 CPU cores and 96 GB local RAM
  • 400 GB local SSD disk
  • Max request of 64 CPU cores for 5 hours, exceptions available on request
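
The node types differ in memory per core as well as core count, which can matter when choosing between the normal-priority queues. This Python sketch compares raw installed RAM per core using the figures on this page, reading the Broadwell "64/128 GB" as two node variants; it is a rough guide under those assumptions, not a statement of how much memory a job may request.

```python
# (cores per node, RAM per node in GB), derived from the bullets on this page
NODE_SHAPES = {
    "normal (Cascade Lake)": (2 * 24, 4 * 48),
    "normalbw, 64 GB NUMA nodes": (2 * 14, 2 * 64),
    "normalbw, 128 GB NUMA nodes": (2 * 14, 2 * 128),
    "normalsl (Skylake)": (2 * 16, 2 * 96),
}


def gb_per_core(cores: int, ram_gb: int) -> float:
    """Raw installed RAM divided evenly across the node's cores."""
    return ram_gb / cores


if __name__ == "__main__":
    for name, (cores, ram) in NODE_SHAPES.items():
        print(f"{name:28} {cores:3d} cores {ram:4d} GB {gb_per_core(cores, ram):5.2f} GB/core")
```

A Skylake node works out to 32 cores and 192 GB, or 6 GB per core, against 4 GB per core on Cascade Lake, and the 64-core normalsl limit corresponds to 2 nodes.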

