Queue limits

The current default walltime and resource limits for Gadi jobs are summarised below. If a higher limit on core count (PBS_NCPUS) or walltime is needed, please lodge a ticket with the NCI Help Desk, including a short description of why the exception is requested.

For example, a recent scalability study might show linear speedup at core counts beyond the current PBS_NCPUS limit. We will endeavour to help on a case-by-case basis.

| Queue | Max queueing jobs per project | Charge rate per resource-hour ‡ | PBS_NCPUS | Max PBS_MEM per node † | Max PBS_JOBFS per node † | Default walltime limit |
| --- | --- | --- | --- | --- | --- | --- |
| normal (route§) / normal-exec§ | 1000 / 300 | 2 SU | 1-48, or a multiple of 48 | 190 GB | 400 GB | 48 hours for 1-672 cores; 24 hours for 720-1440 cores; 10 hours for 1488-2976 cores; 5 hours for 3024-20736 cores |
| express (route) / express-exec | 1000 / 50 | 6 SU | 1-48, or a multiple of 48 | 190 GB | 400 GB | 24 hours for 1-480 cores; 5 hours for 528-3168 cores |
| hugemem (route) / hugemem-exec | 1000 / 50 | 3 SU | 1-48, or a multiple of 48 | 1470 GB | 1400 GB | 48 hours for 1-48 cores; 24 hours for 96 cores; 5 hours for 144 or 192 cores |
| megamem (route) / megamem-exec | 300 / 50 | 5 SU | 1-48, or a multiple of 48 | 2990 GB | 1400 GB | 48 hours for 1-48 cores; 24 hours for 96 cores |
| gpuvolta (route) | 1000 | 3 SU | multiple of 12 | 382 GB | 400 GB | 48 hours for 1-96 CPU cores; 24 hours for 144-192 CPU cores; 5 hours for 240-960 CPU cores |
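
As an illustration of how these limits translate into a job request, the sketch below submits to the normal queue. It is a minimal example only: the project code `a00` and the program name are placeholders, and the resource values are just one combination that satisfies the limits above.

```bash
#!/bin/bash
#PBS -q normal
#PBS -P a00                  # placeholder project code
#PBS -l ncpus=96             # beyond 48 cores, PBS_NCPUS must be a multiple of 48 (two nodes here)
#PBS -l mem=380GB            # requested as a total across nodes: 190 GB per node, the queue maximum
#PBS -l jobfs=200GB          # total across nodes: 100 GB per node, under the 400 GB limit
#PBS -l walltime=10:00:00    # well within the 48-hour limit for 1-672 cores
#PBS -l wd

mpirun -np $PBS_NCPUS ./my_program   # placeholder executable
```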

| Queue | Max queueing jobs per project | Charge rate per resource-hour ‡ | PBS_NCPUS | Max PBS_MEM per node † | Max PBS_JOBFS per node † | Default walltime limit |
| --- | --- | --- | --- | --- | --- | --- |
| normalsr (route§) / normalsr-exec§ | 1000 / 300 | 2 SU | 1-104, or a multiple of 104 | 500 GB | 400 GB | 48 hours for 1-1040 cores; 24 hours for 1144-2080 cores; 10 hours for 2184-4160 cores; 5 hours for 4264-10400 cores |
| expresssr (route) / expresssr-exec | 1000 / 50 | 6 SU | 1-104, or a multiple of 104 | 500 GB | 400 GB | 24 hours for 1-1040 cores; 5 hours for 1144-2080 cores |


| Queue | Max queueing jobs per project | Charge rate per resource-hour ‡ | PBS_NCPUS | Max PBS_MEM per node † | Max PBS_JOBFS per node † | Default walltime limit |
| --- | --- | --- | --- | --- | --- | --- |
| normalbw (route) / normalbw-exec | 1000 / 300 | 1.25 SU | 1-28, or a multiple of 28 | 128 GB or 256 GB | 400 GB | 48 hours for 1-336 cores; 24 hours for 364-840 cores; 10 hours for 868-1736 cores; 5 hours for 1764-10080 cores |
| expressbw (route) / expressbw-exec | 1000 / 50 | 3.75 SU | 1-28, or a multiple of 28 | 128 GB or 256 GB | 400 GB | 24 hours for 1-280 cores; 5 hours for 308-1848 cores |
| hugemembw (route) / hugemembw-exec | 500 / 100 | 1.25 SU | 7, 14, 21, 28, or a multiple of 28 | 1020 GB | 390 GB | 48 hours for 1-28 cores; 12 hours for 56-140 cores |
| megamembw (route) / megamembw-exec | 300 / 50 | 1.25 SU | 32 or 64 | 3000 GB | 800 GB | 48 hours for 32 cores; 12 hours for 64 cores |


| Queue | Max queueing jobs per project | Charge rate per resource-hour ‡ | PBS_NCPUS | Max PBS_MEM per node † | Max PBS_JOBFS per node † | Default walltime limit |
| --- | --- | --- | --- | --- | --- | --- |
| normalsl (route) / normalsl-exec | 1000 / 300 | 1.5 SU | 1-32, or a multiple of 32 | 192 GB | 400 GB | 48 hours for 1-288 cores; 24 hours for 320-608 cores; 10 hours for 640-1984 cores; 5 hours for 2016-3200 cores |


| Queue | Max queueing jobs per project | Charge rate per resource-hour ‡ | PBS_NCPUS | Max PBS_MEM per node † | Max PBS_JOBFS per node † | Default walltime limit |
| --- | --- | --- | --- | --- | --- | --- |
| dgxa100 (route) / dgxa100-exec | 50 / 50 | 4.5 SU | multiple of 16 | 2000 GB | 28 TB | 48 hours for 16-128 cores; 5 hours for 144-256 cores |
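
GPU jobs follow the same pattern. The sketch below assumes the usual Gadi convention that the multiple-of-16 rule on dgxa100 corresponds to 16 CPU cores per GPU; the project code and workload are placeholders.

```bash
#!/bin/bash
#PBS -q dgxa100
#PBS -P a00                  # placeholder project code
#PBS -l ngpus=2
#PBS -l ncpus=32             # assumes 16 CPU cores per GPU, matching the multiple-of-16 rule above
#PBS -l mem=250GB            # under the 2000 GB per-node maximum
#PBS -l walltime=05:00:00    # within the 48-hour limit for 16-128 cores
#PBS -l wd

python3 train.py             # placeholder GPU workload
```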

† To ensure that jobs terminated for exceeding their memory and/or local disk limits can be handled properly, please request no more than the amounts listed in the corresponding columns.

‡ The number of `resource` is calculated as ncpus_request × max[1, (ncpus_per_node / mem_per_node) × (mem_request / ncpus_request)]. See Job Costs for more information.
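
As a worked example, take the normal queue above (48 cores and 190 GB per node, 2 SU per resource-hour): a job requesting 1 core and 100 GB is charged for 1 × max[1, (48/190) × (100/1)] ≈ 25.3 resources, so 5 hours of walltime costs about 2 × 25.3 × 5 ≈ 253 SU.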

§ The route queue is where jobs wait before moving to the execution queue. Only jobs in the execution queues, whose names end with `-exec`, are considered by PBS for execution on compute and data-mover nodes.
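
In practice, this means a job is submitted to the route queue and PBS moves it to the matching execution queue to run, for example:

```bash
qsub -q normal job.sh        # the job enters the "normal" route queue
qstat -Q normal normal-exec  # queue status for both the route and execution queues
```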

Authors: Yue Sun, Andrew Wellington, Anish Varghese