
NCI uses PBSPro for job submission and scheduling.

Quick Syntax guide

qstat

Standard queue status command (see man qstat for details of options).

nqstat

Gives more information than qstat (above), such as the limits on wall clock time and memory for you and your project.

qdel jobid

Delete your unwanted jobs from the queues. The jobid is returned by qsub at job submission time, and is also displayed in the nqstat output.
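For example (the job ID shown is illustrative, and the exact form of the ID depends on the PBS server):

```shell
$ qsub jobscript
1234567.pbsserver      # job ID printed by qsub
$ qdel 1234567
```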
qsub

Submit jobs to the queues. Simple example job scripts look like this:

For interactive jobs: 

$ qsub -P a99 -q normal -l walltime=20:00:00,mem=300MB -l wd
./a.out
^D     [use control-D]

[where -P gives the project code (eg. a99) and -q the queue name (eg. normal). See the qsub options below for more information.]

Note that there is a carriage return after -l wd and after ./a.out.


For batch jobs:

$ qsub -P a99 -q normal -l walltime=20:00:00,mem=300MB -l wd jobscript

where jobscript is an ASCII text file containing the shell script to run your commands (not the compiled executable, which is a binary file).
More conveniently, the qsub options can be placed within the script to avoid typing them for each job:

#!/bin/bash
#PBS -P a99 
#PBS -q normal 
#PBS -l walltime=20:00:00,mem=300MB 
#PBS -l wd
./a.out

You submit this script for execution by PBS using the command:

$ qsub jobscript

Your program may require input data that you would normally type interactively when prompted.

There are two ways of doing this in batch jobs.

If, for example, the program requires the numbers 1000, then 50, to be entered when prompted, you can create a file (eg. called 'input') containing these values:

$ cat input
1000
50

then run the program as

 ./a.out < input

or the data can be included in the batch job script as follows:

#!/bin/bash
#PBS -P a99 
#PBS -q normal 
#PBS -l walltime=20:00:00,mem=300MB 
#PBS -l wd
./a.out << EOF 
1000
50
EOF

Notice that the PBS directives are all at the start of the script, that there are no blank lines between them, and there are no other non-PBS commands until after all the PBS directives.

qsub options of note:

-P project

The project to charge the job's resource usage to. The default project is specified by the PROJECT environment variable.
-q queue

Select the queue to run the job in. The queues you can use are listed by running nqstat.

-l walltime=??:??:??

The wall clock time limit for the job. Time is expressed in seconds as an integer, or in the form:

[[hours:]minutes:]seconds[.milliseconds]

System scheduling decisions depend heavily on the walltime request – it is always best to make it as accurate as possible.
-l mem=???MB

The total memory limit across all nodes for the job – can be specified with units of “MB” or “GB” but only integer values can be given. There is a small default value.
Your job will only run if there is sufficient free memory, so making an accurate memory request will allow your jobs to run sooner.

A little trial and error may be required to find how much memory your jobs are using – nqstat lists jobs' actual usage.

-l ncpus=?

The number of cpus required for the job to run. The default is 1.

If the number of CPUs requested, N, is small enough, the job will run within a single shared-memory node.

If the number of CPUs specified is too large, the job will be distributed over multiple nodes. Currently on NCI systems, these larger requests are restricted to multiples of 16 for Sandy Bridge nodes and 28 for Broadwell nodes.
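As a sketch, a distributed job on Sandy Bridge nodes might request a multiple of 16 CPUs (the project code, resource values, and use of mpirun here are illustrative assumptions):

```shell
#!/bin/bash
#PBS -P a99
#PBS -q normal
#PBS -l walltime=10:00:00,mem=64GB
#PBS -l ncpus=32
#PBS -l wd
# 32 CPUs = 2 x 16-core Sandy Bridge nodes; mem is the total over both nodes
mpirun ./a.out
```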

-l jobfs=???GB

The requested job scratch space. This will reserve disk space, making it unavailable for other jobs, so please try not to overestimate your needs.

Any files created in the $PBS_JOBFS directory are automatically removed at the end of the job. Ensure that you use integers, and units of MB or GB (not case-sensitive).
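A sketch of a job script using jobfs (the 10GB request and the file names are illustrative; $PBS_O_WORKDIR is the standard PBS variable holding the directory the job was submitted from):

```shell
#!/bin/bash
#PBS -P a99
#PBS -q normal
#PBS -l walltime=20:00:00,mem=300MB
#PBS -l jobfs=10GB
# Work in the per-job scratch directory
cd $PBS_JOBFS
cp $PBS_O_WORKDIR/input .
$PBS_O_WORKDIR/a.out < input > output
# Copy results back before the job ends -- $PBS_JOBFS is removed automatically
cp output $PBS_O_WORKDIR/
```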

-l software=???

Specifies the licensed software the job requires to run. Refer to Software for the specific string to use.

The string should be a colon separated list (no spaces) if more than one software product is used.

If your job uses licensed software and you do not specify this option (or mis-spell the software name), you will probably receive an automatically generated email from the license shadowing daemon, and the job may be terminated.

If your job uses unlicensed software, you don't need to use this flag.

You can check the license shadowing daemon (lsd) status and find out more by looking at the license status website.
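For example (the product names here are placeholders – take the exact strings from the Software pages):

```shell
#PBS -l software=matlab:idl
```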

-l other=???

Specifies other requirements or attributes of the job. The string should be a colon separated list (no spaces) if more than one attribute is required. Generally supported attributes are:

  • iobound – the job should not share a node with other IO bound jobs
  • mdss – the job requires access to the MDSS (usually via the mdss command). If MDSS is down, the job will not start.
  • gdata1 – the job requires access to the /g/data1. If /g/data1 filesystem is down, the job will not be started.
  • pernodejobfs – the job’s jobfs resource request should be treated as a per node request. Normally the jobfs request is for total jobfs summed over all nodes allocated to the job (like mem). Only relevant to distributed parallel jobs using jobfs.

You may be asked to specify other options at times to support particular needs or circumstances.
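For example, a job that stages files from MDSS and is IO bound might include (directive shown as a sketch):

```shell
# Colon-separated list, no spaces; the job will not start if MDSS is down
#PBS -l other=mdss:iobound
```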

-r y

Specifies that your job is restartable: if the node it is executing on crashes, the job will be requeued.

Both the resources used by and the resource limits set for the original job will carry over to the requeued job.
Hence a restartable job must checkpoint its state so that it can still complete in the remaining walltime should it suffer a node crash.

The default is that jobs are assumed to not be restartable.
Note that regardless of the restartable status of a job, time used by jobs on crashed nodes is charged to the project they are running under. The onus is on users to minimise wasted resources by building a checkpointing mechanism into any particularly long-running codes.

-l wd

Start the job in the directory from which it was submitted. Normally jobs are started in the user's home directory.

 

qps jobid

Show the processes of a running job.

qls jobid

List the files in a job's jobfs directory.

qcat jobid

Show a running job's stdout, stderr or script.

qcp jobid

Copy a file from a running job's jobfs directory.
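For example, with a job ID reported by qsub or nqstat (value illustrative):

```shell
$ qps 1234567      # processes of the running job
$ qls 1234567      # files in its jobfs directory
$ qcat 1234567     # its stdout, stderr or script (see man qcat for options)
```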

 

The man pages for these commands on the system detail the various options you will probably need to use.

 

PBS Documentation

You can find further details about PBS Professional in the following documents. Please note that not all options and features in these documents may apply to NCI systems.

PBS Professional User’s Guide

PBS Professional Reference Guide 

PBS Professional Programmer’s Guide