
NCI uses PBSPro for job submission and scheduling.

Quick Syntax guide

qstat

Standard queue status command supplied by PBS. See man qstat for details of options. (But see the local nqstat command below.)

nqstat

Local version of qstat. The queue header of nqstat gives the limit on wall clock time and memory for you and your project.

qdel jobid

Delete your unwanted jobs from the queues. The jobid is returned by qsub at job submission time, and is also displayed in the nqstat output.
qsub

Submit jobs to the queues. The simplest use of the qsub command is typified by the following example (note that there are carriage returns after -l wd and ./a.out):

 

$ qsub -P a99 -q normal -l walltime=20:00:00,mem=300MB -l wd
./a.out
^D     (that is control-D)

or

$ qsub -P a99 -q normal -l walltime=20:00:00,mem=300MB -l wd jobscript

where jobscript is an ASCII file containing the shell script that runs your commands (not the compiled executable, which is a binary file).
More conveniently, the qsub options can be placed within the script to avoid typing them for each job:

#!/bin/bash
#PBS -P a99 
#PBS -q normal 
#PBS -l walltime=20:00:00,mem=300MB 
#PBS -l wd
./a.out

You submit this script for execution by PBS using the command:

$ qsub jobscript
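
As a rough sketch, the script can also be written and submitted in one step from the shell. The project code a99 and the program ./a.out are the placeholders used in the examples above. The guard around qsub is an addition of this sketch, so that it does nothing harmful on a machine without PBS.

```shell
#!/bin/bash
# Write the job script from the example above to a file called "jobscript".
# Quoting 'EOF' stops the shell from expanding anything inside the heredoc.
cat > jobscript << 'EOF'
#!/bin/bash
#PBS -P a99
#PBS -q normal
#PBS -l walltime=20:00:00,mem=300MB
#PBS -l wd
./a.out
EOF

# Submit only if qsub is available; qsub returns the jobid on success.
if command -v qsub > /dev/null 2>&1; then
    jobid=$(qsub jobscript)
    echo "Submitted job: $jobid"
else
    echo "qsub not found; jobscript written but not submitted"
fi
```

Keeping the returned jobid in a variable makes it easy to pass to qdel later if the job is no longer wanted.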

Your program may need input data that you would normally type in interactively when prompted.

There are two ways of doing this in batch jobs.

Suppose, for example, that the program requires the numbers 1000 then 50 to be entered when prompted. You can either create a file called, say, input containing these values

$ cat input
1000
50

then run the program as

 ./a.out < input

or the data can be included in the batch job script as follows:

#!/bin/bash
#PBS -P a99 
#PBS -q normal 
#PBS -l walltime=20:00:00,mem=300MB 
#PBS -l wd
./a.out << EOF 
1000
50
EOF

Notice that the PBS directives are all at the start of the script, that there are no blank lines between them, and there are no other non-PBS commands until after all the PBS directives.
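
The layout rule above can be checked mechanically before submitting. The following is a minimal sketch only: check_pbs_header is a hypothetical helper written for this page, not an NCI or PBS command, and the file names good_script and bad_script are illustrative.

```shell
#!/bin/bash
# check_pbs_header FILE: hypothetical helper, not an NCI or PBS tool.
# Succeeds if every "#PBS" line sits in one unbroken run directly after
# the interpreter line; fails if a "#PBS" line appears after anything else
# (including a blank line).
check_pbs_header() {
    awk '
        NR == 1 && /^#!/ { next }            # skip the interpreter line
        /^#PBS/ { if (done) bad = 1; next }  # directive after the header ended
        { done = 1 }                         # any other line ends the header
        END { exit (bad ? 1 : 0) }
    ' "$1"
}

# Two small test scripts: the second has a blank line between directives.
printf '#!/bin/bash\n#PBS -q normal\n#PBS -l wd\n./a.out\n' > good_script
printf '#!/bin/bash\n#PBS -q normal\n\n#PBS -l wd\n./a.out\n' > bad_script

check_pbs_header good_script && echo "good_script: OK"
check_pbs_header bad_script || echo "bad_script: directive found after the header"
```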

qsub options of note:

-P project

The project to which you want to charge the job's resource usage. The default project is specified by the PROJECT environment variable.
-q queue

Select the queue to run the job in. The queues you can use are listed by running nqstat.

-l walltime=??:??:??

The wall clock time limit for the job. Time is expressed in seconds as an integer, or in the form:
[[hours:]minutes:]seconds[.milliseconds]
System scheduling decisions depend heavily on the walltime request – it is always best to make as accurate a request as possible.
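
Because the leading fields are optional, 20:00 means twenty minutes, not twenty hours. The sketch below makes the format concrete by converting a walltime string to seconds. walltime_to_seconds is a hypothetical helper for illustration, not part of PBS, and it simply drops any .milliseconds part.

```shell
#!/bin/bash
# walltime_to_seconds SPEC: hypothetical helper that converts
# [[hours:]minutes:]seconds into a whole number of seconds.
# Any fractional part (.milliseconds) is dropped in this sketch.
walltime_to_seconds() {
    local spec=${1%%.*}        # strip any .milliseconds part
    local h=0 m=0 s=0
    local IFS=:
    set -- $spec               # split on ":" into positional parameters
    case $# in
        1) s=$1 ;;
        2) m=$1; s=$2 ;;
        3) h=$1; m=$2; s=$3 ;;
        *) echo "bad walltime: $spec" >&2; return 1 ;;
    esac
    # 10# forces base 10 so fields such as "08" are not read as octal.
    echo $(( 10#$h * 3600 + 10#$m * 60 + 10#$s ))
}

walltime_to_seconds 20:00:00   # prints 72000 (twenty hours)
walltime_to_seconds 20:00      # prints 1200 (twenty minutes)
walltime_to_seconds 90         # prints 90
```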
-l mem=???MB

The total memory limit across all nodes for the job – can be specified with units of “MB” or “GB” but only integer values can be given. There is a small default value.
Your job will only run if there is sufficient free memory so making a sensible memory request will allow your jobs to run sooner.

A little trial and error may be required to find how much memory your jobs are using – nqstat lists jobs' actual usage.

-l ncpus=?

The number of cpus required for the job to run on. The default is 1.

If the number of cpus requested, N, is small (currently 16 or less on NF systems), the job will run within a single shared memory node.

If the number of cpus specified is greater, the job will be distributed over multiple nodes. Currently on NF systems, these larger requests are restricted to multiples of 16 cpus.
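
To illustrate the rule, the sketch below rounds a cpu count up to a value that satisfies it. round_ncpus is a hypothetical helper, and the node size of 16 is taken from the text above; it may differ on other systems.

```shell
#!/bin/bash
# round_ncpus N: hypothetical helper. Requests of 16 or fewer cpus are
# returned unchanged; larger requests are rounded up to the next
# multiple of 16, matching the restriction described above.
round_ncpus() {
    local n=$1 node=16
    if [ "$n" -le "$node" ]; then
        echo "$n"
    else
        echo $(( (n + node - 1) / node * node ))
    fi
}

round_ncpus 8     # prints 8
round_ncpus 40    # prints 48
round_ncpus 64    # prints 64
```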

-l jobfs=???GB

The requested job scratch space. This will reserve disk space, making it unavailable for other jobs, so please do not overestimate your needs.

Any files created in the $PBS_JOBFS directory are automatically removed at the end of the job. Ensure that you use integers, and units of mb, MB, gb, or GB.
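
Since the jobfs directory is emptied when the job ends, anything worth keeping has to be copied out by the job script itself. The following is a minimal sketch of that pattern. The file name output.dat and the 1GB request are illustrative, and the mktemp fallback is an assumption added only so the script can be tried outside PBS.

```shell
#!/bin/bash
#PBS -P a99
#PBS -q normal
#PBS -l walltime=20:00:00,mem=300MB
#PBS -l jobfs=1GB
#PBS -l wd

# Inside a job, PBS_JOBFS points at the job's scratch directory. The
# mktemp fallback is only here so the sketch also runs outside PBS.
scratch=${PBS_JOBFS:-$(mktemp -d)}
startdir=$PWD

cd "$scratch" || exit 1
# Stand-in for the real work: write a result file in scratch space.
echo "result data" > output.dat

# Copy the results back before the job ends and the scratch space is cleared.
cp output.dat "$startdir/"
cd "$startdir" || exit 1
```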

-l software=???

Specifies licensed software the job requires to run. See the software pages for the string to use for specific software.

The string should be a colon separated list (no spaces) if more than one software product is used.

If your job uses licensed software and you do not specify this option (or misspell the software name), you will probably receive an automatically generated email from the license shadowing daemon (lsd), and the job may be terminated.

You can check the lsd status and find out more by looking at the license status website.

-l other=???

Specifies other requirements or attributes of the job. The string should be a colon separated list (no spaces) if more than one attribute is required. Generally supported attributes are:

  • iobound – the job should not share a node with other IO bound jobs
  • mdss – the job requires access to the MDSS (usually via the mdss command). If MDSS is down, the job will not be started.
  • gdata1 – the job requires access to /g/data1. If the /g/data1 filesystem is down, the job will not be started.
  • pernodejobfs – the job’s jobfs resource request should be treated as a per node request.
    Normally the jobfs request is for total jobfs summed over all nodes allocated to the job (like mem). Only relevant to distributed parallel jobs using jobfs.

  You may be asked to specify other options at times to support particular needs or circumstances.
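
The colon-separated form (used by both -l other= and -l software=) is easy to get wrong by hand when the list grows. As a small sketch, it can be built from separate words in the shell. join_colon is a hypothetical helper, and the attribute names are the ones listed above.

```shell
#!/bin/bash
# join_colon WORD...: hypothetical helper that joins its arguments with ":"
# and no spaces, the form described for -l other= and -l software=.
join_colon() {
    local IFS=:
    echo "$*"
}

other=$(join_colon iobound mdss gdata1)
echo "#PBS -l other=$other"    # prints: #PBS -l other=iobound:mdss:gdata1
```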

-r y

Specifies that your job is restartable: if the node the job is executing on crashes, the job will be requeued.

Both resources used by and resource limits set for the original job will carry over to the requeued job.
Hence a restartable job must be checkpointing such that it will still be able to complete in the remaining walltime should it suffer a node crash.

The default is that jobs are assumed to not be restartable.
Note that regardless of the restartable status of a job, time used by jobs on crashed nodes is charged against the project they are running under,
since the onus is on users to ensure minimum waste of resources via a checkpointing mechanism which they must build into any particularly long running codes.
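
What checkpointing looks like depends entirely on the program, but the basic shape is to record progress regularly and to resume from the last record on startup. The sketch below shows that shape only. run_steps, the file name checkpoint, and the step counts are all hypothetical, and the second argument exists purely to simulate a crash.

```shell
#!/bin/bash
# run_steps TOTAL [STOP_AFTER]: hypothetical checkpointing loop.
# Progress is recorded in a file called "checkpoint" after every step, so a
# requeued run continues from the last completed step instead of from zero.
run_steps() {
    local total=$1 stop_after=${2:-$1}
    local step=0
    # Resume from the last completed step if a checkpoint exists.
    [ -f checkpoint ] && step=$(cat checkpoint)
    while [ "$step" -lt "$total" ] && [ "$step" -lt "$stop_after" ]; do
        step=$(( step + 1 ))
        # ... one unit of real work would go here ...
        echo "$step" > checkpoint     # record progress after each step
    done
    echo "$step"
}

rm -f checkpoint
run_steps 10 4    # simulates a crash after step 4; prints 4
run_steps 10      # the rerun resumes at step 5; prints 10
```

A real code would write its own state (not just a counter) and would choose a checkpoint interval short enough that the remaining walltime still covers the rest of the run.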

-l wd

Start the job in the directory from which it was submitted. Normally jobs are started in the user's home directory.

 

qps jobid    show the processes of a running job
qls jobid    list the files in a job’s jobfs directory
qcat jobid   show a running job’s stdout, stderr or script
qcp jobid    copy a file from a running job’s jobfs directory

 

The man pages for these commands on the system detail the various options you will probably need to use.

 

PBS Documentation

You can find further details about PBS Professional in the following documents. Please note that not all options and features in these documents may apply to NCI systems.

PBS Professional User’s Guide

PBS Professional Reference Guide 

PBS Professional Programmer’s Guide