Overview

Gadi's PBS job scheduler can manage dependencies between jobs, and there is no limit on the number of dependencies per job. Dependencies can be used to control the execution order and logic for a set of jobs, but any flaw in that logic can lead to unexpected behaviour, such as an interrupted execution chain or uncertain job results. We therefore strongly recommend double-checking the logic before submitting a batch of jobs.

Below is a brief introduction to its usage, followed by a caveat and an example. Please read both the caveat and the example, as they show which tested logic works on Gadi and which does not.

Usage


The PBS directive `-Wdepend=` informs the job scheduler about dependencies. To add a dependency to a job, pass the flag to `qsub` when submitting the job script `job.sh`, for example

$ qsub -W depend=beforeany:1234567:1234578 job.sh

The dependency type in this example is `beforeany`. It tells the scheduler to start neither job 1234567 nor job 1234578 until the submitted job finishes, with or without errors.

There are nine types of dependency supported by PBSPro 19.2, which fall into the following three categories. 

  • after, afterok, afternotok, afterany
  • before, beforeok, beforenotok, beforeany
  • on

For a description of each type, see the `qsub` manual by running `man qsub` on a Gadi login node.

The `-Wdepend` directive expects a colon-separated list of job IDs after the dependency type, unless the type is `on`, in which case the argument is the number of dependencies on other jobs that must be satisfied before this job can run. In the above example, the colon-separated job list is `1234567:1234578`.
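When the job IDs come from a script rather than being typed by hand, the colon-separated argument can be assembled programmatically. The following is a minimal sketch; `depend_arg` is a hypothetical helper name, not part of PBS, and it only builds the string without contacting the scheduler.

```shell
# Hypothetical helper (not a PBS command): build the value for -W depend=
# from a dependency type followed by any number of job IDs.
depend_arg() {
    local type=$1
    shift
    # With IFS set to ':', "$*" joins the remaining arguments with colons.
    local IFS=:
    printf '%s\n' "${type}:$*"
}

depend_arg afterok 16394 16395   # prints afterok:16394:16395
```

The result could then be passed on as `qsub -W depend="$(depend_arg afterok 16394 16395)" job.sh`.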

Caveat

Suppose there are three jobs, job1, job2, and job3, and job3 is expected to start only after both job1 and job2 have ended without errors. One might submit them as follows to enforce the dependency.

$ qsub job1
16394.r-man2

$ qsub job2
16395.r-man2

$ qsub -W depend=afterok:16394:16395 job3
16396.r-man2

When the queues are not busy, job1 and/or job2 may finish before the job scheduler is even aware of the third job. If either of them leaves the queue before job3 enters it, job3 will sit in the queue indefinitely, waiting for an already finished job to finish.
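One way to notice this situation before submitting is to check that every upstream job is still known to the scheduler. The sketch below rests on the assumption that `qstat <jobid>` exits with a non-zero status once a job has left the queue; `jobs_still_queued` is a hypothetical name. It narrows the window for the race but cannot close it, because a job may still finish between the check and the submission.

```shell
# Hypothetical guard: succeed only if every given job ID is still known to PBS.
# Assumption: `qstat <jobid>` returns non-zero for a job that has left the queue.
jobs_still_queued() {
    local id
    for id in "$@"; do
        qstat "$id" >/dev/null 2>&1 || return 1
    done
    return 0
}

# Usage sketch, with the job IDs from the session above:
# if jobs_still_queued 16394 16395; then
#     qsub -W depend=afterok:16394:16395 job3
# else
#     echo "an upstream job has already left the queue" >&2
# fi
```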


Example

To avoid dependencies on jobs that are no longer in the queue, we recommend submitting job1, job2, and job3 as follows.

$ qsub -W depend=on:2 job3
16397.r-man2

$ qsub -W depend=beforeok:16397 job1
16398.r-man2

$ qsub -W depend=beforeok:16397 job2
16399.r-man2

This forces the last job in the execution order to be the first to enter the queue, where it waits for the jobs it depends on.
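The same submission order can be scripted so that the job ID printed by `qsub` is captured instead of copied by hand. This is a minimal sketch under stated assumptions: `submit_after_all` is a hypothetical wrapper name, and the full ID as printed by `qsub` (for example `16397.r-man2`) is passed unchanged to `beforeok`.

```shell
# Hypothetical wrapper around the submission order shown above.
# Usage: submit_after_all FINAL_SCRIPT UPSTREAM_SCRIPT...
submit_after_all() {
    local final=$1
    shift
    local final_id upstream
    # Submit the final job first; it waits for one dependency per upstream job.
    final_id=$(qsub -W "depend=on:$#" "$final") || return 1
    printf '%s\n' "$final_id"
    # Each upstream job releases one dependency when it ends without error.
    for upstream in "$@"; do
        qsub -W "depend=beforeok:${final_id}" "$upstream" || return 1
    done
}
```

Calling `submit_after_all job3 job1 job2` would issue the same three `qsub` commands as the manual session and print the three job IDs in submission order.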

We recommend that you continue to monitor your jobs regularly to ensure that any complex dependency chains are correctly executed and not stalled.