Job Dependencies

The PBS scheduler on Gadi can manage dependencies between jobs. This lets you control the order in which your jobs execute and the conditions under which they start, so that you use your compute allocation efficiently. There is no limit to the number of dependencies you can set, but they must follow a sound flow of logic; otherwise they can conflict with one another or produce strange behaviour.

This page is a short introduction to the concept of job dependencies, along with some examples of sound logic.

To use dependencies, pass the -W depend= option to qsub, for example:

$ qsub -W depend=beforeany:1234567:1234578 job.sh

This line uses the beforeany dependency type to tell the PBS scheduler that neither job 1234567 nor job 1234578 can start until job.sh has finished running, with or without errors.

When using the -W depend option, enter the job IDs after the dependency type, joined by colons, as seen above. The only exception is the 'on' dependency type, which instead takes an integer giving the number of other jobs that this job must wait for.
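The colon-joined format described above can be sketched as a small shell function. This is an illustrative helper only: build_depend is a hypothetical name, not part of PBS, and the list of accepted type names assumes the standard PBS Professional dependency types.

```shell
# Hypothetical helper (not part of PBS): build the value passed to `qsub -W`
# from a dependency type followed by job IDs or, for `on`, a single count.
build_depend() {
    local type="$1"
    shift
    # Accept only the dependency type names assumed to be supported by PBS.
    case "$type" in
        after|afterok|afternotok|afterany|before|beforeok|beforenotok|beforeany|on) ;;
        *) echo "unknown dependency type: $type" >&2; return 1 ;;
    esac
    if [ "$#" -eq 0 ]; then
        echo "no job IDs or count given" >&2
        return 1
    fi
    # With IFS set to ":", "$*" joins the remaining arguments with colons.
    local IFS=":"
    echo "depend=${type}:$*"
}
```

It could then be used as qsub -W "$(build_depend beforeany 1234567 1234578)" job.sh, which expands to the same command line as the example above.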

Common issues


One common problem with job dependencies is jobs completing and leaving the queue sooner than expected. In this example:

$ qsub job1
16394.r-man2
 
$ qsub job2
16395.r-man2
 
$ qsub -W depend=afterok:16394:16395 job3
16396.r-man2

This tells the PBS scheduler to run job3 after both job1 and job2 have completed with no errors. However, when Gadi is not busy, there is a chance that job1 or job2 could complete so quickly that they have exited the queue before job3 has entered. In this case, job3 will be left sitting in the queue indefinitely, as PBS can't run it under its current logic. 

A solution is to submit the jobs in a different order, for example:

$ qsub -W depend=on:2 job3
16397.r-man2
 
$ qsub -W depend=beforeok:16397 job1
16398.r-man2
 
$ qsub -W depend=beforeok:16397 job2
16399.r-man2

In this case, job3 is submitted with the 'on' dependency type and a count of 2, so it waits for two other jobs to satisfy their dependencies before it runs. Set up this way, the last job to run enters the queue first and then waits for the others to complete, removing the risk of a job leaving the queue early.

NCI recommends that you continue to monitor your jobs regularly to ensure that any complex dependency chains are running correctly and have not stalled.
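The submission order above can be scripted so that the job ID of the final job is captured rather than typed by hand. The following is a minimal sketch, not an NCI-provided tool: submit_fan_in is a hypothetical function name, and it assumes that qsub prints the new job ID on standard output and that the full ID (including the server suffix) is accepted in the dependency list.

```shell
# Hypothetical helper: submit the final job first with depend=on:N, then
# submit each of the N prerequisite scripts with depend=beforeok:<final ID>.
# Usage: submit_fan_in FINAL_SCRIPT PREREQ_SCRIPT...
submit_fan_in() {
    local final_script="$1"
    shift
    local final_id
    # "$#" is now the number of prerequisite scripts.
    final_id=$(qsub -W "depend=on:$#" "$final_script") || return 1
    local script
    for script in "$@"; do
        # Prerequisite job IDs go to stderr so stdout carries only the final ID.
        qsub -W "depend=beforeok:${final_id}" "$script" >&2 || return 1
    done
    echo "$final_id"
}
```

Calling submit_fan_in job3 job1 job2 would issue the same three qsub commands as the example above and print the job ID of job3.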
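One way to check on a job in a dependency chain is to read its state from qstat. The sketch below assumes that qstat -f prints a line of the form "job_state = X" for the job, as PBS Professional normally does; job_state is a hypothetical function name introduced here for illustration.

```shell
# Hypothetical helper: print the one-letter state of a job by reading the
# "job_state = X" line from the full status output of qstat.
job_state() {
    qstat -f "$1" | awk -F' = ' '/^[[:space:]]*job_state = / { print $2; exit }'
}
```

For example, job_state 16397.r-man2 would print a single letter such as Q, R or H. A job that is still waiting on its dependencies is typically reported as held (H), so a job that stays in that state long after its prerequisites have left the queue is worth investigating.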



PBSPro supports nine different dependency types:

after: This job may be scheduled for execution at any point after all jobs in the arguments list have started execution.

afterok: This job may be scheduled for execution only after all jobs in the arguments list have terminated with no errors.

afternotok: This job may be scheduled for execution only after all jobs in the arguments list have terminated with errors.

afterany: This job may be scheduled for execution after all jobs in the arguments list have finished execution, with any exit status (with or without errors). This job will not run if a job in the arguments list was deleted without ever having been run.

before: Jobs in the arguments list may begin execution once this job has begun execution.

beforeok: Jobs in the arguments list may begin execution once this job terminates without errors.

beforenotok: Jobs in the arguments list may begin execution if this job terminates with errors.

beforeany: Jobs in the arguments list may begin execution once this job terminates execution, with or without errors.

on:count: This job may be scheduled for execution after count dependencies on other jobs have been satisfied. This type is used in conjunction with one of the before types listed above. Count is an integer greater than 0.


Authors: Yue Sun, Andrew Wellington, Mohsin Ali