Job Dependencies

The PBS scheduler on Gadi can manage dependencies between jobs. This means you can tell PBS to run jobs at certain times and under a range of conditions, so that you use your compute allocation efficiently and control the order in which your jobs execute. There is no limit to the number of dependencies you can set, but they must follow a sound flow of logic; otherwise they can conflict with one another or produce strange behaviour.

This page is a short introduction to the concept of job dependencies, along with some examples of sound logic.

When using dependencies, you need to flag them to the scheduler with the -W depend= option of qsub, for example:

Code Block
$ qsub -W depend=beforeany:1234567:1234578 job.sh

This line uses the beforeany dependency type to tell the PBS scheduler that neither job 1234567 nor job 1234578 can start until job.sh has finished running, with or without errors.

When using the -W depend option, users must enter the job IDs after the dependency type, joined by colons, as seen above. The only exception is the 'on' type, which instead takes an integer giving the number of dependencies that must be satisfied before the job can run (for example, on:2).
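The syntax rules above can be sketched as a small shell helper. This is an illustration only: depend_arg is a hypothetical function name, not part of PBS, and the final line prints the qsub command instead of running it.

```shell
# Hypothetical helper (not part of PBS): build the value for -W depend=.
# Usage: depend_arg TYPE JOBID [JOBID ...]   or   depend_arg on COUNT
depend_arg() {
    dep_type=$1
    shift
    case "$dep_type" in
        on)
            # 'on' takes a single positive integer instead of job IDs.
            case "$1" in
                ''|*[!0-9]*|0)
                    echo "on needs a count greater than 0" >&2
                    return 1
                    ;;
            esac
            echo "on:$1"
            ;;
        after|afterok|afternotok|afterany|before|beforeok|beforenotok|beforeany)
            # Every other type takes one or more job IDs joined by colons.
            [ "$#" -gt 0 ] || { echo "need at least one job ID" >&2; return 1; }
            result=$dep_type
            for id in "$@"; do
                result="$result:$id"
            done
            echo "$result"
            ;;
        *)
            echo "unknown dependency type: $dep_type" >&2
            return 1
            ;;
    esac
}

# Print the command rather than running it, so the sketch is safe to try anywhere.
echo "qsub -W depend=$(depend_arg beforeany 1234567 1234578) job.sh"
```

Building the string in one place makes it harder to mistype a dependency type or to pass job IDs where 'on' expects a count.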

Common issues


A common problem with job dependencies is jobs completing and leaving the queue sooner than expected. In this example:

Code Block
$ qsub job1
16394.r-man2
 
$ qsub job2
16395.r-man2
 
$ qsub -W depend=afterok:16394:16395 job3
16396.r-man2

This tells the PBS scheduler to run job3 after both job1 and job2 have completed with no errors. However, when Gadi is not busy, there is a chance that job1 or job2 could complete so quickly that it has exited the queue before job3 is submitted. In this case, job3 will be left sitting in the queue indefinitely, as PBS can no longer satisfy its dependency.

A solution is to submit the jobs in a different order, for example:

Code Block
$ qsub -W depend=on:2 job3
16397.r-man2
 
$ qsub -W depend=beforeok:16397 job1
16398.r-man2
 
$ qsub -W depend=beforeok:16397 job2
16399.r-man2

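In this case, job3 is submitted with the 'on' dependency type and waits for two dependencies to be satisfied before it can run. job1 and job2 are then submitted with beforeok pointing at job3, so each one satisfies one of those dependencies when it completes without errors. By setting up the dependencies this way, the last job to run enters the queue first and then waits for the others to complete, which removes the risk of a job leaving the queue early.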
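The same ordering can be scripted so that the job ID printed by qsub is captured instead of being copied by hand. The following is a minimal sketch: submit_fan_in is a hypothetical wrapper name, and it assumes that qsub prints only the new job ID on standard output, as in the listings above.

```shell
# Hypothetical wrapper: submit the final job first, then its prerequisites.
# Usage: submit_fan_in FINAL_SCRIPT PREREQ_SCRIPT [PREREQ_SCRIPT ...]
submit_fan_in() {
    final_script=$1
    shift
    [ "$#" -gt 0 ] || { echo "need at least one prerequisite script" >&2; return 1; }

    # The final job waits for as many dependencies as there are prerequisites.
    final_id=$(qsub -W depend=on:"$#" "$final_script") || return 1
    echo "final: $final_id"

    for script in "$@"; do
        # Each prerequisite satisfies one dependency when it ends without errors.
        prereq_id=$(qsub -W depend=beforeok:"$final_id" "$script") || return 1
        echo "prerequisite: $prereq_id"
    done
}

# Example call on a login node (script names are placeholders):
#   submit_fan_in job3 job1 job2
```

Deriving the 'on' count from the number of prerequisite scripts keeps the two values from drifting apart when jobs are added or removed.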

Tip
NCI recommends that you continue to monitor your jobs regularly to ensure that any complex dependency chains are running correctly and have not stalled.
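One way to spot a stalled chain is to list which of your jobs are currently held. The sketch below uses a hypothetical helper named held_jobs and assumes that qstat -f prints a line of the form "job_state = H" for a held job. A job that is waiting on a dependency is normally expected to be held, so a job ID in this list only needs attention if its prerequisites have already left the queue.

```shell
# Hypothetical helper: print the IDs of the given jobs that are held (state H).
# Assumes `qstat -f JOBID` output contains a line such as "    job_state = H".
held_jobs() {
    for id in "$@"; do
        state=$(qstat -f "$id" 2>/dev/null | awk -F' = ' '/job_state/ {print $2}')
        if [ "$state" = "H" ]; then
            echo "$id"
        fi
    done
}

# Example call (job IDs are placeholders):
#   held_jobs 16397.r-man2 16398.r-man2
```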




PBS Pro supports nine dependency types:

after

This job may be scheduled for execution at any point after all jobs in the arguments list have started execution.

afterok

This job may be scheduled for execution only after all jobs in the arguments list have terminated with no errors.

afternotok

This job may be scheduled for execution only after all jobs in the arguments list have terminated with errors.

afterany

This job may be scheduled for execution after all jobs in the arguments list have finished execution, with any exit status (with or without errors). This job will not run if a job in the arguments list was deleted without ever having been run.

before

Jobs in the arguments list may begin execution once this job has begun execution.

beforeok

Jobs in the arguments list may begin execution once this job terminates without errors.

beforenotok

Jobs in the arguments list may begin execution once this job terminates with errors.

beforeany

Jobs in the arguments list may begin execution once this job terminates execution, with or without errors.

on

This job may be scheduled for execution after a number of dependencies on other jobs have been satisfied. This type is used in conjunction with one of the before types listed above. It is written as on:count, where count is an integer greater than 0.


Authors: Yue Sun, Andrew Wellington, Mohsin Ali