We do not recommend submitting job arrays on Raijin, due to a number of limitations in how they are scheduled and because job arrays are limited in size.

If you have many small jobs that all use similar resources and will finish around the same time, instead of submitting them individually, look into aggregating them into jobs with a combined resource requirement.

Each job spends significant overhead setting up its environment, and the scheduler must consider each job individually when optimising cluster usage, so submitting many small jobs increases the total time your work waits in the queue. It also affects other users in your project: only 300 queued jobs per project are allowed in the execution queues, so their jobs cannot reach the execution queues until yours have been processed.

If you have dozens to a few hundred identical multi-CPU jobs, we recommend submitting them with a loop like this:

Example (run the following script on a login node):
for i in {1..10}; do
  sleep 2
  qsub -v PBS_ARRAY_INDEX=$i job-script
done

Please DO NOT submit THOUSANDS of single-CPU jobs using the example above!
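Inside the submitted job script, the index passed via qsub -v can be used to select that run's input. A minimal sketch of such a job-script (the file names, default index, and resource requests here are illustrative, not part of the original page):

```shell
#!/bin/bash
#PBS -l ncpus=16
#PBS -l walltime=1:00:00
# Hypothetical job-script: the loop on the login node passes a different
# PBS_ARRAY_INDEX to each submission via qsub -v.
INDEX=${PBS_ARRAY_INDEX:-1}    # default only so the script also runs standalone
INPUT="input_${INDEX}.dat"     # illustrative naming: input_1.dat, input_2.dat, ...
echo "processing ${INPUT}"
```

Each of the ten submissions from the loop above then works on its own input file.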

You can run multiple single-CPU jobs in parallel, provided they all use similar resources and finish around the same time, using Example 2 or 3:

Example 2 allows you to run 16 single-CPU jobs within one node.

Example 2
#PBS -l ncpus=16
#PBS ...
for i in {1..16}; do
  ./run_my_program args ... &
done
wait

Example 3 allows you to run 32 single-CPU jobs across two nodes:

Example 3
#PBS -l ncpus=32
#PBS ...

# 32 cpus across two 16-core nodes; set COMMAND to whatever launches
# your 16 single-CPU tasks on one node.
node_count=2

for node in $(seq 1 $node_count); do
  pbsdsh -n $node -- bash -l -c "$COMMAND" &
done
wait


Please note the ‘&’ at the end of each command line, which runs the task in the background, and the ‘wait’ at the end of the script, which blocks until all background tasks have finished.
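The effect of ‘&’ and ‘wait’ can be seen in a small standalone sketch (plain bash, no PBS involved):

```shell
#!/bin/bash
# Each task runs in the background ('&'); 'wait' blocks until all of
# them have exited, so the script cannot finish before its tasks do.
start=$SECONDS
for i in 1 2 3; do
  sleep 1 &          # three 1-second tasks started concurrently
done
wait                 # returns once every background task is done
elapsed=$((SECONDS - start))
echo "all tasks finished in about ${elapsed}s"
```

Because the three 1-second tasks run concurrently, the whole script takes about one second rather than three.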

Example 3 above assumes that the commands run by each pbsdsh call take approximately the same time. Unfortunately, this is not always the case, and Example 4 below shows how to run many single-CPU tasks that may need (very) different times to execute.

Many thanks to one of our users, Scott Wales, for sharing this example.

Example 4

# Run an embarrassingly parallel job, where each command is totally independent
# Uses gnu parallel as a task scheduler, then executes each task on the available cpus with pbsdsh

#PBS -q normal
#PBS -l ncpus=256
#PBS -l walltime=48:00:00
#PBS -l mem=500gb
#PBS -l wd

module load parallel/20150322

SCRIPT=./  # Script to run.
INPUTS=inputs.txt   # Each line in this file is used as arguments to ${SCRIPT}
                    # It's fine to have more input lines than you have requested cpus,
                    # extra jobs will be executed as cpus become available

# Here '{%}' gets replaced with the job slot ({1..$PBS_NCPUS})
# and '{}' gets replaced with a line from ${INPUTS}.
# Pbsdsh starts a very minimal shell. `bash -l` loads all of your startup files, so that things like modules work.
# The `-c` is so that bash separates out the arguments correctly (otherwise they're all in a single string)

parallel -j ${PBS_NCPUS} pbsdsh -n {%} -- bash -l -c "'${SCRIPT} {}'" :::: ${INPUTS}

When using Example 4 it is important to balance the number of CPUs against the number of tasks to be run (the number of tasks in the example equals the number of lines in the input file INPUTS). As a rough guide, request about 10 times fewer CPUs than you have tasks. Example 4 requests 256 CPUs, which should be appropriate for 3000-10000 tasks.
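To apply that rough guide before submitting, you can count the task lines and compare them against the planned CPU request. A small illustrative check (the generated 5000-line file is a stand-in for your real INPUTS file):

```shell
#!/bin/bash
# Illustrative pre-submission check: compare the planned cpu request
# against the number of task lines in the inputs file.
ncpus=256
inputs=$(mktemp)
seq 1 5000 > "$inputs"       # stand-in for inputs.txt, one task per line
ntasks=$(wc -l < "$inputs")
ratio=$((ntasks / ncpus))
echo "tasks=$ntasks cpus=$ncpus tasks-per-cpu=$ratio"
rm -f "$inputs"
```

Here 5000 tasks on 256 CPUs gives roughly 19 tasks per CPU, comfortably within the suggested range.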

Other useful parallel lines:

1) To run the commands listed in the file input.cmd:

cat input.cmd | parallel -j ${PBS_NCPUS} pbsdsh -n {%} -- bash -l -c '{}'

2) To run only one command per node (assuming 16 cores per node):

parallel -j $((${PBS_NCPUS}/16)) --rpl '{%} 1 $_=($job->slot()-1)*16' pbsdsh -n {%}  -- bash -l -c "'${SCRIPT} {}'" :::: ${INPUTS}
