We do not recommend submitting job arrays on Raijin: their scheduling has a number of limitations, and job arrays are also limited in size.

If you have many small jobs that all use similar resources and will finish around the same time, instead of submitting them individually, look into aggregating them into jobs with a combined resource requirement.

Each job incurs significant overhead setting up its environment, and the scheduler must consider every job individually when optimising cluster usage, so submitting many small jobs increases the total waiting time for your work in the queue. In addition, because only 300 queued jobs per project are allowed in the execution queue, submitting many small jobs forces all other users in your project to wait until your jobs are processed before their jobs can reach the execution queues.

If you have dozens to a few hundred identical multi-CPU jobs, we recommend using loops like this:

Example 1 (run the following script on a login node):
#!/bin/bash

# Submit one copy of job-script per index; the short sleep avoids
# flooding the scheduler with submissions.
for i in {1..10}; do
  sleep 2
  # Pass the loop index to the job script as an environment variable.
  qsub -v PBS_ARRAY_INDEX=$i job-script
done
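
For reference, here is a minimal sketch of what the job-script submitted above might contain. The resource requests, the program name my_program and the input/output naming based on $PBS_ARRAY_INDEX are assumptions for illustration only; adjust them to your own workload:

#!/bin/bash
#PBS -l ncpus=1
#PBS -l walltime=01:00:00
#PBS -l mem=2GB

# PBS_ARRAY_INDEX was set by qsub -v on the login node.
cd $PBS_O_WORKDIR
./my_program input.$PBS_ARRAY_INDEX > output.$PBS_ARRAY_INDEX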

Please DO NOT submit THOUSANDS of single-CPU jobs using the example above!

If your single-CPU jobs all use similar resources and will finish around the same time, you can run several of them in parallel within a single job, as shown in Examples 2 and 3:

Example 2 allows you to run 16 single-CPU jobs within one node.

Example 2
#!/bin/bash
#PBS -l ncpus=16
#PBS ...

# Start 16 copies of the program in the background, one per CPU.
for i in {1..16}; do
  ./run_my_program args ... &
done

# Wait for all background tasks to finish before the job ends.
wait
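
If each of the 16 tasks should work on its own data, the backgrounded command can use the loop index. Below is a minimal variation of the loop in Example 2, assuming (for illustration only) that the program takes an input file named input.$i and that you want a separate log per task:

for i in {1..16}; do
  ./run_my_program input.$i > task_$i.log 2>&1 &
done

wait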

Example 3 allows you to run 32 single-CPU jobs across two nodes:

Example 3
#!/bin/bash
#PBS -l ncpus=32
#PBS ...

COMMAND=/bin/hostname

# Launch one task per CPU allocated to the job ($PBS_NCPUS tasks in total).
task_count=$PBS_NCPUS

# pbsdsh task indices start at 0, hence the $((i-1)).
for i in $(seq 1 $task_count); do
  pbsdsh -n $((i-1)) -- bash -l -c "$COMMAND" &
done

# Wait for all background tasks to finish before the job ends.
wait

Please note the ‘&’ at the end of each command line, which runs the task in the background, and the final ‘wait’, which keeps the job running until all background tasks have finished.
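
In Example 3, pbsdsh runs the same COMMAND for every task. If each task needs to do different work, one option is to pass the task index as an argument to a wrapper script of your own; the script name task.sh below is a hypothetical placeholder:

COMMAND="$PBS_O_WORKDIR/task.sh"

for i in $(seq 1 $PBS_NCPUS); do
  pbsdsh -n $((i-1)) -- bash -l -c "$COMMAND $i" &
done

wait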

 
