
Nextflow is a workflow tool, frequently used in bioinformatics, for running complex, multi-stage pipelines. For more details on how to use Nextflow, see the online documentation at https://nextflow.io.

Loading Nextflow

Nextflow is installed as a module in /apps. To load the module:

module load nextflow/21.04.3
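
Other versions may be installed over time; to see what is available, use the standard module command:

module avail nextflow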

Running Nextflow on Gadi

In Nextflow, pipelines are defined as a series of tasks, along with a set of inputs and outputs for each task. Typically, each task is submitted as a separate job to the queue, which requires a long-running Nextflow process to manage these tasks. The best way to run this managing process is in its own batch queue job:

#!/bin/bash
#PBS -lwalltime=24:00:00,ncpus=1,mem=4GB,wd
#PBS -lstorage=scratch/<abc>
#PBS -q normal
#PBS -P <abc>

module load nextflow/21.04.3

nextflow run hello
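
Save the script above (for example as nextflow_run.sh, a placeholder name) and submit it to the queue in the usual way:

qsub nextflow_run.sh

Here nextflow run hello runs Nextflow's hello example pipeline; replace it with the pipeline you want to run.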

Specifying resources

The version of Nextflow installed on Gadi has been slightly modified to make it easier to specify resource options for jobs submitted to the queueing system. Within the nextflow.config file for your workflow:

  • Use the pbspro executor;
  • Extra flags have been added to specify project, storage and gpus as an alternative to the clusterOptions flag;
  • The disk flag can be used to reserve space in /jobfs.

As an example, the process section of your config file might contain:

process {
  executor = 'pbspro'
  queue = 'normal'
  project = '<abc>'
  storage = 'scratch/<abc>+gdata/<abc>'
       
  withName: 'task1' {
    cpus = 2
    time = '1d'
    memory = '8GB'
  }
}

which is equivalent to:

process {
  executor = 'pbspro'
  clusterOptions = '-q normal -P <abc> -lstorage=scratch/<abc>+gdata/<abc>'
       
  withName: 'task1' {
    cpus = 2
    time = '1d'
    memory = '8GB'
  }
}
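
The gpus and disk flags mentioned above are set in the same way. As a sketch (the queue name, GPU count, jobfs size and other values here are placeholders, not recommendations), a GPU task might be configured as:

process {
  executor = 'pbspro'
  queue = 'gpuvolta'
  project = '<abc>'
  storage = 'scratch/<abc>+gdata/<abc>'

  withName: 'gpu_task' {
    // request one GPU plus jobfs space for this task
    cpus = 12
    gpus = 1
    memory = '90GB'
    disk = '100GB'
    time = '2h'
  }
}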