Nextflow is a workflow tool, frequently used in bioinformatics, for running complex, multi-stage pipelines. For more details on how to use Nextflow, see the online documentation at https://nextflow.io.
Loading Nextflow
Nextflow is installed as a module in /apps. To load the module:
module load nextflow/21.04.3
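To confirm that the module has loaded, you can ask Nextflow to print its version; the output will reflect whichever module version you loaded:

nextflow -version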
Running Nextflow on Gadi
In Nextflow, pipelines are defined as a series of tasks, along with a set of inputs and outputs for each task. Typically, each task is submitted as a separate job to the queue. This requires a long-running Nextflow process that can manage these tasks. The best way to run this is in its own separate batch queue job:
#!/bin/bash
#PBS -lwalltime=24:00:00,ncpus=1,mem=4G,wd
#PBS -lstorage=scratch/<abc>
#PBS -q normal
#PBS -P <abc>

module load nextflow/21.04.3

nextflow run hello
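The script can then be submitted to the queue with qsub in the usual way; the filename used here is just a placeholder:

qsub run_nextflow.pbs

From within this job, Nextflow submits each task in the pipeline as its own job to the queue.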
Specifying resources
The version of Nextflow installed on Gadi has been slightly modified to make it easier to specify resource options for jobs submitted to the queueing system. Within the nextflow.config file for your workflow:
- Use the pbspro executor;
- Extra flags have been added to specify project, storage and gpus as an alternative to the clusterOptions flag;
- The disk flag can be used to reserve space in /jobfs (see the example at the end of this section).
As an example, the process section of your config file might contain:
process {
    executor = 'pbspro'
    queue = 'normal'
    project = '<abc>'
    storage = 'scratch/<abc>+gdata/<abc>'

    withName: 'task1' {
        cpus = 2
        time = '1d'
        memory = '8GB'
    }
}
which is equivalent to:
process {
    executor = 'pbspro'
    clusterOptions = '-q normal -P <abc> -lstorage=scratch/<abc>+gdata/<abc>'

    withName: 'task1' {
        cpus = 2
        time = '1d'
        memory = '8GB'
    }
}
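The disk flag mentioned above can be set on a task in the same way to reserve space in /jobfs on the node running that task. A minimal sketch, where the task name and sizes are placeholders:

process {
    executor = 'pbspro'
    queue = 'normal'
    project = '<abc>'
    storage = 'scratch/<abc>'

    withName: 'task1' {
        cpus = 2
        memory = '8GB'
        disk = '20GB'    // reserves 20GB of /jobfs for this task
    }
}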