Nextflow configuration

Last updated on 2025-06-27

Estimated time: 0 minutes

Overview

Questions

  • How do I configure a Nextflow workflow?
  • How do I assign different resources to different processes?
  • How do I separate and provide configuration for different computational systems?

Objectives

  • Create a Nextflow configuration file.
  • Be able to assign resources to a process.
  • Be able to inspect configuration settings before running a workflow.

Nextflow configuration


A key Nextflow feature is the ability to decouple the workflow implementation, which describes the flow of data and operations to perform on that data, from the configuration settings required by the underlying execution platform. This enables the workflow to be portable, allowing it to run on different computational platforms such as an institutional HPC or cloud infrastructure, without needing to modify the workflow implementation.

We have seen earlier that it is possible to provide a process with directives. These directives are process specific configuration settings. Similarly, we have also provided parameters to our workflow which are parameter configuration settings. These configuration settings can be separated from the workflow implementation, into a configuration file.

Settings in a configuration file are sets of name-value pairs (name = value). The name is a specific property to set, while the value can be anything you can assign to a variable (e.g. strings, booleans, or other variables). It is also possible to access any variable defined in the host environment, such as $PATH, $HOME, or $PWD.
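As a minimal sketch of name-value settings, a configuration file might contain lines like the following. The parameter names project, verbose, scratch and tmpdir are hypothetical and are not part of this workflow.

```groovy
// nextflow.config (illustrative only; these parameter names are assumptions)
params.project = 'demo'                     // a string value
params.verbose = true                       // a boolean value
params.scratch = "$HOME/scratch"            // uses the host environment variable HOME
params.tmpdir  = "${params.scratch}/tmp"    // reuses a setting defined on the line above
```

Double quotes are needed wherever a variable should be substituted into the string; single-quoted strings are kept literally.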

Configuration file

Generally, variables and functions defined in a configuration file are not accessible from the workflow script. Only variables defined using the params scope and the env scope (without env prefix) can be accessed from the workflow script.

Settings are also partitioned into scopes, which govern the behaviour of different elements of the workflow. For example, workflow parameters are governed from the params scope, while process directives are governed from the process scope. A full list of the available scopes can be found in the documentation. It is also possible to define your own scope.
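The idea of scopes can be sketched as below. Only params.outdir comes from this lesson; the cpus value and the environment variable are assumptions chosen for illustration.

```groovy
// nextflow.config (sketch): three scopes, each governing a different part of the run
params {
    outdir = "${baseDir}/results"   // workflow parameter, read in the script as params.outdir
}

process {
    cpus = 1                        // process directive applied to tasks (assumed value)
}

env {
    OMP_NUM_THREADS = '1'           // hypothetical variable exported to the task environment
}
```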

Task 10.1

Configuration settings for a workflow are often stored in the file nextflow.config which is in the same directory as the workflow script. Configuration can be written in either of two ways. The first is using dot notation, and the second is using brace notation. Both forms of notation can be used in the same configuration file.

An example of dot notation:

GROOVY

params.outdir = "${baseDir}/results"   // Default output directory: the 'results' subfolder of the workflow base directory.
params.meta = "${baseDir}/params/meta.csv"
params.effects = "${baseDir}/params/effects.csv"
params.subgroup = "${baseDir}/params/subgroup.csv"
params.school_data = "${baseDir}/data/each_period.tar.gz"
params.school_info = "${baseDir}/params/school_info.json"
params.composition_data = "${baseDir}/data/composition_each_period.tar.gz"

An example of brace notation:

GROOVY

params {
    outdir = "${baseDir}/results"
    batches = 1
    meta = "${baseDir}/params/meta.csv"
    effects = "${baseDir}/params/effects.csv"
    subgroup = "${baseDir}/params/subgroup.csv"
    school_data = "${baseDir}/data/each_period.tar.gz"
    school_info = "${baseDir}/params/school_info.json"
    composition_data = "${baseDir}/data/composition_each_period.tar.gz"
}

A configuration can also be split across multiple files, and one file can be pulled into another using an includeConfig statement, e.g. includeConfig "params.config".
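A short sketch of how such an include might look, assuming the parameters shown earlier were moved into a separate file called params.config next to nextflow.config (that file layout is an assumption):

```groovy
// nextflow.config (sketch)
// Relative paths are resolved against the location of the file doing the including.
includeConfig 'params.config'        // assumed to hold the params { ... } block
includeConfig 'conf/local.config'    // process settings kept in the conf/ directory
```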

How configuration files are combined

Configuration settings can be spread across several files. This also allows settings to be overridden by other configuration files. The priority of a setting is determined by the following order, ranked from highest to lowest.

  1. Parameters specified on the command line (--param_name value).
  2. Parameters provided using the -params-file option.
  3. Config file specified using the -c my_config option.
  4. The config file named nextflow.config in the current directory.
  5. The config file named nextflow.config in the workflow project directory ($projectDir: the directory where the script to be run is located).
  6. The config file $HOME/.nextflow/config.
  7. Values defined within the workflow script itself (e.g., main.nf).

If configuration is provided by more than one of these methods, configuration is merged giving higher priority to configuration provided higher in the list.
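The effect of this ordering can be sketched with the batches parameter. The file name my_config and the values 5 and 10 are assumptions used only to show which source wins.

```groovy
// nextflow.config in the project directory: the default value
params.batches = 1

// my_config (hypothetical file passed with -c my_config) would contain:
//     params.batches = 5
//
// nextflow run main.nf                             -> params.batches is 1
// nextflow run main.nf -c my_config                -> params.batches is 5
// nextflow run main.nf -c my_config --batches 10   -> params.batches is 10
```

Running nextflow config in the project directory prints the configuration resolved from the configuration files, which is a quick way to inspect settings before launching the workflow.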

Configuring Nextflow vs Configuring a Nextflow workflow

The majority of Nextflow configuration settings must be provided on the command line; however, a handful of settings can also be provided within a configuration file, such as workDir = '/path/to/work/dir' (-w /path/to/work/dir) or resume = true (-resume). These settings do not belong to a configuration scope.
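For illustration, these unscoped settings sit at the top level of a configuration file, outside any scope block. The path is a placeholder.

```groovy
// nextflow.config (sketch): unscoped settings are written at the top level
workDir = '/path/to/work/dir'   // same effect as: nextflow run ... -w /path/to/work/dir
resume  = true                  // same effect as: nextflow run ... -resume
```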

Configuring process behaviour

Earlier we saw that process directives allow the specification of settings for the task execution, such as cpus, memory, conda and other resources, in the pipeline script. This is useful when prototyping a small workflow script, but it ties the configuration to the workflow, making it less portable. A good practice is to separate the process configuration settings into another file.

The process configuration scope allows any process directive to be set from a configuration file. In this workflow, these files are kept in the conf/ directory.

Task 10.2

Navigate to the conf folder and open the local.config file. What qualifier is being used to allocate resources to the process, and how many resources does this involve?

GROOVY

process {
    withLabel: small_time_cpus {
        executor = 'local'
        cache = 'lenient'
        cpus = 2
    }
}

Unit values

Memory and time duration units can be specified either using a string based notation in which the digit(s) and the unit can be separated by a space character, or by using the numeric notation in which the digit(s) and the unit are separated by a dot character and not enclosed by quote characters.

String syntax      Numeric syntax   Value
'10 KB'            10.KB            10240 bytes
'500 MB'           500.MB           524288000 bytes
'1 min'            1.min            60 seconds
'1 hour 25 sec'    -                1 hour and 25 seconds

Settings placed directly in the process scope, outside any selector, are applied to all processes in the workflow. A process selector can be used to apply the configuration to a specific process or group of processes.
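A minimal sketch of generic process settings that uses both unit notations. The values are assumptions and are not taken from the lesson's configuration files.

```groovy
// Sketch: no selector is used, so these settings apply to every process
process {
    cpus   = 1
    memory = '2 GB'   // string notation
    time   = 1.h      // numeric notation, no quotes
}
```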

Process selectors

When a workflow has many processes, it is inconvenient to specify directives for all processes individually, especially if directives are repeated for groups of processes. A helpful strategy is to annotate the processes using the label directive (processes can have multiple labels). The withLabel selector then allows the configuration of all processes annotated with a specific label, as in the local.config example from Task 10.2.

Another strategy is to use process selector expressions. Both withName: and withLabel: allow the use of regular expressions to apply the same configuration to all processes matching a pattern. Regular expressions must be quoted, unlike simple process names or labels.

  • The | matches either-or, e.g., withLabel: 'small_time_cpus|big_mem' applies the configuration to any process labelled small_time_cpus or big_mem.
  • The ! inverts a selector, e.g., withLabel: '!big_mem' applies the configuration to any process without the big_mem label.
  • The .* matches any number of characters, e.g., withName: 'small_time_cpus:big_mem:.*' matches all processes of the workflow small_time_cpus:big_mem.

A regular expression cheat-sheet is a useful reference if you would like to write more expressive patterns.
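The selector expressions above might be combined as in the following sketch. The labels and the ESTIMATION process name appear elsewhere in this lesson; the resource values are assumptions.

```groovy
// Sketch of selector expressions (resource values are illustrative only)
process {
    // Simple names need no quotes.
    withName: ESTIMATION {
        maxRetries = 1
    }
    // Either-or: processes labelled small_time_cpus or big_time_cpus.
    withLabel: 'small_time_cpus|big_time_cpus' {
        cpus = 10
    }
    // Negation: every process that does not carry the big_mem label.
    withLabel: '!big_mem' {
        memory = 4.GB
    }
}
```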

Selector priority

When mixing generic process configuration and selectors, the following priority rules are applied (from highest to lowest):

  1. withName selector definition.
  2. withLabel selector definition.
  3. Process specific directive defined in the workflow script.
  4. Generic process configuration (settings in the process scope without a selector).
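These rules can be sketched with one directive set at three levels. The cpus values are assumptions, and ESTIMATION is assumed to carry the small_time_cpus label as in this lesson's workflow.

```groovy
// Sketch: the same directive set by generic configuration and by two selectors
process {
    cpus = 1                        // generic: lowest priority

    withLabel: small_time_cpus {
        cpus = 2                    // overrides the generic value for labelled processes
    }

    withName: ESTIMATION {
        cpus = 4                    // highest priority: ESTIMATION tasks get 4 cpus
    }
}
// Other processes labelled small_time_cpus get 2 cpus; all remaining processes get 1.
```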

Dynamic expressions

A common scenario is that configuration settings may depend on the data being processed. Such settings can be dynamically expressed using a closure.

Task 10.3

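For example, we can specify the memory required as a multiple of the number of cpus. Similarly, we can publish results to a subfolder based on the sample name. In the example below, taken from this workflow, the errorStrategy closure retries a task only when it exits with status 140, and the time closure scales the requested wall time with the attempt number (task.attempt).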

GROOVY


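// main.nf: process definition
process ESTIMATION {

  tag { school_period }
  label 'small_time_cpus'

  // Retry once if the task exits with status 140, otherwise ignore the failure.
  errorStrategy { task.exitStatus == 140 ? 'retry' : 'ignore' }
  maxRetries 1

  .
  .
  .
}

// conf/slurm.config: process configuration
process {
  .
  .
  .
  withLabel: small_time_cpus {
    executor = 'slurm'
    // Request 2 hours on the first attempt and 4 hours on the retry.
    time = { 2.h * task.attempt }
    clusterOptions = "--account=none --mem=20G --partition=nodes --nodes=1 --cpus-per-task=10"
  }
  .
  .
  .
}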
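The memory case mentioned above is not part of this workflow's configuration, but a sketch of it could look like the following. The label is reused from this lesson; all values are assumptions.

```groovy
// Sketch: resources computed per task by closures (values are illustrative)
process {
    withLabel: big_mem {
        cpus          = 2
        memory        = { 4.GB * task.cpus }      // 8 GB when cpus is 2
        errorStrategy = 'retry'
        maxRetries    = 2
        time          = { 1.h * task.attempt }    // 1 h, then 2 h, then 3 h
    }
}
```

Because the values are closures, they are evaluated for each task when it is submitted, so a retried task picks up the larger time request.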

Configuring execution platforms

Nextflow supports a wide range of execution platforms, from running locally, to running on HPC clusters or cloud infrastructures. See https://www.nextflow.io/docs/latest/executor.html for the full list of supported executors.

Task 10.4

The process.executor directive allows you to override the executor to be used by a specific process. This can be useful, for example, when there are short running tasks that can be run locally, and are unsuitable for submission to HPC executors (check for guidelines on best practice use of your execution system). Other process directives such as process.clusterOptions, process.queue, and process.machineType can also be used to further configure processes depending on the executor used.

GROOVY

//conf/slurm.config
process {
  withLabel: big_mem {
    executor = 'slurm'
    clusterOptions = "--account=none --time=15:00 --mem=7G --partition=nodes --nodes=1 --ntasks-per-node=1 --cpus-per-task=1"
  }
  withLabel: small_time_cpus {
    executor = 'slurm'
    time = { 2.h * task.attempt }
    clusterOptions = "--account=none --mem=20G --partition=nodes --nodes=1 --cpus-per-task=10"
  }
  withLabel: big_time_cpus {
    executor = 'slurm'
    clusterOptions = "--account=none --time=10:00 --mem=1G --partition=nodes --nodes=1 --cpus-per-task=10"
  }
}
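As a sketch of overriding the executor for short tasks, the configuration below sends everything to SLURM except processes with a hypothetical short_task label. The label is an assumption, and the queue name simply reuses the partition named in the clusterOptions above.

```groovy
// Sketch: a default executor with a local override ('short_task' is a hypothetical label)
process {
    executor = 'slurm'      // default executor for every process
    queue    = 'nodes'      // passed to SLURM as the partition to submit to

    withLabel: short_task {
        executor = 'local'  // run quick tasks on the machine that launched Nextflow
    }
}
```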

Configuring software requirements

Docker is a container technology. Container images are lightweight, standalone, executable packages of software that include everything needed to run an application: code, runtime, system tools, system libraries and settings. Containerized software is intended to run the same regardless of the underlying infrastructure, unlike other package management technologies which are operating system dependent (see the published article on Nextflow). For each container image used, Nextflow uses Docker to spawn an independent and isolated container instance for each process task.

To use Docker, we must provide a container image path using the process.container directive, and also enable Docker in the docker scope with docker.enabled = true. A container image path takes the form (protocol://)registry/repository/image:version--build. By default, Docker containers run software using a privileged user, which is why Apptainer is preferred on compute clusters.
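A minimal sketch of a Docker configuration, using the container image from this lesson. The runOptions line is an optional addition assumed here for illustration: it asks Docker to run the task as the calling user instead of the default privileged user.

```groovy
// Sketch: enabling Docker for all processes
process.container = 'omiridoue/siena_r:0.8'
docker.enabled    = true

// Optional (an assumption, not part of the lesson files): run as the calling user.
// Single quotes keep $(...) literal so that the shell expands it when docker is invoked.
docker.runOptions = '-u $(id -u):$(id -g)'
```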

Software configuration using Apptainer (formerly Singularity)

Apptainer, formerly known as Singularity, is another container technology, commonly used on HPC clusters. It differs from Docker in several ways. The primary differences are that processes are run as the user, and certain directories are automatically "mounted" (made available) in the container instance. Apptainer also supports building its own images from Docker images, allowing Docker image paths to be used as values for process.container.

Apptainer is enabled in a similar manner to Docker. A container image path must be provided using process.container, and Apptainer is enabled using apptainer.enabled = true (installations that still provide Singularity use singularity.enabled = true instead).

See episode 12 for more information on Auxiliary tools.

Container protocols

The following protocols are supported:

  • docker://: download the container image from the Docker Hub and convert it to the Singularity format (default).
  • library://: download the container image from the Singularity Library service.
  • shub://: download the container image from the Singularity Hub.
  • https://: download the singularity image from the given URL.
  • file://: use a singularity image on local computer storage.
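A sketch of how a protocol prefix might appear in a container path. The docker:// image is the one used in this lesson; the local .sif path is a hypothetical placeholder.

```groovy
// Sketch: container paths with explicit protocols (the .sif path is hypothetical)
apptainer.enabled = true

process {
    // Pulled from Docker Hub and converted to a local image on first use.
    container = 'docker://omiridoue/siena_r:0.8'

    // A pre-built image already on local storage, used for one group of processes.
    withLabel: big_mem {
        container = 'file:///path/to/siena_r.sif'
    }
}
```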

Configuration profiles


One of the most powerful features of Nextflow configuration is to predefine multiple configurations or profiles for different execution platforms. This allows a group of predefined settings to be called with a short invocation, -profile <profile name>.

Task 10.5

Configuration profiles are defined in the profiles scope, which groups the attributes that belong to the same profile using a common prefix.

GROOVY

//nextflow.config

profiles {
  local {
    includeConfig 'conf/local.config'
    docker.enabled = true
    process.container = 'omiridoue/siena_r:0.8'
  }
  slurm {
    includeConfig 'conf/slurm.config'
    apptainer.enabled = true

    apptainer.cacheDir = "apptainer"
    apptainer.autoMounts = true

    process.executor = 'slurm'
    process.container = 'omiridoue/siena_r:0.8'
  }
}

This configuration defines two different profiles, local and slurm, that set different process configuration strategies depending on the target execution platform. By convention, the standard profile is implicitly used when no other profile is specified by the user. To enable a specific profile, use the -profile option followed by the profile name:

BASH

nextflow run <your script> -profile local
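The lesson's configuration does not define a standard profile, so nothing profile-specific is applied when -profile is omitted. A sketch of how one could be added (the profile body is an assumption):

```groovy
// nextflow.config (sketch; this 'standard' profile is an assumption, not part of the lesson files)
profiles {
    standard {
        process.executor = 'local'    // applied only when no -profile option is given
    }
}

// nextflow run <your script>                  -> uses 'standard'
// nextflow run <your script> -profile slurm   -> uses 'slurm' instead of 'standard'
// nextflow config -profile slurm              -> prints the resolved settings without running
```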

Key Points

  • Nextflow configuration can be managed using a Nextflow configuration file.
  • Nextflow configuration files are plain text files containing a set of properties.
  • You can define process specific settings, such as cpus and memory, within the process scope.
  • You can assign different resources to different processes using the process selectors withName or withLabel.
  • You can define a profile for different configurations using the profiles scope. These profiles can be selected when launching a pipeline execution by using the -profile command-line option.
  • Nextflow configuration settings are evaluated in the order they are read in.