Nextflow configuration
Last updated on 2025-06-27
Overview
Questions
- How do I configure a Nextflow workflow?
- How do I assign different resources to different processes?
- How do I separate and provide configuration for different computational systems?
Objectives
- Create a Nextflow configuration file.
- Be able to assign resources to a process.
- Be able to inspect configuration settings before running a workflow.
Nextflow configuration
A key Nextflow feature is the ability to decouple the workflow implementation, which describes the flow of data and operations to perform on that data, from the configuration settings required by the underlying execution platform. This enables the workflow to be portable, allowing it to run on different computational platforms such as an institutional HPC or cloud infrastructure, without needing to modify the workflow implementation.
We have seen earlier that it is possible to provide a process with directives. These directives are process-specific configuration settings. Similarly, we have provided parameters to our workflow, which are parameter configuration settings. These configuration settings can be separated from the workflow implementation into a configuration file.
Settings in a configuration file are sets of name-value pairs (name = value). The name is a specific property to set, while the value can be anything you can assign to a variable (for example strings, booleans, or other variables). It is also possible to access any variable defined in the host environment, such as $PATH, $HOME, $PWD, etc.
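The following minimal nextflow.config sketch shows the kinds of values a setting can take. The parameter names (greeting, use_cache, greeting_copy, scratch) are hypothetical and only serve as illustration. The environment-variable line assumes the standard (non-strict) configuration parser. Newer Nextflow releases that use the strict configuration syntax may expect environment variables to be read differently, so check the configuration documentation for your version.

```groovy
// nextflow.config (hypothetical parameters, for illustration only)
params.greeting      = 'hello'             // a string
params.use_cache     = true                // a boolean
params.greeting_copy = params.greeting     // the value of another setting
params.scratch       = "${HOME}/scratch"   // built from the host's $HOME variable
```

If you run `nextflow config` from the project directory, Nextflow prints the resolved configuration. This is a quick way to confirm what these values resolve to before you launch a workflow.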
Configuration file
Generally, variables and functions defined in a configuration file are not accessible from the workflow script. Only variables defined using the params scope and the env scope (without the env prefix) can be accessed from the workflow script.
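The sketch below puts two files side by side, with comments marking where each begins. It shows the two kinds of values that cross from configuration into a workflow: params are read as params.outdir, and env variables are exported to each task's environment. The process name SAY_HELLO and the variable GREETING are hypothetical.

```groovy
// nextflow.config
params.outdir = 'results'     // readable in main.nf as params.outdir
env.GREETING  = 'hello'       // exported to every task's environment as $GREETING

// main.nf
process SAY_HELLO {
    output:
    stdout

    script:
    // \$GREETING is escaped so the shell resolves it; ${params.outdir} is filled in by Nextflow
    """
    echo "\$GREETING from a task; results go to ${params.outdir}"
    """
}

workflow {
    SAY_HELLO().view()
}
```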
Settings are also partitioned into scopes, which govern the behaviour of different elements of the workflow. For example, workflow parameters are governed by the params scope, while process directives are governed by the process scope. A full list of the available scopes can be found in the documentation. It is also possible to define your own scope.
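Here is a short sketch with one setting in each of three built-in scopes. The values are illustrative only.

```groovy
// nextflow.config: one setting in each of three built-in scopes
params.outdir  = 'results'   // params scope: workflow parameters
process.cpus   = 2           // process scope: a directive applied to every process
docker.enabled = true        // docker scope: container runtime settings
```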
Task 10.1
Configuration settings for a workflow are often stored in the file nextflow.config, which is in the same directory as the workflow script. Configuration can be written in either of two ways: the first uses dot notation, and the second uses brace notation. Both forms of notation can be used in the same configuration file.
An example of dot notation:
GROOVY
params.outdir = "${projectDir}/results" // Default output directory: a "results" subfolder of the project directory
params.meta = "${projectDir}/params/meta.csv"
params.effects = "${projectDir}/params/effects.csv"
params.subgroup = "${projectDir}/params/subgroup.csv"
params.school_data = "${projectDir}/data/each_period.tar.gz"
params.school_info = "${projectDir}/params/school_info.json"
params.composition_data = "${projectDir}/data/composition_each_period.tar.gz"
An example of brace notation:
GROOVY
params {
    outdir = "${projectDir}/results"
    batches = 1
    meta = "${projectDir}/params/meta.csv"
    effects = "${projectDir}/params/effects.csv"
    subgroup = "${projectDir}/params/subgroup.csv"
    school_data = "${projectDir}/data/each_period.tar.gz"
    school_info = "${projectDir}/params/school_info.json"
    composition_data = "${projectDir}/data/composition_each_period.tar.gz"
}
Configuration files can also be split into multiple files and included into another file using the includeConfig "params.config" statement.
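A minimal sketch of includeConfig follows. It assumes a file called params.config sits next to nextflow.config and holds a params block like the one above. The relative path is resolved against the including file.

```groovy
// nextflow.config
includeConfig 'params.config'   // pulls in the params { ... } block kept in its own file
process.cpus = 2                // other settings can still live alongside the include
```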
How configuration files are combined
Configuration settings can be spread across several files. This also allows settings to be overridden by other configuration files. The priority of a setting is determined by the following order, ranked from highest to lowest.
- Parameters specified on the command line (--param_name value).
- Parameters provided using the -params-file option.
- Config file specified using the -c my_config option.
- The config file named nextflow.config in the current directory.
- The config file named nextflow.config in the workflow project directory ($projectDir: the directory where the script to be run is located).
- The config file $HOME/.nextflow/config.
- Values defined within the workflow script itself (e.g., main.nf).
If configuration is provided by more than one of these methods, configuration is merged giving higher priority to configuration provided higher in the list.
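To see the priority order in practice, the sketch below sets the same parameter in two files. The file name my_config follows the list above, and main.nf stands in for your workflow script.

```groovy
// nextflow.config (project directory)
params.outdir = 'results'

// my_config (passed with: nextflow run main.nf -c my_config)
params.outdir = 'results_from_c'

// Launched as: nextflow run main.nf -c my_config --outdir cli_results
// the command-line value has the highest priority, so params.outdir is 'cli_results'.
// Without --outdir, the -c file outranks nextflow.config and params.outdir is 'results_from_c'.
```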
Configuring Nextflow vs Configuring a Nextflow workflow
The majority of Nextflow configuration settings must be provided on the command line. However, a handful of settings can also be provided within a configuration file, such as workDir = '/path/to/work/dir' (-w /path/to/work/dir) or resume = true (-resume). These settings do not belong to a configuration scope.
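The two settings mentioned above look like this when written at the top level of a configuration file, outside any scope. The path is a placeholder.

```groovy
// nextflow.config: top-level settings, outside any scope
workDir = '/path/to/work/dir'   // same effect as: nextflow run ... -w /path/to/work/dir
resume  = true                  // same effect as: nextflow run ... -resume
```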
Configuring process behaviour
Earlier we saw that process directives allow the specification of settings for the task execution, such as cpus, memory, conda and other resources, in the pipeline script. This is useful when prototyping a small workflow script. However, it ties the configuration to the workflow, making it less portable. A good practice is to separate the process configuration settings into another file.
The process configuration scope allows any process directive to be set from a configuration file. In this project, these configuration files are kept in the conf/ directory.
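Below is a sketch of defaults set in the process scope for every process. The values are hypothetical and do not reproduce the lesson's own conf/local.config, which Task 10.2 asks you to inspect.

```groovy
// Hypothetical conf file: defaults applied to every process
process {
    cpus   = 2
    memory = 4.GB
    time   = 1.h
}
```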
Task 10.2
Navigate to the conf folder and open the local.config file. What qualifier is being used to allocate resources to the process, and how many resources does this involve?
Unit values
Memory and time duration units can be specified either with a string-based notation, in which the digit(s) and the unit are separated by a space character, or with the numeric notation, in which the digit(s) and the unit are separated by a dot character and are not enclosed in quote characters.
| String syntax | Numeric syntax | Value |
|---|---|---|
| '10 KB' | 10.KB | 10240 bytes |
| '500 MB' | 500.MB | 524288000 bytes |
| '1 min' | 1.min | 60 seconds |
| '1 hour 25 sec' | - | 1 hour and 25 seconds |
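Both syntaxes from the table can appear in the same process scope. The values in this sketch are only examples.

```groovy
process {
    memory = '500 MB'   // string syntax: quoted, digit(s) and unit separated by a space
    time   = 1.min      // numeric syntax: unquoted, digit(s) and unit joined by a dot
}
```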
Settings placed directly in the process scope are applied to all processes in the workflow. A process selector can be used to apply the configuration to a specific process or group of processes.
Process selectors
When a workflow has many processes, it is inconvenient to specify directives for all processes individually, especially if directives are repeated for groups of processes. A helpful strategy is to annotate the processes using the label directive (processes can have multiple labels). The withLabel selector then allows the configuration of all processes annotated with a specific label.
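The sketch below configures every process that declares `label 'big_mem'` in the workflow script. The cpus and memory values are illustrative and loosely follow the Slurm options used later in this episode.

```groovy
// Configuration file: settings for every process carrying the big_mem label
process {
    withLabel: big_mem {
        cpus   = 1
        memory = 7.GB
    }
}
```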
Another strategy is to use process selector expressions. Both withName: and withLabel: allow the use of regular expressions to apply the same configuration to all processes matching a pattern. Regular expressions must be quoted, unlike simple process names or labels.
- The | matches either-or, e.g., withName: 'small_time_cpus|big_mem' applies the configuration to any process matching the name small_time_cpus or big_mem.
- The ! inverts a selector, e.g., withLabel: '!big_mem' applies the configuration to any process without the big_mem label.
- The .* matches any number of characters, e.g., withName: 'small_time_cpus:big_mem:.*' matches all processes of the workflow small_time_cpus:big_mem.
A regular expression cheat-sheet can be found here if you would like to write more expressive expressions.
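This sketch combines the three selector expressions described above in a single process scope. The directive values are placeholders.

```groovy
process {
    // Either-or: processes named small_time_cpus or big_mem
    withName: 'small_time_cpus|big_mem' {
        cpus = 2
    }
    // Negation: every process that does NOT carry the big_mem label
    withLabel: '!big_mem' {
        memory = 2.GB
    }
    // Wildcard: every process inside the small_time_cpus:big_mem workflow
    withName: 'small_time_cpus:big_mem:.*' {
        time = 30.min
    }
}
```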
Selector priority
When mixing generic process configuration and selectors, the following priority rules are applied (from highest to lowest):
- withName selector definition.
- withLabel selector definition.
- Process-specific directive defined in the workflow script.
- Generic process configuration.
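As an illustration of these rules, assume ESTIMATION carries the small_time_cpus label and, hypothetically, also declares `cpus 3` in the workflow script. Under the priority order above, the withName value wins.

```groovy
// Hypothetical script side: ESTIMATION has label 'small_time_cpus' and declares cpus 3
process {
    cpus = 1                      // generic process scope: lowest priority
    withLabel: small_time_cpus {
        cpus = 2                  // beats the generic value and the script directive
    }
    withName: ESTIMATION {
        cpus = 4                  // highest priority: ESTIMATION runs with 4 CPUs
    }
}
```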
Dynamic expressions
A common scenario is that configuration settings may depend on the data being processed. Such settings can be dynamically expressed using a closure.
Task 10.3
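For example, we can specify the memory required as a multiple of the number of cpus. Similarly, we can publish results to a subfolder based on the sample name. In the example below, the time limit is a closure that grows with each attempt (2.h * task.attempt). The error strategy retries a task that exits with status 140 and ignores any other failure, and maxRetries 1 allows at most one retry.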
GROOVY
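// Workflow script: the process carries the small_time_cpus label
process ESTIMATION {
    tag { school_period }
    label 'small_time_cpus'
    errorStrategy { task.exitStatus == 140 ? 'retry' : 'ignore' }
    maxRetries 1
    // ... (rest of the process definition omitted)
}
// Configuration file (e.g. conf/slurm.config)
process {
    // ... (other settings omitted)
    withLabel: small_time_cpus {
        executor = 'slurm'
        time = { 2.h * task.attempt }
        clusterOptions = "--account=none --mem=20G --partition=nodes --nodes=1 --cpus-per-task=10"
    }
    // ... (other selectors omitted)
}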
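The memory-as-a-multiple-of-cpus idea could be sketched as follows. It requests the same 10 CPUs and 20 GB as the clusterOptions string above, but through directives. The 2 GB per CPU ratio is an assumption chosen to match that string, not a recommendation.

```groovy
process {
    withLabel: small_time_cpus {
        cpus   = 10
        memory = { 2.GB * task.cpus }    // evaluated per task: 10 CPUs -> 20 GB
        time   = { 2.h * task.attempt }  // 2 h on the first attempt, 4 h on the retry
    }
}
```

Directives like these let Nextflow build the scheduler request itself. Avoid setting the same resource both here and in clusterOptions, because the two could conflict.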
Configuring execution platforms
Nextflow supports a wide range of execution platforms, from running locally, to running on HPC clusters or cloud infrastructures. See https://www.nextflow.io/docs/latest/executor.html for the full list of supported executors.
Task 10.4
The process.executor directive allows you to override the executor to be used by a specific process. This can be useful, for example, when there are short-running tasks that can be run locally and are unsuitable for submission to HPC executors (check the guidelines on best-practice use of your execution system). Other process directives, such as process.clusterOptions, process.queue and process.machineType, can also be used to further configure processes depending on the executor used.
GROOVY
// conf/slurm.config
process {
    withLabel: big_mem {
        executor = 'slurm'
        clusterOptions = "--account=none --time=15:00 --mem=7G --partition=nodes --nodes=1 --ntasks-per-node=1 --cpus-per-task=1"
    }
    withLabel: small_time_cpus {
        executor = 'slurm'
        time = { 2.h * task.attempt }
        clusterOptions = "--account=none --mem=20G --partition=nodes --nodes=1 --cpus-per-task=10"
    }
    withLabel: big_time_cpus {
        executor = 'slurm'
        clusterOptions = "--account=none --time=10:00 --mem=1G --partition=nodes --nodes=1 --cpus-per-task=10"
    }
}
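The sketch below shows the executor override described above. Slurm is the default executor, while a short-running process runs locally. MERGE_RESULTS is a hypothetical process name, and the queue line shows process.queue naming the Slurm partition instead of passing --partition through clusterOptions.

```groovy
process {
    executor = 'slurm'               // default executor for every process
    withName: 'MERGE_RESULTS' {      // hypothetical short-running process
        executor = 'local'           // run on the launching machine, not through the queue
    }
    withLabel: big_mem {
        queue = 'nodes'              // Slurm partition to submit these tasks to
    }
}
```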
Configuring software requirements
Docker is a container technology. Container images are lightweight, standalone, executable packages of software that include everything needed to run an application: code, runtime, system tools, system libraries and settings. Containerized software is intended to run the same regardless of the underlying infrastructure, unlike other package management technologies, which are operating-system dependent (see the published article on Nextflow). For each container image used, Nextflow uses Docker to spawn an independent and isolated container instance for each process task.
To use Docker, we must provide a container image path using the process.container directive and also enable Docker in the docker scope with docker.enabled = true. A container image path takes the form (protocol://)registry/repository/image:version--build. By default, Docker containers run software as a privileged user, which is one reason Apptainer is preferred on computer clusters.
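Here is a minimal Docker setup that uses the image from this lesson's profiles. The runOptions line is an optional, commonly used way to run the container as the launching user rather than as root. Treat it as a suggestion and check that it suits your Docker installation.

```groovy
// nextflow.config
docker.enabled    = true
process.container = 'omiridoue/siena_r:0.8'   // image used for every task
docker.runOptions = '-u $(id -u):$(id -g)'    // run as the launching user instead of root
```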
Software configuration using Apptainer (formerly Singularity)
Singularity is another container technology, commonly used on HPC clusters. It differs from Docker in several ways. The primary differences are that processes are run as the user, and certain directories are automatically “mounted” (made available) in the container instance. Singularity also supports building Singularity images from Docker images, allowing Docker image paths to be used as values for process.container.
Apptainer is enabled in a similar manner to Docker. A container image path must be provided using process.container, and Apptainer is enabled using apptainer.enabled = true (on Singularity installations, singularity.enabled = true plays the same role).
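The Apptainer settings used in this lesson's slurm profile can also be written at the top level of a configuration file. In that case they apply to every run, not only when a profile is selected.

```groovy
// nextflow.config
apptainer.enabled    = true
apptainer.autoMounts = true                    // mount host paths into the container automatically
apptainer.cacheDir   = 'apptainer'             // where pulled and converted images are stored
process.container    = 'omiridoue/siena_r:0.8' // Docker Hub image, converted on first use
```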
See episode 12 for more information on Auxiliary tools.
Container protocols
The following protocols are supported:
- docker://: download the container image from Docker Hub and convert it to the Singularity format (default).
- library://: download the container image from the Singularity Library service.
- shub://: download the container image from Singularity Hub.
- https://: download the Singularity image from the given URL.
- file://: use a Singularity image on local computer storage.
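The sketch below follows the protocol list above and gives each label its own image source. The local .sif path is hypothetical.

```groovy
process {
    withLabel: big_mem {
        container = 'docker://omiridoue/siena_r:0.8'   // explicit form of the default protocol
    }
    withLabel: small_time_cpus {
        container = 'file:///path/to/siena_r.sif'      // hypothetical image on local storage
    }
}
```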
Configuration profiles
One of the most powerful features of Nextflow configuration is the ability to predefine multiple configurations, or profiles, for different execution platforms. This allows a group of predefined settings to be called with a short invocation, -profile <profile name>.
Task 10.5
Configuration profiles are defined in the profiles scope, which groups the attributes that belong to the same profile using a common prefix.
GROOVY
// nextflow.config
profiles {
    local {
        includeConfig 'conf/local.config'
        docker.enabled = true
        process.container = 'omiridoue/siena_r:0.8'
    }
    slurm {
        includeConfig 'conf/slurm.config'
        apptainer.enabled = true
        apptainer.cacheDir = "apptainer"
        apptainer.autoMounts = true
        process.executor = 'slurm'
        process.container = 'omiridoue/siena_r:0.8'
    }
}
This configuration defines two different profiles, local and slurm, that set different process configuration strategies depending on the target execution platform. By convention, the standard profile (if one is defined) is implicitly used when no other profile is specified by the user. To enable a specific profile, use the -profile option followed by the profile name, for example -profile slurm.
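The profiles above define no standard profile, so a run without -profile uses only the settings outside the profiles scope. One possible addition, assuming plain local execution should be the default, is sketched below.

```groovy
// nextflow.config
profiles {
    standard {                                    // picked when no -profile is given
        includeConfig 'conf/local.config'
    }
    local {
        includeConfig 'conf/local.config'
        docker.enabled    = true
        process.container = 'omiridoue/siena_r:0.8'
    }
}
```

You can also combine several profiles in a comma-separated list, for example -profile slurm,test, where test would be another, hypothetical profile.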
Key Points
- Nextflow configuration can be managed using a Nextflow configuration file.
- Nextflow configuration files are plain text files containing a set of properties.
- You can define process-specific settings, such as cpus and memory, within the process scope.
- You can assign different resources to different processes using the process selectors withName or withLabel.
- You can define a profile for different configurations using the profiles scope. These profiles can be selected when launching a pipeline execution by using the -profile command-line option.
- Nextflow configuration settings are evaluated in the order they are read in.