Parameters

Last updated on 2025-06-28

Estimated time: 25 minutes

Overview

Questions

  • How can I change the data a workflow uses?
  • How can I parameterise a workflow?
  • How can I add my parameters to a file?

Objectives

  • Use pipeline parameters to change the input to a workflow.
  • Add pipeline parameters to a Nextflow script.
  • Understand how to create and use a parameter file.

In the first episode we ran the Nextflow script 02_hello_nextflow.nf from the command line, and it decompressed the archive each_period.tar.gz, which contains synthetic data on four schools at two time points. To change the input to the script, we can use pipeline parameters.

Pipeline parameters


The Nextflow script 02_hello_nextflow.nf defines a pipeline parameter, params.input. Pipeline parameters let you change the input to the workflow at runtime, via the command line or a configuration file, so that input values are not hard-coded into the script.

Pipeline parameters are declared by prefixing a variable name with params and a dot, e.g., params.input.

Their value can be set on the command line by prefixing the parameter name with a double dash, e.g., --input.

In 02_hello_nextflow.nf, the pipeline parameter params.input is given the default file path "data/each_period.tar.gz".
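As a minimal sketch of the mechanism, assuming a hypothetical script named params_sketch.nf (not the actual contents of 02_hello_nextflow.nf), a parameter gets its default by being assigned at the top of the script, and the workflow then reads it through params.input:

GROOVY

```groovy
// params_sketch.nf: hypothetical example, not 02_hello_nextflow.nf
// Default value; an --input option on the command line replaces it at runtime.
params.input = "data/each_period.tar.gz"

workflow {
    // fromPath accepts a single path or a glob pattern such as 'data/*each_period.tar.gz'
    Channel.fromPath(params.input)
        .view()   // print each path emitted by the channel
}
```

Running nextflow run params_sketch.nf prints the default path, whereas nextflow run params_sketch.nf --input 'data/multi_period.tar.gz' prints the path given on the command line instead; the script itself is never edited.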

Task 3.1

Input data can be passed with the --parameter_name convention; here the parameter is called input, so we use --input. Pipeline parameters are always given on the command line with two dashes, whereas Nextflow's own options use a single dash. One such option is -resume, which is important during code development; we will come back to it in a later section.

To make 02_hello_nextflow.nf process a different file, e.g. data/multi_period.tar.gz, we would run:

BASH

nextflow run 02_hello_nextflow.nf --input 'data/multi_period.tar.gz'

OUTPUT

 N E X T F L O W   ~  version 24.10.4

Launching `02_hello_nextflow.nf` [loving_brenner] DSL2 - revision: 8a3d1bb9c7

executor >  local (1)
[49/214249] process > GENERATE_READS (1) [100%] 1 of 1 ✔
[/workspaces/training/sgsss-workflow/scripts/work/49/21424945038a3a509a67cf9d092711/school123.RDS,
/workspaces/training/sgsss-workflow/scripts/work/49/21424945038a3a509a67cf9d092711/school124.RDS,
/workspaces/training/sgsss-workflow/scripts/work/49/21424945038a3a509a67cf9d092711/school125.RDS,
/workspaces/training/sgsss-workflow/scripts/work/49/21424945038a3a509a67cf9d092711/school126.RDS]

We can also use wildcards to specify multiple input files (this is covered in the channels episode). In the example below, * matches any sequence of characters between data/ and multi_period.tar.gz. Note: if you use wildcard characters on the command line, you must enclose the value in quotes; otherwise the shell expands the pattern into separate file names before Nextflow sees it.
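The quoting rule can be checked without running Nextflow at all. The sketch below assumes a Bash shell; it creates two hypothetical archives in a temporary directory and uses a small helper, show_args (not part of Nextflow), to print each argument a command would receive. Unquoted, the pattern is expanded by the shell into separate file names; quoted, it arrives as a single literal pattern, which leaves the matching to Nextflow.

BASH

```bash
#!/usr/bin/env bash
# Sketch: shows how Bash treats a quoted vs an unquoted glob pattern.
# The archive names are hypothetical placeholders created in a temporary directory.
set -euo pipefail

workdir=$(mktemp -d)
mkdir -p "$workdir/data"
touch "$workdir/data/a_multi_period.tar.gz" "$workdir/data/b_multi_period.tar.gz"
cd "$workdir"

# Print every argument received, one per line.
show_args() { printf '%s\n' "$@"; }

echo "Unquoted (expanded by the shell):"
show_args --input data/*multi_period.tar.gz

echo "Quoted (passed through unchanged):"
show_args --input 'data/*multi_period.tar.gz'
```

The unquoted call prints --input followed by both file names as separate arguments, so only the first would be bound to --input; the quoted call prints --input followed by the literal data/*multi_period.tar.gz.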

Task 3.2

BASH

nextflow run 02_hello_nextflow.nf --input 'data/*multi_period.tar.gz'

This runs the process GENERATE_DAT twice, once for each file it matches.

OUTPUT

 N E X T F L O W   ~  version 24.10.4

Launching `02_hello_nextflow.nf` [grave_hopper] DSL2 - revision: 8a3d1bb9c7

executor >  local (2)
[5f/7df89f] process > GENERATE_READS (2) [100%] 2 of 2 ✔
[/workspaces/training/sgsss-workflow/scripts/work/df/253fc08b9b2941144e0e67c8e3c213/school123.dat,
/workspaces/training/sgsss-workflow/scripts/work/df/253fc08b9b2941144e0e67c8e3c213/school124.dat,
/workspaces/training/sgsss-workflow/scripts/work/df/253fc08b9b2941144e0e67c8e3c213/school125.dat,
/workspaces/training/sgsss-workflow/scripts/work/df/253fc08b9b2941144e0e67c8e3c213/school126.dat]

[/workspaces/training/sgsss-workflow/scripts/work/5f/7df89f35cb0de22fe9eb6c91e833ed/school123.RDS,
/workspaces/training/sgsss-workflow/scripts/work/5f/7df89f35cb0de22fe9eb6c91e833ed/school124.RDS,
/workspaces/training/sgsss-workflow/scripts/work/5f/7df89f35cb0de22fe9eb6c91e833ed/school125.RDS,
/workspaces/training/sgsss-workflow/scripts/work/5f/7df89f35cb0de22fe9eb6c91e833ed/school126.RDS]

Task 3.3

Re-run the Nextflow script 02_hello_nextflow.nf, changing the pipeline input to all files in the data directory whose names end with each_period.tar.gz:

BASH

nextflow run 02_hello_nextflow.nf --input 'data/*each_period.tar.gz'

The string specified on the command line will override the default value of the parameter in the script. The output will look like this:

OUTPUT


 N E X T F L O W   ~  version 24.10.4

Launching `02_hello_nextflow.nf` [lethal_cajal] DSL2 - revision: 8a3d1bb9c7

executor >  local (2)
[05/8e0aa0] process > GENERATE_READS (1) [100%] 2 of 2 ✔
[/workspaces/training/sgsss-workflow/scripts/work/05/8e0aa09cc3795d1a3fc2ed1384adf7/school123_period1.dat,
/workspaces/training/sgsss-workflow/scripts/work/05/8e0aa09cc3795d1a3fc2ed1384adf7/school123_period2.dat,
/workspaces/training/sgsss-workflow/scripts/work/05/8e0aa09cc3795d1a3fc2ed1384adf7/school124_period1.dat,
/workspaces/training/sgsss-workflow/scripts/work/05/8e0aa09cc3795d1a3fc2ed1384adf7/school124_period2.dat,
/workspaces/training/sgsss-workflow/scripts/work/05/8e0aa09cc3795d1a3fc2ed1384adf7/school125_period1.dat,
/workspaces/training/sgsss-workflow/scripts/work/05/8e0aa09cc3795d1a3fc2ed1384adf7/school125_period2.dat,
/workspaces/training/sgsss-workflow/scripts/work/05/8e0aa09cc3795d1a3fc2ed1384adf7/school126_period1.dat,
/workspaces/training/sgsss-workflow/scripts/work/05/8e0aa09cc3795d1a3fc2ed1384adf7/school126_period2.dat]

[/workspaces/training/sgsss-workflow/scripts/work/07/303a7d7f5a8a582db4d9df86d68a08/school123_period1.RDS,
/workspaces/training/sgsss-workflow/scripts/work/07/303a7d7f5a8a582db4d9df86d68a08/school123_period2.RDS,
/workspaces/training/sgsss-workflow/scripts/work/07/303a7d7f5a8a582db4d9df86d68a08/school124_period1.RDS,
/workspaces/training/sgsss-workflow/scripts/work/07/303a7d7f5a8a582db4d9df86d68a08/school124_period2.RDS,
/workspaces/training/sgsss-workflow/scripts/work/07/303a7d7f5a8a582db4d9df86d68a08/school125_period1.RDS,
/workspaces/training/sgsss-workflow/scripts/work/07/303a7d7f5a8a582db4d9df86d68a08/school125_period2.RDS,
/workspaces/training/sgsss-workflow/scripts/work/07/303a7d7f5a8a582db4d9df86d68a08/school126_period1.RDS,
/workspaces/training/sgsss-workflow/scripts/work/07/303a7d7f5a8a582db4d9df86d68a08/school126_period2.RDS]

Parameter File


If a script takes many parameters, it is best to collect them in a parameter file. By convention, this file is named params.config and placed in the top level of the workflow folder.

Task 3.4

We have created a parameter file, params.config, for the workflow. Based on the intended parameter values shown below, which implicit Nextflow variables could we use as part of the definition? Notice that we rename params.input to params.school_data to make the script more specific and clear.

OUTPUT


batches                     : 1
model specification         : /workspaces/training/sgsss-workflow/scripts/params/meta.csv
school data                 : /workspaces/training/sgsss-workflow/scripts/data/each_period.tar.gz
school info                 : /workspaces/training/sgsss-workflow/scripts/params/school_info.json
composition data            : /workspaces/training/sgsss-workflow/scripts/data/composition_each_period.tar.gz
effects                     : /workspaces/training/sgsss-workflow/scripts/params/effects.csv
subgroup                    : /workspaces/training/sgsss-workflow/scripts/params/subgroup.csv

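Here we use the implicit Nextflow variable baseDir, which is resolved at runtime to the directory containing the main workflow script (in recent Nextflow versions baseDir is deprecated in favour of projectDir, which has the same meaning). The paths to files in the data and params folders are then built relative to that directory. Open the params.config file and inspect the following: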

GROOVY

// params.config
params {
    outdir = "${baseDir}/results"
    batches = 1
    meta = "${baseDir}/params/meta.csv"
    effects = "${baseDir}/params/effects.csv"
    subgroup = "${baseDir}/params/subgroup.csv"
    school_data = "${baseDir}/data/each_period.tar.gz"
    school_info = "${baseDir}/params/school_info.json"
    composition_data = "${baseDir}/data/composition_each_period.tar.gz"
}

To point Nextflow to this params.config file, we add the line includeConfig "params.config" to the workflow configuration file, nextflow.config.
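In nextflow.config this amounts to a single line. A minimal sketch, assuming params.config sits next to nextflow.config at the top level of the workflow folder:

GROOVY

```groovy
// nextflow.config (sketch)
// Load the parameter definitions kept in a separate file.
// A relative path here is resolved against the location of this config file.
includeConfig 'params.config'
```

Values loaded this way act as defaults: a parameter given on the command line, e.g. --batches 2, still takes precedence over the value in params.config for that run.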

Open 03_params.nf and check the syntax. Notice that we have separated the parameterisation of the workflow from the workflow definition: we no longer need to define parameters in the main workflow file, as long as Nextflow is pointed to the params.config file.
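A hypothetical sketch of a workflow written in this style (not the actual contents of 03_params.nf) reads its values straight from params without assigning any of them. Here it prints part of a summary like the one in Task 3.4 with log.info and builds a channel from params.school_data:

GROOVY

```groovy
// Hypothetical sketch in the style of 03_params.nf; the real file may differ.
// No params.* assignments here: the values come from params.config,
// which nextflow.config loads with includeConfig.
log.info """\
    batches                     : ${params.batches}
    model specification         : ${params.meta}
    school data                 : ${params.school_data}
    """.stripIndent()

workflow {
    Channel.fromPath(params.school_data)
        .view()   // print the archive path taken from params.config
}
```

Keeping the defaults out of the script means the same workflow file can be reused with a different params.config without editing the workflow itself.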

BASH

nextflow run 03_params.nf

Key Points

  • Pipeline parameters are specified by prepending the prefix params to a variable name, separated by a dot character.
  • To specify a pipeline parameter on the command line for a Nextflow run use --variable_name syntax.