Summary and Setup
Computational social science (CSS) brings computational approaches to social science questions. Nextflow is workflow management software that enables scalable and reproducible scientific workflows. This half-day workshop motivates the use of Nextflow for operationalising reproducible social science research.
This is a student-led introductory lesson on computational workflows. No previous knowledge of Nextflow or other workflow software is required.
Checklist
Optional
It is helpful to be familiar with using a programming language, to the level of Plotting and Programming in Python or R for Reproducible Scientific Analysis, although this lesson does not specifically rely on Python or R. A full set of recommended courses and resources you can explore is covered in Software Carpentry Lessons.
The workshop offers an overview of Nextflow. Nextflow integrates with various software package and environment management systems, such as Docker, Singularity, and Conda. It allows existing pipelines written in common scripting languages, such as R and Python, to be seamlessly coupled together. It simplifies the implementation and running of workflows on cloud or high-performance computing (HPC) infrastructure.
Explore the Material

Schedule
The schedule for the workshop is shown below.
Section | Time | Topics Covered |
---|---|---|
1. Introduction | 00h 25m | What are the FAIR research principles? How do FAIR principles apply to software? How does folder organisation help me? |
2. Hello Nextflow | 00h 50m | What is Nextflow? Why should I use a workflow management system? What are the features of Nextflow? What are the main components of a Nextflow script? How do I run a Nextflow script? |
Break | 10m | |
3. Parameters | 01h 00m | How can I change the data a workflow uses? How can I parameterise a workflow? How can I add my parameters to a file? |
4. Channels | 01h 40m | How do I move data around in Nextflow? How do I handle different types of input, e.g. files and parameters? How can I use pattern matching to select input files? |
Break | 10m | |
5. Modules | 02h 00m | How do I run tasks/modules in Nextflow? How do I get data, files and values, into a module? |
Finish Introductory Material | 02h 20m | |
6. Modules Part 2 | optional | How do I get data, files, and values, out of processes? How do I handle grouped input and output? How can I control when a process is executed? How do I control resources, such as number of CPUs and memory, available to processes? How do I save output/results from a process? |
7. Workflow | optional | How do I connect channels and processes to create a workflow? How do I invoke a process inside a workflow? |
8. Operators | optional | How do I perform operations, such as filtering, on channels? What are the different kinds of operations I can perform on channels? How do I combine operations? How can I use a CSV file to process data into a Channel? |
9. Reporting | optional | How do I get information about my pipeline run? How can I see what commands I ran? How can I create a report from my run? |
10. Nextflow configuration | optional | How do I configure a Nextflow workflow? How do I assign different resources to different processes? How do I separate and provide configuration for different computational systems? |
11. Auxiliary Tools | optional | When should I use a pre-built container? How can I customise a container? What is a remote codespace? |
12. Resuming a Workflow | optional | How can I restart a Nextflow workflow after an error? How can I add new data to a workflow without starting from the beginning? Where can I find intermediate data and results? |
13. Portability of Workflow | optional | How can I move my analysis to a computer cluster? |
Set-up Material
To follow along with the practical component, we recommend using GitHub Codespaces. This requires a stable internet connection. If you are not signed in to GitHub, you may be prompted to sign in when you open the material in GitHub Codespaces.
Online Learning Environment
GitHub Codespaces is a cloud development environment for teams to develop software efficiently and securely. We use it as a training environment because it allows us to work in a consistent and thoroughly tested environment. It requires an internet connection and can be accessed through your web browser.
You can create a free GitHub account from the GitHub home page. You can upgrade your GitHub account to an Education account from the GitHub Education home page using your affiliate/student email.
Running GitHub Codespaces
You can launch the environment by clicking the GitHub Codespaces button, which is displayed on many pages of the training portal.
Once you are logged in to GitHub, you can open this link in your browser to open the training environment: https://codespaces.new/nextflow-io/training?quickstart=1&ref=master.
You should be presented with a page where you can create a new GitHub Codespace. You can click “Change options” to configure the machine used.
Using a machine with more cores allows you to take greater advantage of Nextflow’s ability to parallelize workflow execution.
For the hands-on component, we recommend using a 4-core machine.
The free GitHub plan includes 120 core-hours of Codespaces compute per month, which amounts to 30 hours of a 4-core machine. Opening a new GitHub Codespaces environment for the first time can take several minutes.
Explore GitHub Codespaces
After GitHub Codespaces has loaded, you should see the welcome page:

This is the interface of the VSCode IDE, a popular code development application that we recommend using for Nextflow development.
- The sidebar allows you to customize your GitHub Codespace environment and perform basic tasks (copy, paste, open files, search, git, etc.). You can click the explorer button to see which files are in this repository.
- The terminal allows you to run all the programs in the repository. For example, both `nextflow` and `docker` are installed and can be executed.
- The file explorer allows you to view and edit files. Clicking on a file in the explorer will open it within the main window.
- The main editor shows you a preview of the `README.md` file. When you open code or data files, they will open there.
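As a quick sanity check of the terminal described above, the following minimal sketch looks up whether the two tools named in this lesson are on the `PATH`. The function name `find_tools` and the constant `REQUIRED_TOOLS` are hypothetical, chosen only for illustration; the script does not run either tool, it only locates them.

```python
import shutil

# Tool names taken from the lesson text; extend the tuple as needed.
REQUIRED_TOOLS = ("nextflow", "docker")


def find_tools(tools=REQUIRED_TOOLS):
    """Map each tool name to its executable path, or None if it is not on PATH."""
    return {tool: shutil.which(tool) for tool in tools}


if __name__ == "__main__":
    for tool, path in find_tools().items():
        # Print the resolved path when found, otherwise flag the tool as missing.
        status = path if path else "NOT FOUND"
        print(f"{tool}: {status}")
```

Saving this as a file in the Codespace and running it with `python` from the terminal should list a path for each tool if the environment loaded correctly.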
Reopening a GitHub Codespaces session
Once you have created an environment, you can easily resume or restart it and continue from where you left off. Your environment will time out after 30 minutes of inactivity and will save your changes for up to 2 weeks.
You can reopen an environment from https://github.com/codespaces/.
Previous environments will be listed there. You can manage these sessions by freezing or removing them. For now, click a session to resume it, but be mindful of your usage. If you have saved the URL for your previous GitHub Codespaces environment, you can simply open it in your browser.
Alternatively, click the same button that you used to create it in the first place:
You should see the previous session; the default option is to resume it.

Saving files from GitHub Codespaces to your local machine
To save any file from the explorer panel, right-click the file and select Download.
GitHub Codespaces quotas
GitHub Codespaces gives you up to 15 GB-months of storage and 120 core-hours of compute per month. This is equivalent to around 60 hours of runtime per month on the default environment, the standard workspace (up to 2 cores, 8 GB RAM, and 32 GB storage).
GitHub Codespaces environments are configurable. You can create them with more resources, but this will consume your free usage faster and you will have fewer hours of access to this space.
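The trade-off between machine size and available hours can be sketched as simple arithmetic. The snippet below is a minimal illustration that assumes the 120 core-hours figure quoted in this lesson and that usage is counted as cores multiplied by hours; it ignores the storage quota. The name `free_hours` is hypothetical.

```python
# Assumption: the monthly free allowance quoted in this lesson.
FREE_CORE_HOURS_PER_MONTH = 120


def free_hours(cores, core_hours=FREE_CORE_HOURS_PER_MONTH):
    """Return the hours of runtime available on a machine with `cores` cores."""
    if cores <= 0:
        raise ValueError("cores must be a positive integer")
    # Core-hours are consumed as (number of cores) x (hours running).
    return core_hours / cores


if __name__ == "__main__":
    for cores in (2, 4, 8):
        print(f"{cores}-core machine: {free_hours(cores):.0f} hours per month")
```

Under these assumptions, the 2-core default gives 60 hours and the recommended 4-core machine gives 30 hours, matching the figures stated in this lesson.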
More information can be found in the GitHub docs: About billing for GitHub Codespaces