Auxiliary Tools

Last updated on 2025-06-28


Overview

Questions

  • When should I use a pre-built container?
  • How can I customise a container?
  • What is a remote codespace?

Objectives

  • Understand how to make code reproducible.
  • Understand the benefits of containers.

Docker Hub

By sharing a container, you create a portable, replicable research environment that other researchers can easily access and use. This not only facilitates collaboration but also preserves your work in an environment where it can run without compatibility issues; in other words, you can do your best to ‘future-proof’ your research.

To share your code and software, you’ll use Docker Hub. Docker Hub is a cloud-based registry service that lets you share and distribute container images.
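
As a sketch of what this looks like from the consumer’s side, the commands below pull a published image and list its layers (a quick way to see what was installed at each step). The image name omiridoue/siena_r:0.8 is the one we build and push later in this episode.

BASH

# Download a published image from Docker Hub
docker pull omiridoue/siena_r:0.8

# List the image's layers to see what each build step installed
docker history omiridoue/siena_r:0.8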

There is a sea of containers out there, and it is not necessarily safe to use any Docker container you find, as there is always a risk of malware. The following guidance reflects best practice when judging a container image:

  • The container image is updated regularly, with the latest version available alongside previous versions.
  • There is a Dockerfile or other listing of what has been installed to the container image.
  • The container image page has documentation on how to use the container image.

Discussion

If a container image is never updated, and does not have a lot of metadata, it is probably worth skipping over. Even if such a container image is secure, it is not reproducible and not a dependable way to run research computations.

Docker Recipe File

Much like a cookbook, you can pull out recipes and alter them to your own preference. This is how you normally get started building your own image: you begin from a base image.

Task 11.1

In this case we use a base image for R on a Linux machine, from Bioconductor. On top of it we layer our requirements, i.e., code libraries.

To do this we evaluate the command install.packages() using R. This is possible because we are working within the Docker container, which already has R installed. We install packages directly from CRAN; in this case the recipe file could be improved by requesting exact versions for each package.
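
As an illustrative sketch of version pinning, the line below uses the remotes package to request an exact CRAN version; the package name and version number here are hypothetical examples, not part of our recipe.

BASH

# Illustrative only: pin an exact CRAN version using the remotes package
R -e "install.packages('remotes', repos = 'https://cloud.r-project.org'); remotes::install_version('dplyr', version = '1.1.4', repos = 'https://cloud.r-project.org')"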

We also demonstrate installing RSiena version 1.4.19 from source code. Note that as we build the container, we are working in a self-contained environment, so we need to manage file paths just as we would for a folder that occupies its own space in our directory tree. To do this we copy the source code into the top level of our container and then call install.packages(..., repos = NULL, type = 'source').

The next step is to push the local container image to Docker Hub under the name siena_r, together with a version tag. As this is an iterative process, the version tag we are working with here follows the colon: siena_r:0.8. In some cases you may require a specific version of a container; the most recent version can also be requested with siena_r:latest.


DOCKERFILE

FROM bioconductor/bioconductor_docker:devel-R-4.4.1

# Install the required CRAN packages
# (note: 'parallel' and 'methods' ship with base R, so install.packages() will only warn that they are unavailable on CRAN)
RUN R -e "install.packages(c('Matrix', 'lattice', 'parallel', 'MASS', 'methods', 'xtable', 'network', 'vioplot', 'sna', 'codetools', 'dplyr', 'metafor', 'argparse', 'stringr', 'mixmeta'), repos = c(CRAN = 'https://cloud.r-project.org'))"

# Copy the RSiena source archive into the top level of the image
COPY rsiena_1.4.19.tar.gz .

# Install RSiena from the local source archive rather than from CRAN
RUN R -e "install.packages('rsiena_1.4.19.tar.gz', repos = NULL, type = 'source')"

Git

Task 11.2

Figure: arrows indicate users making changes and synchronizing their work with the repository.

BASH


git clone --branch ready-set-workflow --single-branch https://github.com/omiridoue/sgsss-workflow.git

Containers in the workflow

Within our workflow we can specify the container we want to use; in fact, we can specify a different container for each process, so the possibilities are endless! Say, for example, you would like to write interoperable code and use Python for one part of your analysis and R for another: this is possible by defining a different container for each process. Another option is to build one container with all the software (i.e., both R and Python) installed.

Say we work with one container but would like to make sure the pipeline is portable. In this case we work with profiles, which add another layer of customisation.

Alternative Platforms for compute clusters

Many container platforms are available, but Apptainer is designed for ease of use on shared systems and in high-performance computing (HPC) environments. Nextflow can pull a Docker image and convert it into an immutable Apptainer image.

Building an Apptainer Image

BASH

# Pull the Docker image from Docker Hub and convert it to an Apptainer image
# (this creates the SIF file siena_r_0.8.sif in the current directory)
singularity pull docker://omiridoue/siena_r:0.8

Workflow Definition

Within our workflow, we can declare a process container and make sure we enable Apptainer. Again, we don’t want to hard-code this decision, as we’d like to keep our options as flexible as possible. This is why we build a profile for each of our compute environments: on our local machine or GitHub codespace we have access to Docker, but for our slurm profile, relevant to a compute cluster running the Slurm workload manager, we opt for Apptainer (formerly Singularity), as Docker is not available there.

Task 11.3

We can declare a different config file for each compute environment, or profile. These profiles are stored under the conf sub-folder.

GROOVY


// nextflow.config

profiles {

  // Local machine or GitHub codespace, where Docker is available
  local {
    includeConfig 'conf/local.config'
    process.container = 'omiridoue/siena_r:0.8'
  }

  // Compute cluster with the Slurm workload manager, where Docker is not
  // available; conf/slurm.config is expected to enable Apptainer
  slurm {
    includeConfig 'conf/slurm.config'
    process.executor = 'slurm'
    process.container = 'omiridoue/siena_r:0.8'
  }
}
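
With these profiles in place, the same pipeline can be launched in either environment by choosing a profile at run time. A minimal sketch, assuming the workflow’s entry script is named main.nf (a hypothetical name here):

BASH

# Run on a local machine or codespace, using Docker
nextflow run main.nf -profile local

# Run on the cluster, using the Slurm executor and Apptainer
nextflow run main.nf -profile slurm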

Key Points

  • Docker Hub is an online registry of container images.
  • Find a container recipe file that works for your project and customise it.
  • Nextflow can pull a Docker container from Docker Hub and convert it to an Apptainer image.
  • Docker is not permitted on most HPC environments; Apptainer SIF files are used instead.
  • Containers are central to reproducible workflows and to the portability of workflows across environments.