
Containers

The cluster supports several ways to run containerized workloads, including Apptainer, Pyxis/Enroot, and Docker.

The recommended way to run containers is Apptainer. Because it was designed for HPC/Slurm environments, it is significantly easier to use than the alternatives: it integrates cleanly with srun across multiple nodes, caches images in the shared /home directory, and runs containers as the submitting user (avoiding the issues that Docker's privileged daemon introduces).

For software that does not require a container, see Modules for Lmod and SHPC. Python virtual environments (.venv) installed under /home are also a straightforward option, since /home is mounted consistently across login and compute pods.


Apptainer

Apptainer (formerly Singularity) runs containers as the submitting user. Images can be pulled directly from a Docker registry at runtime or pre-built as .sif files for faster starts and multi-node use.

Registry login

Public images (such as rocm/ on Docker Hub) can be pulled without authentication. Private registries require logging in first:

apptainer registry login --username <username> docker://docker.io

You will be prompted for your password or access token. To supply credentials non-interactively:

echo '<token>' | apptainer registry login --username <username> --password-stdin docker://docker.io

Many cloud OCI registries use token-based authentication. In that case, pass the token as the password; a username is still required. Consult your provider's documentation for their specific login requirements. See the Apptainer registry login documentation for all options.

Credentials are stored under your home directory and apply to subsequent apptainer pull, apptainer exec, and apptainer shell calls that reference the registry. To remove stored credentials:

apptainer registry logout docker://docker.io

Pulling an image

Pull an image from a Docker registry and save it as a local .sif file. Running from a login pod is fine for this step since it does not require a GPU allocation:

apptainer pull rocm-pytorch.sif docker://rocm/pytorch:rocm7.2.2_ubuntu22.04_py3.10_pytorch_release_2.10.0

The resulting .sif file can be used in any subsequent apptainer exec or apptainer shell call and starts faster than pulling the docker:// URI at runtime. Store it on /home so it is accessible from compute pods.

Single-node interactive
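A minimal sketch of an interactive session using the rocm-pytorch.sif pulled above, run from a directory on /home so compute pods can see the file. The GPU request (--gpus=1) is an assumption; use whatever syntax your partition expects. Apptainer also provides a --rocm flag for AMD GPU support if the devices are not visible inside the container.

```shell
# Allocate one GPU on a compute node and open a shell inside the SIF.
srun --gpus=1 --pty apptainer shell rocm-pytorch.sif

# Or run a single command in the same image without an interactive shell.
srun --gpus=1 apptainer exec rocm-pytorch.sif \
  python3 -c 'import torch; print(torch.cuda.is_available())'
```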

You can also pass a docker:// URI directly without pulling first:
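A sketch of the same interactive session started straight from the registry, reusing the image tag from the pull example. Apptainer converts the image on first use and caches it, so later runs are quicker; the --gpus=1 request is again an assumption.

```shell
# The docker:// image is converted to SIF and cached (by default under ~/.apptainer/cache).
srun --gpus=1 --pty apptainer shell \
  docker://rocm/pytorch:rocm7.2.2_ubuntu22.04_py3.10_pytorch_release_2.10.0
```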

Note: By default, Apptainer bind-mounts your home directory (/home/$USER) and /tmp into the container. Use --no-home to skip the home directory, or --contain to skip both.

Batch job (single node)

For single-node jobs, Apptainer can pull the image at runtime:
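A minimal batch script sketch. The job name, time limit, output pattern, and GPU request are illustrative assumptions, not cluster defaults; the Python one-liner only checks that PyTorch can see the GPU.

```shell
#!/bin/bash
#SBATCH --job-name=apptainer-single
#SBATCH --nodes=1
#SBATCH --gpus=1              # assumption: adjust to your partition's GPU syntax
#SBATCH --time=00:30:00
#SBATCH --output=%x-%j.out    # log file named <job-name>-<job-id>.out

# The image is pulled and converted at runtime, then reused from the cache on later runs.
srun apptainer exec \
  docker://rocm/pytorch:rocm7.2.2_ubuntu22.04_py3.10_pytorch_release_2.10.0 \
  python3 -c 'import torch; print(torch.cuda.is_available(), torch.cuda.device_count())'
```

Submit it with sbatch, for example `sbatch apptainer-single.sbatch` (the filename is hypothetical).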

Multi-node jobs: building a BNXT-enabled SIF

For multi-node jobs, the container image must include the correct network software for the cluster's NICs. This software can either be built into the image or passed through using Apptainer's CDI interface. See Installing Network Software in Container images for details.
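Once such an image exists, launching it across nodes is an ordinary srun call. This sketch assumes a hypothetical bnxt-pytorch.sif stored in your home directory and a placeholder train.py that sets up distributed training from the Slurm environment; the node, task, and GPU counts are illustrative.

```shell
#!/bin/bash
#SBATCH --job-name=apptainer-multinode
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=8   # assumption: one task per GPU, 8 GPUs per node
#SBATCH --gpus-per-node=8
#SBATCH --time=01:00:00

# bnxt-pytorch.sif is a hypothetical image built with the cluster's NIC software.
# srun starts one container per task on every allocated node.
srun apptainer exec "$HOME/bnxt-pytorch.sif" python3 train.py
```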


Pyxis

Pyxis is a SPANK plugin that integrates OCI container execution directly into srun via flags. It uses Enroot under the hood to manage squashfs-format images (.sqsh).

Pulling and caching a container

Images can be pulled from public registries using the --container-image flag.
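A minimal sketch that pulls the ROCm PyTorch image from Docker Hub when the step launches and runs one command in it. Images on other registries use Enroot's REGISTRY#IMAGE form (for example nvcr.io#nvidia/pytorch:TAG); the --gpus=1 request is an assumption.

```shell
# The image is imported with Enroot when the step starts, then the command runs inside it.
srun --gpus=1 \
  --container-image=rocm/pytorch:rocm7.2.2_ubuntu22.04_py3.10_pytorch_release_2.10.0 \
  python3 -c 'import torch; print(torch.__version__)'
```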

The --container-name flag caches the image as a named Enroot container. Subsequent srun steps that use the same name skip the pull and start much faster. When the container is writable (--container-writable), any modifications made during one step are preserved in later steps that reference the same named container. To persist the container to disk, use --container-save=PATH, which saves the container state as a squashfs (.sqsh) file that can be passed to --container-image in future jobs.
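A sketch of saving an image once and reusing the file later. The output path under $HOME is an assumption (any location visible from compute pods works), and `true` is a no-op command, so the first step only pulls and saves.

```shell
# Pull the image and save it as a squashfs file on /home.
srun --container-image=rocm/pytorch:rocm7.2.2_ubuntu22.04_py3.10_pytorch_release_2.10.0 \
  --container-save="$HOME/rocm-pytorch.sqsh" true

# Later jobs start from the saved file instead of pulling again.
srun --gpus=1 --container-image="$HOME/rocm-pytorch.sqsh" \
  python3 -c 'import torch; print(torch.__version__)'
```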

Running a job with a named container
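A sketch of a batch job where the first step modifies a named container and the second step reuses it. The container name pt and the package einops are illustrative; --container-remap-root is included on the assumption that pip needs root inside the container to write to its site-packages.

```shell
#!/bin/bash
#SBATCH --job-name=pyxis-named
#SBATCH --nodes=1
#SBATCH --gpus=1              # assumption: adjust to your partition's GPU syntax
#SBATCH --time=00:30:00

IMAGE=rocm/pytorch:rocm7.2.2_ubuntu22.04_py3.10_pytorch_release_2.10.0

# Step 1: import the image into the named container "pt" and install an extra package.
srun --container-image="$IMAGE" --container-name=pt \
  --container-writable --container-remap-root \
  pip install einops

# Step 2: reuse "pt"; the import is skipped and the package from step 1 is present.
srun --container-name=pt python3 -c 'import einops; print(einops.__version__)'
```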

Example Pyxis scripts are available at /opt/tw/examples/libexec/*pyxis*.sbatch.

Enroot as an Escape Hatch

Under the hood, Pyxis uses Enroot as its container engine. Some operations (such as pulling an image from a private registry) require using the enroot CLI directly.
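A sketch of importing a private Docker Hub image. Enroot reads registry credentials from a netrc-style .credentials file in its config directory; ~/.config/enroot is assumed here as the default location, and myorg/private-image:latest is a placeholder.

```shell
# Store Docker Hub credentials for Enroot (netrc format), readable only by you.
mkdir -p ~/.config/enroot
cat >> ~/.config/enroot/.credentials <<'EOF'
machine auth.docker.io login <username> password <token>
EOF
chmod 600 ~/.config/enroot/.credentials

# Import the private image into a squashfs file in the current directory.
enroot import -o private-image.sqsh docker://myorg/private-image:latest

# Use the file in a Pyxis job step (an absolute path avoids ambiguity).
srun --container-image="$PWD/private-image.sqsh" cat /etc/os-release
```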

This leaves a *.sqsh file in your current working directory, which can be passed to --container-image in future Slurm jobs.


Docker

Warning: Docker on worker nodes runs as root through a privileged daemon. Use Apptainer or Pyxis instead where possible.

Docker is available on worker nodes. You can use it to run containers, build images, or pull from a registry within a job allocation.

Running a container
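A sketch of running the ROCm PyTorch image with Docker inside an allocation. The --device and --group-add flags are the usual way to expose AMD GPUs to Docker; whether this cluster needs them is an assumption. Because the container is started by the Docker daemon rather than inside the Slurm step, it may not be confined to the GPUs Slurm allocated.

```shell
# Expose the ROCm device files to the container and run a quick GPU check.
srun --gpus=1 docker run --rm \
  --device=/dev/kfd --device=/dev/dri --group-add video \
  rocm/pytorch:rocm7.2.2_ubuntu22.04_py3.10_pytorch_release_2.10.0 \
  python3 -c 'import torch; print(torch.cuda.is_available())'
```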

Building an image

If you need to build a custom image during a job, allocate a node and run the build from there:
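A sketch of building on an allocated node. The time limit, image name my-image, and <registry>/<username> placeholders are illustrative; pushing is optional and only needed if you want to use the image later from Apptainer or Pyxis.

```shell
# Get an interactive shell on one compute node.
srun --nodes=1 --time=01:00:00 --pty bash

# On the node: build from the Dockerfile in the current directory.
docker build -t my-image:latest .

# Optional: push to a registry so other tools can pull the image.
docker login <registry>
docker tag my-image:latest <registry>/<username>/my-image:latest
docker push <registry>/<username>/my-image:latest
```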

Using a Docker image with Apptainer

Docker images can be consumed directly by Apptainer without running the Docker daemon at all, using the docker:// URI:
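A sketch assuming an image you built and pushed to a registry, with <registry>/<username>/my-image:latest as a placeholder. Private registries need an apptainer registry login first; the --gpus=1 request is an assumption.

```shell
# No Docker daemon involved: Apptainer pulls and converts the image as the submitting user.
srun --gpus=1 apptainer exec docker://<registry>/<username>/my-image:latest \
  python3 --version
```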

This is the preferred pattern for job submission since it runs as the submitting user and integrates with Slurm resource accounting.


Command Comparison

| Operation | Apptainer | Pyxis (srun flags) | Docker |
|---|---|---|---|
| Launch a batch job | `srun apptainer exec <img> <cmd>` | `srun --container-image=<img> <cmd>` | `srun docker run --rm <img> <cmd>` |
| Get an interactive shell | `srun --pty apptainer shell <img>` | `srun --pty --container-image=<img> bash` | `srun --pty docker run -it --rm --entrypoint /bin/bash <img>` |
| Download an image to disk | `apptainer pull <dst>.sif docker://<img>` | `srun --container-image=<img> --container-save=<name>.sqsh true` | `docker pull <img>` (closest equivalent) |
| Volume mount | `--bind <src>[:<dst>]` | `--container-mounts=<src>:<dst>` | `-v <src>:<dst>` |
