
Running Containerized Jobs with Apptainer

  • Run apptainer pull docker://<repo>:<tag> to pull and cache an image from Docker Hub

    • Images are cached under $HOME/.apptainer, so images pulled on a login pod are available on the worker nodes

  • Due to login pod security settings, apptainer build, exec, and shell can only be run on the worker pods
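The split described above (pull anywhere, but exec only on a worker pod) can be sketched with two small wrapper functions. This is a minimal illustration, not a site-provided tool: the names pull_image and exec_on_worker are hypothetical, and the sketch assumes srun with no extra options lands on a worker pod in your cluster.

```shell
#!/bin/bash
# Hypothetical helpers illustrating where each Apptainer step may run.

# Pull an image from Docker Hub into a .sif file. Pulling is allowed on a
# login pod, and cached data lands under $HOME/.apptainer.
#   $1: image reference, e.g. "<repo>:<tag>"
#   $2: output .sif path
pull_image() {
  apptainer pull "$2" "docker://$1"
}

# Run a command inside a .sif image on a worker pod by going through srun,
# since exec/shell/build are blocked on login pods.
#   $1: .sif path; remaining arguments: command to run in the container
exec_on_worker() {
  local sif="$1"
  shift
  srun apptainer exec "$sif" "$@"
}
```

For example, pull_image "<repo>:<tag>" image.sif could be run from a login pod, followed by exec_on_worker image.sif hostname to confirm that the image is visible from a worker pod.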

Running Your First Containerized Job

In this example, you’ll run a PyTorch matmul test using Apptainer. This will demonstrate how to pull images from Docker Hub, show how to mount your code into the container, and verify that your containerized environment can access full GPU performance.

  1. Create a Python script to measure PyTorch's matrix multiplication performance. Copy the following code block into a file named torch_matmul.py.

  2. Create a new job script named torch-matmul-apptainer.sbatch:

    torch-matmul-apptainer.sbatch
    #!/bin/bash
    #SBATCH --job-name=torch_matmul-apptainer
    #SBATCH --output=jid-%j.name-%x.log
    #SBATCH --gpus-per-node=8
    #SBATCH -N1
    
    # Script created in step 1.
    MATMUL_PY="$PWD/torch_matmul.py" 
    # pytorch-rocm image from Docker Hub, published by AMD
    CONTAINER_IMAGE='rocm/pytorch:rocm7.1.1_ubuntu22.04_py3.10_pytorch_release_2.9.1'
    CONTAINER_SAVE="./rocm+pytorch+rocm7.1.1_ubuntu22.04_py3.10_pytorch_release_2.9.1.sif"
    
    # pull the image from Docker Hub and save to disk
    apptainer pull "$CONTAINER_SAVE" "docker://$CONTAINER_IMAGE"
    
    # Run the benchmark; torch_matmul.py is visible via the default /home mount
    srun apptainer exec "$CONTAINER_SAVE" \
      /opt/venv/bin/python "$MATMUL_PY"
    
  3. Submit the job. Here's an example run:

    $ sbatch torch-matmul-apptainer.sbatch
    Submitted batch job 4241
    $ tail -f jid-4241.name-torch_matmul-apptainer.log
    INFO:    Converting OCI blobs to SIF format
    INFO:    Starting build...
    INFO:    Fetching OCI image...
    INFO:    Extracting OCI image...
    2026/05/05 21:06:48  warn rootless{usr/lib/x86_64-linux-gnu/gstreamer1.0/gstreamer-1.0/gst-ptp-helper} ignoring (usually) harmless EPERM on setxattr "security.capability"
    INFO:    Inserting Apptainer configuration...
    INFO:    Creating SIF file...
    Device: AMD Instinct MI325X
    n= 1024  145.11 TFLOPs
    n= 2048  448.87 TFLOPs
    n= 4096  630.48 TFLOPs
    n= 8192  753.88 TFLOPs
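The listing for torch_matmul.py from step 1 is not reproduced on this page. The following is a hypothetical stand-in, not the original script: a shell function that writes a small benchmark whose output resembles the log above. The helper name write_matmul_script, the bfloat16 data type, the iteration counts, and the 2·n³ FLOP estimate are all assumptions made for illustration.

```shell
#!/bin/bash
# Hypothetical helper: writes an illustrative torch_matmul.py.
# This is NOT the original script from step 1; dtype and iteration
# counts are assumptions.
#   $1: output path (defaults to ./torch_matmul.py)
write_matmul_script() {
  cat > "${1:-torch_matmul.py}" <<'EOF'
import time

import torch


def bench(n, iters=50, dtype=torch.bfloat16):
    """Return the achieved TFLOPs for an n x n matrix multiplication."""
    a = torch.randn(n, n, device="cuda", dtype=dtype)
    b = torch.randn(n, n, device="cuda", dtype=dtype)
    for _ in range(5):  # warm-up iterations, not timed
        a @ b
    torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(iters):
        a @ b
    torch.cuda.synchronize()  # wait for queued GPU work before stopping the clock
    elapsed = time.perf_counter() - start
    # One n x n matmul costs roughly 2 * n^3 floating point operations.
    return 2 * n**3 * iters / elapsed / 1e12


if __name__ == "__main__":
    # ROCm builds of PyTorch expose AMD GPUs through the torch.cuda API.
    print(f"Device: {torch.cuda.get_device_name(0)}")
    for n in (1024, 2048, 4096, 8192):
        print(f"n={n:5d}  {bench(n):.2f} TFLOPs")
EOF
}
```

Calling write_matmul_script with no arguments creates torch_matmul.py in the current directory, which is where the job script's MATMUL_PY variable expects it. Numbers from this sketch will not necessarily match the example run, since the original benchmark's settings are unknown.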

Some key features of the torch-matmul-apptainer.sbatch file:

  • On line 14, the rocm/pytorch image is pulled from Docker Hub and saved to disk as a .sif file.

    • docker:// is prepended to the image name to tell Apptainer to pull the image from Docker Hub. Other container registries can also be used; run apptainer pull --help for details.

    • Pulling the image once before launching parallel tasks is highly recommended. Otherwise, every task tries to pull the same remote image at the same time (a thundering herd problem).

  • The saved container is launched with apptainer exec on line 17.

    • By default, the /home directory is mounted into the container, so if torch_matmul.py is saved somewhere in the home directory, it will be available in the container.
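The pull-before-run advice can be taken one step further by skipping the pull when the .sif file is already on disk. The sketch below is one possible way to do that; sif_name_for and pull_if_missing are hypothetical names, and the file naming simply mirrors the CONTAINER_SAVE value used in the job script ("/" and ":" replaced with "+").

```shell
#!/bin/bash
# Hypothetical helpers: derive a .sif filename and pull only when needed.

# Derive a flat .sif filename from an image reference by replacing "/" and ":"
# with "+", mirroring the CONTAINER_SAVE naming in the job script.
sif_name_for() {
  printf './%s.sif\n' "$(printf '%s' "$1" | tr '/:' '++')"
}

# Pull the image only if its .sif file does not exist yet, then print the
# .sif path on stdout so callers can capture it.
#   $1: image reference, e.g. "<repo>:<tag>"
pull_if_missing() {
  local image="$1"
  local sif
  sif="$(sif_name_for "$image")"
  if [ ! -f "$sif" ]; then
    # Send any pull output to stderr so stdout carries only the path.
    apptainer pull "$sif" "docker://$image" >&2 || return 1
  fi
  printf '%s\n' "$sif"
}
```

In a job script this could be used as CONTAINER_SAVE="$(pull_if_missing "$CONTAINER_IMAGE")" before the srun line, so the pull still happens once, ahead of the parallel tasks, and resubmitted jobs reuse the saved file.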

Useful Apptainer Commands

| Command | Description |
| --- | --- |
| apptainer exec <image> <command> | Execute <command> in the container. |
| apptainer shell <image> | Open a shell in the container. This is useful for interactive sessions, for example: srun --gpus=8 --pty apptainer shell <image> |
| apptainer pull [output file] <URI> | Save an image locally as a .sif file. |
| apptainer build <sandbox\|default> <spec-file> | Build an Apptainer image from a build spec (definition) file. Useful for injecting modifications into a Docker image. |
| --bind src[:dst[:opts]] | Mount files or directories into the container. |
| --fakeroot | By default, a container runs as the invoking user. This flag makes the user appear as root inside the container, which is useful for images that expect to run as root. |
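As an illustration of --bind, the sketch below prints a bind argument only when a script lives outside the home directory, which this page says is mounted by default. The function name bind_args_for is hypothetical, and the sketch assumes the directory should appear at the same path inside the container (the src form with no dst).

```shell
#!/bin/bash
# Hypothetical helper: print the --bind arguments needed to expose a script's
# directory, or nothing when the script already lives under $HOME.
#   $1: absolute path to the script on the host
bind_args_for() {
  local script_dir
  script_dir="$(dirname "$1")"
  case "$script_dir/" in
    "$HOME"/*)
      # Already visible through the default home mount.
      ;;
    *)
      # With no dst given, the directory is mounted at the same path.
      printf '%s %s\n' '--bind' "$script_dir"
      ;;
  esac
}
```

A possible use is srun apptainer exec $(bind_args_for "$MATMUL_PY") "$CONTAINER_SAVE" /opt/venv/bin/python "$MATMUL_PY". The command substitution is left unquoted on purpose so that the flag and the path split into two arguments, which means this sketch does not handle paths containing spaces.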
