# Running Containerized Jobs with Apptainer

* Use `apptainer pull docker://<repo>:<tag>` to pull an image from Docker Hub and cache it.
  * Images are cached under `$HOME/.apptainer`, so images pulled on a login pod are also available on the worker nodes.
* Because of login pod security settings, `apptainer build`, `apptainer exec`, and `apptainer shell` can only be run on the worker pods.
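
To check where pulled images actually land, for example to confirm they are on storage the worker nodes can see, the lookup can be sketched in Python. This is a minimal sketch that assumes stock Apptainer behavior: the `APPTAINER_CACHEDIR` environment variable overrides the default cache location `$HOME/.apptainer/cache`. Your site configuration may differ, and `apptainer_cache_dir` is a hypothetical helper name.

```python
import os
from typing import Mapping, Optional


def apptainer_cache_dir(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the directory Apptainer caches pulled images in (sketch).

    Assumes stock Apptainer behavior: APPTAINER_CACHEDIR wins if set,
    otherwise the cache lives at $HOME/.apptainer/cache.
    """
    env = os.environ if env is None else env
    override = env.get("APPTAINER_CACHEDIR")
    if override:
        return override
    home = env.get("HOME") or os.path.expanduser("~")
    return os.path.join(home, ".apptainer", "cache")


if __name__ == "__main__":
    cache = apptainer_cache_dir()
    # A cache under $HOME is only visible on worker nodes if $HOME is shared.
    print(f"Apptainer cache: {cache} (exists: {os.path.isdir(cache)})")
```

If you point `APPTAINER_CACHEDIR` somewhere that is not shared with the worker nodes, images pulled on a login pod will no longer be reused there.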

#### Running Your First Containerized Job

In this example, you’ll run a **PyTorch matmul test** using Apptainer.\
It shows how to pull an image from Docker Hub, make your own code available inside the container, and verify that the containerized environment gets full GPU performance.

1. **Create a Python script** to measure PyTorch's matrix multiplication performance. Copy the following code block into a file named `torch_matmul.py`.

   <pre class="language-python" data-title="torch_matmul.py" data-line-numbers data-expandable="true"><code class="lang-python">import torch

   device = torch.device("cuda:0")  # ROCm builds of PyTorch expose AMD GPUs through the "cuda" device type
   dtype = torch.float16
   torch.set_default_device(device)

   print(f"Device: {torch.cuda.get_device_name(device)}")

   sizes = [1024, 2048, 4096, 8192]
   iters = 50

   for n in sizes:
       a = torch.randn((n, n), dtype=dtype)
       b = torch.randn((n, n), dtype=dtype)

       start = torch.cuda.Event(enable_timing=True)
       end = torch.cuda.Event(enable_timing=True)

       # warmup
       for _ in range(2):
           torch.matmul(a, b)
       torch.cuda.synchronize()

       start.record()
       for _ in range(iters):
           c = torch.matmul(a, b)
       end.record()
       torch.cuda.synchronize()

       elapsed_ms = start.elapsed_time(end)
       elapsed_s = elapsed_ms / 1e3

       # FLOPs for matmul ≈ 2 * n^3
       total_flops = 2 * n**3 * iters
       tflops = total_flops / elapsed_s / 1e12

       print(f"n={n:5d}  {tflops:6.2f} TFLOPs")

   </code></pre>
2. **Create a new job script** named `torch-matmul-apptainer.sbatch`:

   <pre class="language-bash" data-title="torch-matmul-apptainer.sbatch" data-line-numbers><code class="lang-bash">#!/bin/bash
   #SBATCH --job-name=torch_matmul-apptainer
   #SBATCH --output=jid-%j.name-%x.log
   #SBATCH --gpus-per-node=8
   #SBATCH -N1

   # Script created in step 1.
   MATMUL_PY="$PWD/torch_matmul.py" 
   # pytorch-rocm image from Docker Hub, published by AMD
   CONTAINER_IMAGE='rocm/pytorch:rocm7.1.1_ubuntu22.04_py3.10_pytorch_release_2.9.1'
   CONTAINER_SAVE="./rocm+pytorch+rocm7.1.1_ubuntu22.04_py3.10_pytorch_release_2.9.1.sif"

   # pull the image from Docker Hub and save to disk
   apptainer pull "$CONTAINER_SAVE" "docker://$CONTAINER_IMAGE"

   # Run the benchmark; torch_matmul.py is visible because /home is mounted by default
   srun apptainer exec "$CONTAINER_SAVE" \
     /opt/venv/bin/python "$MATMUL_PY"

   </code></pre>
3. **Submit the job.** Here's an example run:

   ```shellscript
   $ sbatch torch-matmul-apptainer.sbatch
   Submitted batch job 4241
   $ tail -f jid-4241.name-torch_matmul-apptainer.log
   INFO:    Converting OCI blobs to SIF format
   INFO:    Starting build...
   INFO:    Fetching OCI image...
   INFO:    Extracting OCI image...
   2026/05/05 21:06:48  warn rootless{usr/lib/x86_64-linux-gnu/gstreamer1.0/gstreamer-1.0/gst-ptp-helper} ignoring (usually) harmless EPERM on setxattr "security.capability"
   INFO:    Inserting Apptainer configuration...
   INFO:    Creating SIF file...
   Device: AMD Instinct MI325X
   n= 1024  145.11 TFLOPs
   n= 2048  448.87 TFLOPs
   n= 4096  630.48 TFLOPs
   n= 8192  753.88 TFLOPs
   ```
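
The TFLOPs figures come from the formula in `torch_matmul.py`: a square n×n matmul costs about 2n³ floating-point operations, and the script divides the total for the timed loop by the elapsed time. The sketch below restates that arithmetic as standalone functions for sanity-checking a log line or estimating how long a run took. `matmul_tflops` and `implied_elapsed_ms` are hypothetical names, not part of the script.

```python
def matmul_tflops(n: int, iters: int, elapsed_ms: float) -> float:
    """Throughput in TFLOP/s for `iters` square n x n matmuls taking `elapsed_ms`."""
    # Each of the n*n outputs needs n multiply-adds, i.e. about 2*n FLOPs.
    flops_per_matmul = 2 * n**3
    total_flops = flops_per_matmul * iters
    elapsed_s = elapsed_ms / 1e3
    return total_flops / elapsed_s / 1e12


def implied_elapsed_ms(n: int, iters: int, tflops: float) -> float:
    """Invert matmul_tflops: how long the timed loop took for a reported rate."""
    return 2 * n**3 * iters / (tflops * 1e12) * 1e3


if __name__ == "__main__":
    # Reported value for n=8192 in the example log above (iters=50 in the script).
    ms = implied_elapsed_ms(8192, 50, 753.88)
    print(f"50 x (8192 x 8192) matmuls at 753.88 TFLOP/s took ~{ms:.1f} ms")
```

For the n=8192 line in the example log, this works out to roughly 73 ms for the 50 timed iterations.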

Key features of the `torch-matmul-apptainer.sbatch` file:

* On line 14, the `rocm/pytorch` image is pulled from Docker Hub and saved to disk as a `.sif` file.
  * The `docker://` prefix tells Apptainer to pull the image from Docker Hub. Other registries and image sources are also supported; run `apptainer pull --help` for details.
  * Pull images once, before launching parallel tasks. Otherwise every task fetches the same remote image at the same time, a [thundering herd problem](https://en.wikipedia.org/wiki/Thundering_herd_problem).
* On line 17, `srun` launches the saved image with `apptainer exec`, which runs `torch_matmul.py` with the image's Python interpreter at `/opt/venv/bin/python`.
  * By default, `/home` is mounted into the container, so if `torch_matmul.py` is saved anywhere in your home directory, it is available inside the container at the same path.
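
The pull-once pattern can also be sketched in Python, for jobs driven from a Python launcher instead of bash. This is a minimal sketch: `pull_command` and `ensure_sif` are hypothetical helper names, and it assumes `apptainer` is on `PATH` where it runs. It builds the same `apptainer pull <sif> docker://<image>` command as the job script but skips the pull when the `.sif` already exists, so resubmitting a job reuses the image on disk (by default `apptainer pull` refuses to overwrite an existing file).

```python
import os
import subprocess
from typing import Callable, List


def pull_command(sif_path: str, image: str) -> List[str]:
    """argv for `apptainer pull <sif> docker://<image>`, as in the job script."""
    return ["apptainer", "pull", sif_path, f"docker://{image}"]


def _run_checked(cmd: List[str]) -> None:
    # Raise CalledProcessError if the pull fails, instead of continuing silently.
    subprocess.run(cmd, check=True)


def ensure_sif(sif_path: str, image: str,
               run: Callable[[List[str]], object] = _run_checked) -> bool:
    """Pull `image` to `sif_path` only if the SIF file is missing.

    Returns True if a pull was issued. Call this once per job, before
    `srun` fans out tasks, so only one process talks to the registry.
    """
    if os.path.isfile(sif_path):
        return False  # reuse the SIF already on disk
    run(pull_command(sif_path, image))
    return True
```

The `run` parameter defaults to actually executing the command; passing a different callable (for example, one that only logs the command) lets you dry-run the logic.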

#### Useful Apptainer commands

| Command                                             | Description                                                                                                                                                   |
| --------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `apptainer exec <image> <command>`                  | Run `<command>` inside the container.                                                                                                                         |
| `apptainer shell <image>`                           | Open a shell in the container. For an interactive GPU session, run `srun --gpus=8 --pty apptainer shell <image>`.                                              |
| `apptainer pull [output file] <URI>`                | Pull an image and save it locally as a `.sif` file.                                                                                                           |
| `apptainer build [--sandbox] <output> <build-spec>` | Build an image from a build spec, usually a definition (`.def`) file. Useful for layering modifications on top of a Docker image. `--sandbox` writes a writable directory instead of a `.sif` file. |

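| Flag                      | Description                                                                                                                                                                                                                          |
| ------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| `--bind src[:dst[:opts]]` | Bind-mount the host file or directory `src` into the container at `dst` (defaults to the same path as `src`). `opts` sets mount options such as `ro` or `rw`.                                                                         |
| `--fakeroot`              | By default, the container runs as your own unprivileged user. This flag runs it as a fake root user (UID 0 inside the container, mapped to your user on the host), which is useful for images that expect to run as root. It does not grant real root privileges on the host. |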
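
Both flags go before the image name on an `apptainer exec` or `apptainer shell` command line; everything after the image is the command to run. The sketch below assembles such a command in Python, for example to run a script stored outside `/home`. `bind_spec`, `exec_command`, and the `/data/proj` path are hypothetical and only illustrate the `src[:dst[:opts]]` format.

```python
from typing import List, Optional, Sequence


def bind_spec(src: str, dst: Optional[str] = None, opts: Optional[str] = None) -> str:
    """Format a --bind value as src[:dst[:opts]]."""
    if opts and not dst:
        # The opts field is positional, so it needs an explicit destination.
        raise ValueError("bind options need an explicit destination path")
    return ":".join(part for part in (src, dst, opts) if part)


def exec_command(image: str, command: Sequence[str],
                 binds: Sequence[str] = (), fakeroot: bool = False) -> List[str]:
    """argv for `apptainer exec [--bind ...] [--fakeroot] <image> <command...>`."""
    argv = ["apptainer", "exec"]
    for spec in binds:
        argv += ["--bind", spec]
    if fakeroot:
        argv.append("--fakeroot")
    # Options must come before the image; everything after it is the command.
    argv.append(image)
    argv.extend(command)
    return argv
```

Passing the resulting list to `subprocess.run`, prefixed with `srun` when launching on worker pods, would execute it.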


