Running Containerized Jobs with Pyxis
TensorWave Slurm integrates Pyxis, a container runtime plugin for Slurm that enables users to run containerized workloads directly within their jobs.
Running Your First Containerized Job
#!/bin/bash
#SBATCH --job-name=torch_matmul-pyxis
#SBATCH --output=jid-%j.name-%x.log
#SBATCH --gpus-per-node=8
#SBATCH -N1

# Script created in step 1.
MATMUL_PY="$PWD/torch_matmul.py"

# pytorch-rocm image from Docker Hub, published by AMD
CONTAINER_IMAGE='rocm/pytorch:rocm7.1.1_ubuntu22.04_py3.10_pytorch_release_2.9.1'
CONTAINER_NAME="pytorch_matmul_test"

# Download the image and instantiate the container
srun --container-name=$CONTAINER_NAME --container-image=$CONTAINER_IMAGE true

# Mount torch_matmul.py into the container and run the benchmark
srun --container-writable \
     --container-name=$CONTAINER_NAME \
     --container-mounts="$MATMUL_PY:/root/torch_matmul.py" \
     /opt/venv/bin/python /root/torch_matmul.py

# Save the image to disk for use later
srun --container-name=$CONTAINER_NAME \
     --container-save=$PWD/torch-matmul.sqsh \
     true

$ sbatch torch-matmul-pyxis.sbatch
Submitted batch job 90
$ tail -f jid-90.name-torch_matmul.log
pyxis: importing docker image: rocm/pytorch:rocm7.1.1_ubuntu22.04_py3.10_pytorch_release_2.9.1
pyxis: imported docker image: rocm/pytorch:rocm7.1.1_ubuntu22.04_py3.10_pytorch_release_2.9.1
Device: AMD Instinct MI325X
n= 1024 144.42 TFLOPs
n= 2048 466.27 TFLOPs
n= 4096 640.01 TFLOPs
n= 8192 763.36 TFLOPs
pyxis: exported container pyxis_90_pytorch_matmul_test to /home/bkitor@tensorwave.com/snpyxis/torch-matmul.sqsh
^C
$ ls
jid-90.name-torch_matmul.log  torch-matmul-pyxis.sbatch  torch-matmul.sqsh  torch_matmul.py
Using a Pre-Staged SquashFS Image
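The torch-matmul.sqsh file exported by the first job can be passed straight to --container-image, so later jobs start from the image on disk instead of pulling from Docker Hub. The script below is a minimal sketch under a few assumptions: the image and torch_matmul.py sit in the submission directory, the job name is arbitrary, and the run_from_sqsh helper and the SRUN override are hypothetical names added here for illustration (they are not part of Pyxis or Slurm).

```bash
#!/bin/bash
#SBATCH --job-name=torch_matmul-sqsh
#SBATCH --output=jid-%j.name-%x.log
#SBATCH --gpus-per-node=8
#SBATCH -N1

# Assumed locations: the image saved by the first job and the benchmark
# script, both in the directory the job is submitted from.
SQSH_IMAGE="${SQSH_IMAGE:-$PWD/torch-matmul.sqsh}"
MATMUL_PY="${MATMUL_PY:-$PWD/torch_matmul.py}"

# Hypothetical helper: run the benchmark from a SquashFS image.
#   $1 = path to the .sqsh file (a path starting with / or ./ is read as a
#        SquashFS file rather than a registry image name)
#   $2 = path to torch_matmul.py on the host
run_from_sqsh() {
  ${SRUN:-srun} \
    --container-image="$1" \
    --container-mounts="$2:/root/torch_matmul.py" \
    /opt/venv/bin/python /root/torch_matmul.py
}

# Hypothetical dry-run fallback: on a machine without Slurm, print the
# arguments instead of launching the job step.
if ! command -v srun >/dev/null 2>&1; then
  SRUN=echo
fi

run_from_sqsh "$SQSH_IMAGE" "$MATMUL_PY"
```

Submit it with sbatch in the same way as the first job. No --container-name is needed here because the container is used by a single job step; add one if several steps should share the same container instance.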
Pyxis Flags
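Flag                  Description
--container-image     Image to use for the container filesystem: a registry image (for example rocm/pytorch:<tag>) or a path to a SquashFS file.
--container-name      Name under which the container is kept on the host, so later job steps can reuse it.
--container-mounts    Bind mounts into the container, given as SRC:DST[:FLAGS] and separated by commas.
--container-writable  Makes the container filesystem writable.
--container-save      Saves the container state to a SquashFS file at the given path.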
Learn More