Running Jobs With Modules
Software modules are provided for commonly used software packages.
As an example, we provide a module for Hugging Face's Transformer Reinforcement Learning (TRL) package, so getting a working TRL environment is as easy as module load trl. We also provide a sample sbatch script that uses the TRL module, /opt/examples/libexec/trl-module.sbatch:
The module load trl line loads the TRL software, and the loaded environment persists through subsequent sub-shell calls, such as the srun bash heredoc later in the script.
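Before submitting a job, you can sanity-check the module interactively in a login shell. A minimal sketch, assuming the module exposes the upstream trl Python package under its usual import name:

# Load the TRL environment, then confirm the package is importable
module load trl
python -c "import trl; print(trl.__version__)"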
You can test-run the script with sbatch /opt/examples/libexec/trl-module.sbatch. It runs across 4 nodes by default, but you can scale it with sbatch --nodes=<num-nodes> /path/to/script.sbatch (command-line options override the #SBATCH directives in the script).
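The total number of ranks is the node count times the GPUs per node, which is why the sample log below reports "Rank: 0 out of 32" for the 4-node default. A quick sanity check of the world size for any node count (the 8-GPU figure comes from the script's --gpus-per-node=8):

# World size = nodes × GPUs per node
NODES=4          # value passed via --nodes (script default)
GPUS_PER_NODE=8  # matches #SBATCH --gpus-per-node=8
echo $(( NODES * GPUS_PER_NODE ))   # prints 32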
To explore available modules, run module avail or module spider. module avail gives a compact listing of what can be loaded right now, while module spider is more detailed and useful for sorting out module dependencies.
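For example (a sketch of common Lmod usage; the module names you see will vary by system):

module avail        # compact list of modules loadable right now
module spider       # full list, including modules behind dependencies
module spider trl   # details and prerequisites for a specific module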
If you need a specific piece of software, contact us and we can provide a module that fits your needs.
Lmod user guide: https://lmod.readthedocs.io/en/latest/010_user.html
#!/usr/bin/bash
#SBATCH --job-name=trl-finetuner
#SBATCH --output=jid-%j.name-%x.log
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=48
#SBATCH --gpus-per-node=8
#SBATCH --time=01:00:00
#SBATCH --nodes=4
set -exuo pipefail
module load trl
# Path to the sample fine-tuning script shipped with the module examples
TRLFT_PY="/opt/examples/scripts/trl_tune/trl_tune.py"
GPUS_PER_NODE=8
# The node running this batch script acts as the rendezvous host
MASTER_ADDR=$(hostname)
MASTER_PORT=6000
# Run offline if the dataset is already in the local Hugging Face cache
if [[ -d $HOME/.cache/huggingface/datasets/mlech26l___shell-helper ]]; then
HF_OFFLINE=1
else
HF_OFFLINE=0
fi
# Launch one task per node; the heredoc below runs on every node
srun bash <<EOF
# Keep MIOpen kernel caches on node-local storage
export MIOPEN_CUSTOM_CACHE_DIR=/tmp/miopen-cache
export MIOPEN_USER_DB_PATH=/tmp/miopen-user-db
export HF_DATASETS_OFFLINE=${HF_OFFLINE}
export HF_HUB_DISABLE_PROGRESS_BARS=1
MY_IP=\$(hostname)
IS_HOST=0
REMOTE_ADDR=${MASTER_ADDR}
if [ "\$MY_IP" == "${MASTER_ADDR}" ];then
IS_HOST=1
REMOTE_ADDR=localhost
fi
export OMP_NUM_THREADS=8
echo "MY IP is \$MY_IP and am I host? \$IS_HOST and what is master addr ${MASTER_ADDR}"
USE_ROCM=1
python -u -m torch.distributed.run \
--nproc_per_node $GPUS_PER_NODE \
--nnodes $SLURM_NNODES \
--rdzv_endpoint \${REMOTE_ADDR}:${MASTER_PORT} \
--rdzv_backend c10d \
--max_restarts 0 \
--rdzv_id=1 \
--rdzv_conf=is_host=\$IS_HOST \
--local_addr "\$(hostname)" \
$TRLFT_PY
EOF

$ sbatch /opt/examples/libexec/trl-module.sbatch
Submitted batch job 268
$ tail -f jid-268.name-trl-finetuner.log
++ hostname
+ MASTER_ADDR=tus1-p13-g2
+ MASTER_PORT=6000
+ [[ -d /home/tensorwave/.cache/huggingface/datasets/mlech26l___shell-helper ]]
+ HF_OFFLINE=1
+ srun bash
MY IP is tus1-p13-g2 and am I host? 1 and what is master addr tus1-p13-g2
MY IP is tus1-p14-g24 and am I host? 0 and what is master addr tus1-p13-g2
MY IP is tus1-p14-g37 and am I host? 0 and what is master addr tus1-p13-g2
MY IP is tus1-p16-g17 and am I host? 0 and what is master addr tus1-p13-g2
Rank: 0 out of 32
Number of gpus available: 8
GPU 0: AMD Instinct MI325X
GPU 1: AMD Instinct MI325X
GPU 2: AMD Instinct MI325X
GPU 3: AMD Instinct MI325X
GPU 4: AMD Instinct MI325X
GPU 5: AMD Instinct MI325X
GPU 6: AMD Instinct MI325X
GPU 7: AMD Instinct MI325X
Loading model: LiquidAi/LFM2.5-1.2B-Instruct
...
Found the latest cached dataset configuration 'default' at /home/tensorwave/.cache/huggingface/datasets/mlech26l___shell-helper/default/0.0.0/bf4e04b465240544350f49c89cd108c35698f588 (last modified on Sat Feb 14 08:59:42 2026).
Launching training
{'loss': '1.294', 'grad_norm': '0.9996', 'learning_rate': '1.85e-05', 'entropy': '1.288', 'num_tokens': '8.023e+06', 'mean_token_accuracy': '0.7068', 'epoch': '0.3067'}
{'loss': '1.294', 'grad_norm': '0.9996', 'learning_rate': '1.85e-05', 'entropy': '1.288', 'num_tokens': '8.023e+06', 'mean_token_accuracy': '0.7068', 'epoch': '0.3067'}
{'loss': '1.294', 'grad_norm': '0.9996', 'learning_rate': '1.85e-05', 'entropy': '1.288', 'num_tokens': '8.023e+06', 'mean_token_accuracy': '0.7068', 'epoch': '0.3067'}
{'loss': '1.294', 'grad_norm': '0.9996', 'learning_rate': '1.85e-05', 'entropy': '1.288', 'num_tokens': '8.023e+06', 'mean_token_accuracy': '0.7068', 'epoch': '0.3067'}
{'loss': '0.9138', 'grad_norm': '0.837', 'learning_rate': '1.696e-05', 'entropy': '0.9295', 'num_tokens': '1.604e+07', 'mean_token_accuracy': '0.7676', 'epoch': '0.6135'}
{'loss': '0.9138', 'grad_norm': '0.837', 'learning_rate': '1.696e-05', 'entropy': '0.9295', 'num_tokens': '1.604e+07', 'mean_token_accuracy': '0.7676', 'epoch': '0.6135'}
{'loss': '0.9138', 'grad_norm': '0.837', 'learning_rate': '1.696e-05', 'entropy': '0.9295', 'num_tokens': '1.604e+07', 'mean_token_accuracy': '0.7676', 'epoch': '0.6135'}
{'loss': '0.9138', 'grad_norm': '0.837', 'learning_rate': '1.696e-05', 'entropy': '0.9295', 'num_tokens': '1.604e+07', 'mean_token_accuracy': '0.7676', 'epoch': '0.6135'}
