# Custom Prolog and Epilog scripts

Slurm's prolog and epilog features allow administrators to specify scripts that run on each allocated node before a job's first step (prolog) and at job termination (epilog). This enables admins to perform tasks like system monitoring or cleanup non-disruptively and at fairly frequent intervals.

TensorWave's managed Slurm solution provides `/mnt/customer/prolog.d` and `/mnt/customer/epilog.d` for cluster admins to add their custom prolog and epilog scripts, respectively. Per-node summaries of custom prolog/epilog script runs are logged in `/mnt/customer/logs`.

Prolog/epilog scripts are a powerful tool for managing a cluster, but they also offer a few easy ways to 'shoot yourself in the foot'. If a prolog/epilog script returns a non-zero exit code, Slurm places the node in the DRAIN state, so a single buggy script deployed cluster-wide can drain every node. Prolog/epilog scripts also run as root: this provides broad access for system monitoring, but a careless cleanup task can disrupt running jobs.
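One way to limit the blast radius is to swallow failures of non-critical tasks so they cannot drain the node. The sketch below illustrates that pattern; the `run_noncritical` helper and the `false` stand-in command are hypothetical, not part of the managed solution:

```bash
#!/usr/bin/env bash
# Defensive prolog/epilog sketch: log non-critical failures instead of
# propagating them, so a flaky task does not drain the node. Return
# non-zero only for conditions that warrant taking the node offline.
set -u  # error on unset variables; deliberately no 'set -e' here

log() { echo "prolog: $*"; }

run_noncritical() {
    # Run "$@"; on failure, log the problem and swallow the error.
    if ! "$@"; then
        log "non-critical task failed: $*"
    fi
    return 0
}

run_noncritical false   # 'false' stands in for e.g. a metrics push
status=$?
echo "final status: $status"
```

Because `run_noncritical` always returns 0, the script's final status stays 0 even when the wrapped task fails, and the node remains in service.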

#### Test example of custom prolog and epilog scripts

In this example, we have a prolog and an epilog script. Both scripts print a message to stdout. The epilog script exits with status 1 to simulate a failure.

```bash
tensorwave@tensorwave.com@slurm-login-skip-849dbcf5c-q7ffr:~$ sudo cat /mnt/customer/prolog.d/test-prolog-1.sh
#!/usr/bin/env bash
echo "Hello from job id $SLURM_JOB_ID on node $SLURMD_NODENAME"

tensorwave@tensorwave.com@slurm-login-skip-849dbcf5c-q7ffr:~$ sudo cat /mnt/customer/epilog.d/test-epilog-1.sh
#!/usr/bin/env bash
echo "Goodbye from job id $SLURM_JOB_ID on node $SLURMD_NODENAME"
exit 1
```

To trigger the prolog and epilog, we submit a job with `srun`. Because `test-epilog-1.sh` 'fails', the node our job ran on is drained.

```bash
tensorwave@tensorwave.com@slurm-login-skip-849dbcf5c-q7ffr:~$ srun -N 1 --gpus-per-node=8 hostname
tus1-p2-g6

tensorwave@tensorwave.com@slurm-login-skip-849dbcf5c-q7ffr:~$ sinfo
PARTITION  AVAIL  TIMELIMIT  NODES  STATE NODELIST
gpuworker*    up   infinite      1  drain tus1-p2-g6
gpuworker*    up   infinite      1   idle tus1-p2-g5
```
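Once the faulty script is fixed or removed, a drained node must be returned to service by hand. Assuming standard Slurm tooling (these are stock `sinfo`/`scontrol` commands, not TensorWave-specific, and require admin privileges on the cluster):

```bash
# Inspect why the node was drained (see the REASON column).
sinfo -R

# Return the node to service.
sudo scontrol update nodename=tus1-p2-g6 state=resume
```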

Investigating the logs: the prolog ran successfully, so only its summary line is recorded. Since the epilog failed, its full output is saved in the log as well.

```bash
tensorwave@tensorwave.com@slurm-login-skip-849dbcf5c-q7ffr:~$ cat /mnt/customer/logs/prolog/tus1-p2-g6.log
timestamp=2026-04-01T18:43:39Z script=/mnt/customer/prolog.d/test-prolog-1.sh exit_code=0 job_id=5995 job_user=tensorwave@tensorwave.com

tensorwave@tensorwave.com@slurm-login-skip-849dbcf5c-q7ffr:~$ cat /mnt/customer/logs/epilog/tus1-p2-g6.log
timestamp=2026-04-01T18:43:42Z script=/mnt/customer/epilog.d/test-epilog-1.sh exit_code=1 job_id=5995 job_user=tensorwave@tensorwave.com
--- output ---
Goodbye from job id 5995 on node tus1-p2-g6
```
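The uniform per-node log format makes it easy to find which nodes hit a failing script. A small sketch, run here against a throwaway copy of the log line above rather than the real `/mnt/customer/logs` tree (the log format is assumed to match the example):

```bash
# Recreate one epilog log line in a scratch directory for demonstration.
demo=$(mktemp -d)
cat > "$demo/tus1-p2-g6.log" <<'EOF'
timestamp=2026-04-01T18:43:42Z script=/mnt/customer/epilog.d/test-epilog-1.sh exit_code=1 job_id=5995 job_user=tensorwave@tensorwave.com
EOF

# List log files (i.e. nodes) whose epilog recorded a non-zero exit code.
# On the cluster, point this at /mnt/customer/logs/epilog/*.log instead.
failing=$(grep -l 'exit_code=[1-9]' "$demo"/*.log)
echo "$failing"
```

Since exit codes are 0-255, matching a first digit of 1-9 after `exit_code=` catches every non-zero code while skipping successful runs.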

#### References

Slurm Prolog and Epilog Guide: <https://slurm.schedmd.com/prolog_epilog.html>
