Prolog / Epilog
Slurm supports prolog and epilog scripts that run automatically on each worker pod at the start and end of every job. Prologs run before the first job step begins; epilogs run after the job completes or is cancelled.
Common uses include:
Verifying node health before a job starts
Cleaning up temporary files or resetting state after a job ends
Logging job metadata for monitoring or auditing
Enforcing site-specific policies around GPU, network, or filesystem state
Prolog and epilog scripts run as root on the worker pod, which gives them broad access but also means a failing or buggy script can affect the pod and any jobs running on it. Specifically, if a prolog or epilog script exits with a non-zero code, Slurm will place the node in DRAIN state, taking it out of service. Test scripts carefully before deploying them.
For full background on how Slurm handles prolog and epilog execution, see the Slurm Prolog and Epilog Guide.
Built-in scripts
TensorWave runs a set of managed prolog and epilog scripts on every job automatically. These handle node health checks (see Health Checks), GPU metrics collection for the dashboard, and dispatching your custom scripts. Your scripts always run after the built-in health checks.
Adding custom scripts
Custom prolog and epilog scripts go in the following directories on the shared storage volume:
Scripts in /mnt/customer/prolog.d/ run before the first step of each job, on every allocated node.
Scripts in /mnt/customer/epilog.d/ run after each job completes or is cancelled, on every allocated node.
Scripts are executed in lexicographic order by filename. Use numeric prefixes to control ordering:
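As a rough illustration of how lexicographic ordering behaves, the sketch below creates a few empty files in a temporary directory and lists them in byte-wise order. The filenames are hypothetical, and using LC_ALL=C to mimic the dispatcher's sort is an assumption. Note that 100- sorts before 20-, so prefixes of equal width are the safer choice.

```shell
# Hypothetical script names, created in a throwaway directory for illustration.
demo_dir="$(mktemp -d)"
touch "$demo_dir/10-check-scratch.sh" \
      "$demo_dir/20-load-modules.sh" \
      "$demo_dir/90-log-job.sh" \
      "$demo_dir/100-late.sh"

# LC_ALL=C gives plain byte-wise (lexicographic) ordering.
order="$(cd "$demo_dir" && LC_ALL=C ls -1)"
printf '%s\n' "$order"
# Prints:
#   10-check-scratch.sh
#   100-late.sh
#   20-load-modules.sh
#   90-log-job.sh
```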
Requirements
Scripts must be executable (chmod +x). Non-executable files are skipped with a warning in the log.
Scripts must include a shebang on the first line (#!/usr/bin/env bash).
Scripts run as root. A non-zero exit code will drain the node.
Example prolog script
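A minimal sketch of what a custom prolog could look like, not a TensorWave-provided script. It checks that a scratch directory is writable before the job starts. The filename, the SCRATCH_DIR variable, and the assumption that SLURM_JOB_ID is passed through to custom scripts are all illustrative.

```shell
#!/usr/bin/env bash
# Hypothetical file: /mnt/customer/prolog.d/10-check-scratch.sh
# Assumption: SLURM_JOB_ID is available in the environment of custom scripts.

SCRATCH_DIR="${SCRATCH_DIR:-/tmp}"   # hypothetical per-node scratch location

check_scratch() {
    # Succeeds only if the directory exists and a file can be created in it.
    dir="$1"
    [ -d "$dir" ] || return 1
    probe="$(mktemp "$dir/prolog-probe.XXXXXX")" || return 1
    rm -f "$probe"
}

if ! check_scratch "$SCRATCH_DIR"; then
    # Anything written here ends up in the per-node log, because the script fails.
    echo "job ${SLURM_JOB_ID:-unknown}: $SCRATCH_DIR is not writable" >&2
    exit 1   # a non-zero exit code drains the node
fi
```

The sketch deliberately avoids set -e so that only the explicit check can produce a non-zero exit and drain the node.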
Install it:
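One way to install a prolog script, assuming you have administrator write access to the directory. The helper name install_prolog and the example filename are hypothetical; the copy and chmod +x steps follow the requirements listed above.

```shell
install_prolog() {
    # usage: install_prolog <script> [target-dir]
    src="$1"
    dest_dir="${2:-/mnt/customer/prolog.d}"
    # Copy the script into place, then mark it executable so it is not skipped.
    cp "$src" "$dest_dir/" && chmod +x "$dest_dir/$(basename "$src")"
}

# As an administrator, with a hypothetical script name:
#   install_prolog 10-check-scratch.sh
```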
Example epilog script
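A minimal sketch of a custom epilog that removes a per-job scratch directory, not a TensorWave-provided script. The filename, the SCRATCH_DIR variable, the job-<id> directory layout, and the availability of SLURM_JOB_ID in custom scripts are assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical file: /mnt/customer/epilog.d/10-clean-scratch.sh
# Assumption: jobs write temporary data to $SCRATCH_DIR/job-<job id>.

SCRATCH_DIR="${SCRATCH_DIR:-/tmp}"   # hypothetical per-node scratch location

clean_job_scratch() {
    # Remove the per-job directory. Because this runs as root, refuse to act
    # unless the job ID is a non-empty string of digits.
    job_id="$1"
    case "$job_id" in
        ''|*[!0-9]*) return 0 ;;
    esac
    rm -rf -- "$SCRATCH_DIR/job-$job_id"
}

# Cleanup problems are reported but not treated as fatal in this sketch,
# so a leftover directory does not drain the node.
clean_job_scratch "${SLURM_JOB_ID:-}" || echo "cleanup failed for job ${SLURM_JOB_ID:-unknown}" >&2
```

Whether a failed cleanup should drain the node is a site policy decision; exit non-zero instead if leftover state must block further jobs.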
Install it:
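A possible one-step install using install(1), which copies the file and sets its mode together. The helper name install_epilog and the example filename are hypothetical, and administrator write access to the directory is assumed.

```shell
install_epilog() {
    # usage: install_epilog <script> [target-dir]
    src="$1"
    dest_dir="${2:-/mnt/customer/epilog.d}"
    # Mode 0755 makes the script executable, as the requirements demand.
    install -m 0755 "$src" "$dest_dir/$(basename "$src")"
}

# As an administrator, with a hypothetical script name:
#   install_epilog 10-clean-scratch.sh
```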
Viewing logs
Per-node prolog and epilog logs are written to:
/mnt/customer/logs/prolog/<node>.log: output from all prolog scripts on that node
/mnt/customer/logs/epilog/<node>.log: output from all epilog scripts on that node
Each entry includes a timestamp, script path, exit code, job ID, and user. Script output is only captured in the log when the script fails.
Viewing a node's prolog log:
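A small sketch for reading the most recent entries, based on the log paths listed above. The helper name show_hook_log and the node name in the example are hypothetical.

```shell
LOG_ROOT="${LOG_ROOT:-/mnt/customer/logs}"

show_hook_log() {
    # usage: show_hook_log prolog|epilog <node> [lines]
    kind="$1"
    node="$2"
    lines="${3:-50}"
    tail -n "$lines" "$LOG_ROOT/$kind/$node.log"
}

# Example, with a hypothetical node name:
#   show_hook_log prolog node-01
```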
Example output for a successful prolog:
Example output when an epilog script fails (output is included):
If a script fails and the node drains, check the log for the affected node first, then inspect node state with sinfo:
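A sketch of the sinfo check, using standard sinfo options. The helper name node_state and the node name are hypothetical.

```shell
node_state() {
    # usage: node_state <node>
    # -N lists one node per line, -n restricts output to the named node.
    # Format fields: %N = node name, %T = state, %E = reason the node is unavailable.
    sinfo -N -n "$1" -o "%N %T %E"
}

# Example, with a hypothetical node name:
#   node_state node-01
# To list the reason for every down or drained node:
#   sinfo -R
```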
Once the issue is resolved, contact your cluster administrator to resume the node.
Scripts in /mnt/customer/prolog.d and /mnt/customer/epilog.d are writable by administrators only (chmod 1700). Logs in /mnt/customer/logs are readable by all users.