# Job Submission Lua Plugin

### Overview

Slurm's job submit plugin interface lets cluster managers intercept, modify, or deny users' resource requests. This enables policies such as *ensuring jobs have comments for resource tracking* or *setting a default time limit for interactive jobs*. The plugin behaviour is specified in a Lua file (`/mnt/customer/job_submit.lua`), which provides a flexible and powerful interface.

For more background on Slurm's job submit plugins, see [Slurm's job submit plugin docs](https://slurm.schedmd.com/job_submit_plugins.html), or the [job\_submit\_lua.so source code](https://github.com/SchedMD/slurm/blob/slurm-25.11/src/plugins/job_submit/lua/job_submit_lua.c) for details on the Lua integration.

***

### Script Requirements

The script must be placed at `/mnt/customer/job_submit.lua` and must return a module containing the functions `slurm_job_submit` and `slurm_job_modify`. See the API section for the function specifications.

If the `job_submit.lua` file is not found, or if the `job_submit.lua` file is missing functions, errors are reported in `/mnt/customer/job_submit_lua.log`.

The Lua script is run by `slurmctld` on the controller pod. **The shared `/home` directory is not accessible from the Slurm controller.** To generate persistent logs, write to `/mnt/customer`; see the `log()` function in the Example Plugin section below.

***

### API

`function slurm_job_submit(job_desc, part_list, submit_uid)`

This function is called when a job is submitted. It can be triggered by `salloc`, `sbatch`, or `srun`.

Arguments:

* `job_desc` the requested job allocation
* `part_list` list of partitions the user is authorized to use
* `submit_uid` user ID of the requesting user

Return values:

* `slurm.SUCCESS` on success
* `slurm.ERROR` for generic errors
* `slurm.ESLURM_*` for specific errors (see [slurm/slurm\_errno.h](https://github.com/SchedMD/slurm/blob/slurm-25.11/slurm/slurm_errno.h))
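
As an illustration of the `slurm_job_submit` contract, here is a minimal sketch of the two policies mentioned in the overview: requiring a comment, and applying a default time limit to interactive jobs. The 60-minute default, the use of an empty `job_desc.script` to detect interactive jobs, and the stub `slurm` table (present only so the snippet runs outside `slurmctld`) are assumptions for illustration, not TensorWave defaults.

```lua
-- Stub used only when running outside slurmctld (assumption for local testing).
-- On the controller, the plugin provides the real `slurm` table.
local slurm = slurm or {
    SUCCESS = 0,
    ERROR = -1,
    NO_VAL = 4294967294,  -- 0xfffffffe, Slurm's "unset" marker for 32-bit fields
    log_user = print,
}

local DEFAULT_INTERACTIVE_MINUTES = 60  -- hypothetical site policy

function slurm_job_submit(job_desc, part_list, submit_uid)
    -- Policy 1: every job must carry a comment for resource tracking
    if job_desc.comment == nil or job_desc.comment == "" then
        slurm.log_user("Job rejected: please set --comment")
        return slurm.ERROR
    end

    -- Policy 2: jobs without a batch script (salloc/srun) get a default time limit
    local interactive = (job_desc.script == nil or job_desc.script == "")
    local unset = (job_desc.time_limit == nil or job_desc.time_limit == slurm.NO_VAL)
    if interactive and unset then
        job_desc.time_limit = DEFAULT_INTERACTIVE_MINUTES
    end

    return slurm.SUCCESS
end

-- Usage: plain tables stand in for job_desc
local rejected = slurm_job_submit({ name = "no-comment" }, nil, 9002)
local job = { comment = "project-x", time_limit = slurm.NO_VAL }
local accepted = slurm_job_submit(job, nil, 9002)
print(rejected == slurm.ERROR, accepted == slurm.SUCCESS, job.time_limit)
```

Modifying `job_desc` in place, as the second policy does, is how a submit function adjusts a request rather than rejecting it.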

`function slurm_job_modify(job_desc, job_rec, part_list, modify_uid)`

This function is called when a job-modification request is made.

Arguments:

* `job_desc` specification of the requested modifications
* `job_rec` record of the job to be modified
* `part_list` list of partitions the user is authorized to use
* `modify_uid` user ID of the requesting user

Return values:

* `slurm.SUCCESS` on success
* `slurm.ERROR` for generic errors
* `slurm.ESLURM_*` for specific errors (see [slurm/slurm\_errno.h](https://github.com/SchedMD/slurm/blob/slurm-25.11/slurm/slurm_errno.h))
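
The difference between `job_desc` (the requested changes) and `job_rec` (the job's current state) can be sketched with a hypothetical policy that rejects attempts to move a job to a different partition. The policy itself, the partition names, and the stub `slurm` table (which exists only so the snippet runs outside `slurmctld`) are assumptions for illustration.

```lua
-- Stub used only when running outside slurmctld (assumption for local testing)
local slurm = slurm or { SUCCESS = 0, ERROR = -1, log_user = print }

function slurm_job_modify(job_desc, job_rec, part_list, modify_uid)
    -- String fields the user did not ask to change are nil in job_desc,
    -- so only act when a new partition was actually requested.
    local requested = job_desc.partition
    if requested ~= nil and requested ~= job_rec.partition then
        slurm.log_user("Changing the partition of a submitted job is not allowed")
        return slurm.ERROR
    end
    return slurm.SUCCESS
end

-- Usage: plain tables stand in for the job description and the job record
local current = { partition = "gpu" }
print(slurm_job_modify({ partition = "debug" }, current, nil, 9002) == slurm.ERROR)
print(slurm_job_modify({ comment = "updated" }, current, nil, 9002) == slurm.SUCCESS)
```

Checking for `nil` first matters: a modification that leaves the partition alone should pass through untouched.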

***

### Example Plugin

Here's a sample plugin to get up and running quickly. It requires every allocation to request eight GPUs per node (`--gpus-per-node=8`).

We also provide a simple test harness, which is useful for development and debugging.

```lua
--- Example job_submit.lua for TensorWave's Managed Slurm
--- Denies allocations that don't request `--gpus-per-node=8`
--- Place this file at /mnt/customer/job_submit.lua and run `scontrol reconfigure`

local function log(msg)
    -- Logging helper: reads a global for the destination, which the test harness can override
    -- /home is not available in the slurmctld context, so log to /mnt/customer
    local log_dest = (g_log_dest == nil) and "/mnt/customer/job_submit_lua.log" or g_log_dest
    local file, err = io.open(log_dest, "a")
    if not file then
        slurm.log_error("Failed to open log file: "..tostring(err))
        return
    end
    local timestamp = os.date("%Y-%m-%d %H:%M:%S")
    local prefix = string.format("[%s] ", timestamp)
    file:write(prefix .. msg .. "\n")
    file:close()
end

local function eight_gpu_per_node(job_desc)
    -- Validates that --gpus-per-node=8
    if not (job_desc["tres_per_node"] == "gres/gpu:8") then
        return slurm.ESLURM_INVALID_GRES
    end
    return slurm.SUCCESS
end

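local function check_job_desc(caller, job_desc, uid)
    -- tostring() guards against nil fields, which "%s" rejects on older Lua versions
    local msg = string.format("uid: %d, job name: %s, tres_per_node: %s, gres: %s",
                            uid, tostring(job_desc['name']),
                            tostring(job_desc['tres_per_node']), tostring(job_desc['gres']))
    log(caller .. " " .. msg)
    return eight_gpu_per_node(job_desc)
end

function slurm_job_submit(job_desc, part_list, uid)
    return check_job_desc("slurm_job_submit", job_desc, uid)
end

function slurm_job_modify(job_desc, job_rec, part_list, uid)
    -- job_desc holds only the requested changes; skip validation when GPUs are untouched
    if job_desc["tres_per_node"] == nil then
        return slurm.SUCCESS
    end
    return check_job_desc("slurm_job_modify", job_desc, uid)
end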

-- Export the slurm_job_modify/slurm_job_submit functions so the plugin can consume them
return { slurm_job_modify = slurm_job_modify, slurm_job_submit = slurm_job_submit }
```

```lua
--- Test harness for the example job_submit.lua
--- Sets up a dummy Slurm-env, checks functions are exported properly
--- Ensures that only allocations with "gres/gpu:8" are allowed

-- Set globals to emulate the slurm environment
_G.slurm = {
    log_error = print,
    SUCCESS = "SUCCESS",
    ERROR = "ERROR",
    ESLURM_INVALID_GRES = "ESLURM_INVALID_GRES",
}
-- Log to local directory
_G.g_log_dest = "./job_submit_lua.test.log"

-- Load the job script to test; dofile returns the module table the script returns
-- local jsm = dofile("/mnt/customer/job_submit.lua")
local jsm = dofile("./job_submit.lua")

assert(jsm.slurm_job_submit)
assert(jsm.slurm_job_modify)
assert(type(jsm.slurm_job_submit) == 'function')
assert(type(jsm.slurm_job_modify) == 'function')

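-- Named fields avoid a nil hole in the array part of each test case
local tests = {
    gpus_per_node_8 = {gres = "gres/gpu:8", expected = slurm.SUCCESS},
    gpus_per_node_6 = {gres = "gres/gpu:6", expected = slurm.ESLURM_INVALID_GRES},
    no_gpu_req = {gres = nil, expected = slurm.ESLURM_INVALID_GRES},
}

local failures = 0
for test_name, test_payload in pairs(tests) do
    local tg, expected = test_payload.gres, test_payload.expected
    local test_job_desc = {
        name = "test-name",
        tres_per_node = tg,
        gres = tg,
    }

    local ret = jsm.slurm_job_submit(test_job_desc, nil, 9002)
    if ret == expected then
        print(string.format("SUCCESS %s, gres: %s, returned: %s", test_name, tostring(tg), tostring(ret)))
    else
        failures = failures + 1
        print(string.format("FAIL %s, gres: %s, returned: %s", test_name, tostring(tg), tostring(ret)))
    end
end

-- Exit non-zero on failure so the harness can be used from scripts
os.exit(failures == 0 and 0 or 1)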
```

