
Access

SSH Access

The SSH login address for your cluster (<cluster-ssh-host> in the commands below) is available in the TensorWave dashboard or in the handoff documentation provided at cluster delivery.

Depending on how your cluster is exposed, the SSH service listens on either port 22 or port 32222. Check your handoff documentation if you are unsure which applies.

ssh <your-username>@<cluster-ssh-host>
# or, if using the node port:
ssh -p 32222 <your-username>@<cluster-ssh-host>

Authentication is by SSH key. Password authentication is disabled. Your public key must be present in your LDAP user record before first login; contact your cluster administrator if you cannot connect.

A successful login places you on your user's login pod. This is where you submit jobs, inspect the queue, and manage your files. Login pods are not intended for compute-intensive work.


Slurm Accounting and User Setup

The cluster enforces Slurm accounting: every user must have a Slurm account and association before they can submit jobs. This is required by the QOS and limits configuration.

Initial setup (cluster administrator)

A cluster administrator performs the following steps once per user using sacctmgr from a login pod or the Slurm controller.

1. Create an account (if one does not already exist for the team or project):

sacctmgr add account <account-name> Description="<description>"

2. Add the user and associate them with the account:

sacctmgr add user <username> Account=<account-name> DefaultAccount=<account-name>

3. Verify the association is in place:
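A minimal sketch of this check, assuming the standard sacctmgr show syntax. The helper name show_user_assoc and the chosen format columns are illustrative, not part of the cluster tooling:

```shell
# Hypothetical helper: list the Slurm associations recorded for one user.
# Usage: show_user_assoc <username>
show_user_assoc() {
  if [ -z "$1" ]; then
    echo "usage: show_user_assoc <username>" >&2
    return 2
  fi
  # format= selects which columns sacctmgr prints.
  sacctmgr show associations user="$1" format=Cluster,Account,User,QOS
}
```

The output should contain at least one row naming the account from step 1. A table with only the header row means the association is missing.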

The user can now submit jobs. Without a valid association, job submissions will be rejected.

Checking your own account

Users can verify their associations at any time:
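One way to do this, sketched under the assumption that regular users are permitted to query sacctmgr on your cluster. The helper name my_assoc and the column list are hypothetical:

```shell
# Hypothetical helper: show Slurm associations for yourself or a named user.
# Usage: my_assoc [username]
my_assoc() {
  name="${1:-$(id -un)}"   # default to the user running the command
  sacctmgr show associations user="$name" format=Cluster,Account,User,Partition,QOS
}
```

Run my_assoc with no arguments from a login pod to list your own associations.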

For the full sacctmgr reference, see the Slurm accounting documentation.


Accessing Worker Pods

Worker pods are accessible by SSH from within a login pod. Access is restricted to users who have a job currently running or a resource allocation active on the target node.

Connecting to a worker pod

If you already have a job running, find its allocated nodes:
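A possible way to list them with squeue, shown as a sketch. The helper name my_job_nodes is hypothetical; the squeue options are standard Slurm flags:

```shell
# Hypothetical helper: print "<job-id> <node-list>" for your running jobs.
my_job_nodes() {
  # -u limits output to one user, -t RUNNING to running jobs,
  # -h drops the header, and -o selects job ID (%i) and node list (%N).
  squeue -u "$(id -un)" -t RUNNING -h -o "%i %N"
}
```

Each output line pairs a job ID with the node or nodes it occupies. Multi-node jobs may show a compressed range rather than individual names.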

Then SSH to the worker pod by its Slurm node name directly from the login pod:
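The sketch below looks up the first node of a running job and connects to it. The helper name to_job_node is hypothetical, and the sketch assumes squeue and scontrol are on the login pod's PATH. If you already know the node name, a plain ssh to that name does the same thing.

```shell
# Hypothetical helper: SSH to the first node allocated to one of your jobs.
# Usage: to_job_node <job-id>
to_job_node() {
  # %N is the job's node list, possibly in compressed range form.
  nodelist="$(squeue -j "$1" -h -o "%N")" || return 1
  if [ -z "$nodelist" ]; then
    echo "job $1 has no allocated nodes (is it running?)" >&2
    return 1
  fi
  # scontrol expands a compressed node list into one host name per line.
  node="$(scontrol show hostnames "$nodelist" | head -n 1)"
  # No port flag is passed; the login pod's SSH configuration supplies it.
  ssh "$node"
}
```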

Worker pods use port 2222 internally. This is configured automatically on login pods; no additional flags are required.

Allocating a node interactively with salloc

To open an interactive session without a batch script, use salloc to reserve resources and then SSH to the allocated worker pod:
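A minimal sketch of such a reservation. The helper name reserve_node, the default GPU count and time limit, and the GRES name gpu are all assumptions; check your cluster's configuration for the values that apply:

```shell
# Hypothetical helper: reserve one node interactively.
# Usage: reserve_node [gpus] [time-limit]
reserve_node() {
  gpus="${1:-1}"          # GPUs to request (assumed default)
  limit="${2:-01:00:00}"  # wall-clock limit, HH:MM:SS (assumed default)
  salloc --nodes=1 --gres="gpu:${gpus}" --time="${limit}"
}

# Inside the shell that salloc opens, SLURM_JOB_NODELIST names the node:
#   ssh "$(scontrol show hostnames "$SLURM_JOB_NODELIST" | head -n 1)"
```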

Note: SSH to a worker pod will be refused if you do not have an active allocation on that node. This is by design.

The salloc session holds the allocation open. When you exit the salloc shell, the allocation is released and the node becomes available to other jobs.

For an interactive shell directly on the worker pod without a separate SSH step, use srun --pty:
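A sketch of that pattern, with a hypothetical wrapper named interactive_shell that forwards any extra srun options you supply. Using bash as the shell is an assumption:

```shell
# Hypothetical helper: open an interactive shell on an allocated worker pod.
# Usage: interactive_shell [srun options...]
interactive_shell() {
  # --pty attaches your terminal to the remote shell.
  srun "$@" --pty bash
}
```

For example, interactive_shell --nodes=1 --time=01:00:00 requests one node for an hour and opens a shell on it. Exiting the shell ends the job.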

To drop into an interactive shell inside a container, use apptainer shell with srun (don't forget the --pty flag):
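A sketch of one such invocation. It assumes AMD GPUs and therefore passes Apptainer's --rocm flag (on NVIDIA hardware the counterpart is --nv). The helper name container_shell is hypothetical:

```shell
# Hypothetical helper: interactive shell inside a container on a worker pod.
# Usage: container_shell <image> [srun options...]
container_shell() {
  if [ "$#" -lt 1 ]; then
    echo "usage: container_shell <image> [srun options...]" >&2
    return 2
  fi
  image="$1"   # a docker:// URI or a path to a local .sif file
  shift
  # --pty gives the shell a terminal; --rocm exposes AMD GPUs in the container.
  srun "$@" --pty apptainer shell --rocm "$image"
}
```

For example: container_shell docker://rocm/pytorch:latest --gres=gpu:1 (the image and the GRES name are illustrative).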

This allocates a worker pod, pulls the container image (or reuses a cached copy), and drops you into an interactive shell inside it with GPUs available. If you have already pulled the image, use a local .sif file instead of a docker:// URI. For more on building and running Apptainer images, including multi-node jobs and networking, see Containers and Modules.


IDP Configuration (Optional)

The cluster supports identity provider (IDP) integration, which allows users to authenticate using your organization's existing SSO, such as Okta, Azure AD, or Google Workspace.

IDP setup requires coordination during or after initial cluster deployment. To discuss options or initiate configuration, contact your TensorWave account manager.
