# Storage

### Shared Home Directory

`/home` is a shared, persistent filesystem backed by a high-performance distributed storage system and mounted on every login pod and worker pod in the cluster. Files written to `/home` from a login pod are immediately visible on any worker pod running your jobs, and vice versa.

This makes `/home` the right place for:

* Source code, scripts, and configuration files
* Job output you need to keep after the job finishes
* Data that needs to be accessible from multiple nodes at once

#### Storage quotas

Your `/home` allocation is defined in your deployment. To check current usage:

```bash
df -h /home/$USER
```

Live storage usage is also shown in the TensorWave dashboard. If you need more space or a quota adjustment, contact your TensorWave account manager. If `/home` runs out of space, writes to it fail, and jobs or tools that depend on those writes can break. See Common Issues for advice on managing `/home`.
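When usage is close to the quota, it helps to see which directories take up the space. The following is a minimal sketch, assuming GNU `du` and `sort` are available on the login pod; `top_usage` is a hypothetical helper name, not a TensorWave tool.

```bash
#!/usr/bin/env bash
# Sketch: list the largest subdirectories of a directory.
# `top_usage` is a hypothetical helper name (an assumption, not a cluster tool).
top_usage() {
  local dir="${1:-$HOME}"   # directory to inspect; defaults to your home
  local count="${2:-10}"    # maximum number of lines to print
  # du -h --max-depth=1: human-readable totals for each immediate
  #   subdirectory, plus one line for the directory itself (GNU du).
  # sort -rh: largest first, understanding suffixes such as K, M, G.
  du -h --max-depth=1 "$dir" 2>/dev/null | sort -rh | head -n "$count"
}
```

Call it as `top_usage "/home/$USER"`. On a distributed filesystem a full walk of a large tree can take a while, so consider pointing it at a specific subdirectory first.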

***

### Setting Up a Shared Directory

For shared data that multiple users on the same team need to access, the recommended approach is a shared directory under `/home` with appropriate group permissions.

**1. Create a shared directory:**

```bash
mkdir -p /home/shared/<project-name>
```

**2. Set group ownership:**

```bash
chgrp <group-name> /home/shared/<project-name>
```

**3. Set permissions so group members can read and write:**

```bash
chmod 2775 /home/shared/<project-name>
```

The leading `2` (the setgid bit) ensures new files and subdirectories inherit the directory's group, so members do not need to run `chgrp` on files they create there.

**4. Verify:**

```bash
ls -ld /home/shared/<project-name>
```

To create a globally available shared directory, use the `user` group. Groups are managed through LDAP. If you need a new group created or users added to an existing one, contact your cluster administrator.
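The four steps above can be combined into one function. This is a sketch rather than an official tool: `setup_shared_dir` is a hypothetical name, and it assumes the group already exists and that you are allowed to assign it.

```bash
#!/usr/bin/env bash
# Sketch: steps 1-4 above as a single function.
# `setup_shared_dir` is a hypothetical helper name (an assumption).
setup_shared_dir() {
  local dir="$1"     # e.g. /home/shared/<project-name>
  local group="$2"   # an existing group that you belong to
  if [[ -z "$dir" || -z "$group" ]]; then
    echo "usage: setup_shared_dir <directory> <group>" >&2
    return 1
  fi
  mkdir -p "$dir" || return 1         # 1. create the directory and any missing parents
  chgrp "$group" "$dir" || return 1   # 2. set group ownership
  chmod 2775 "$dir" || return 1       # 3. rwx for owner and group, r-x for others, plus setgid
  ls -ld "$dir"                       # 4. verify
}
```

In the `ls -ld` output, a mode of `drwxrwsr-x` confirms the setgid bit: the `s` replaces the group's `x`.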

***

### Worker Pod Storage

Worker pods have several storage locations available during a job. Understanding which to use prevents both data loss and performance issues.

#### Summary

| Path          | Type                   | Persists after job | Shared across nodes | Notes                                                                        |
| ------------- | ---------------------- | ------------------ | ------------------- | ---------------------------------------------------------------------------- |
| `/home/$USER` | Distributed network FS | Yes                | Yes                 | Durable and performant; use as primary storage                               |
| `/tmp`        | Memory-backed          | No                 | No                  | Fast local scratch; useful for caching large files                           |
| `/run/tmp`    | Memory-backed          | No                 | No                  | Fast local scratch; used by Enroot for the container runtime                 |
| `/dev/shm`    | Memory-backed          | No                 | No                  | Fast local scratch; commonly used by PyTorch for inter-process communication |

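#### `/home`: primary storage

`/home/$USER` is the distributed network filesystem described under Shared Home Directory. It persists after the job finishes and is visible from every node, so use it as your primary storage.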

#### `/tmp`: pod-local scratch

Each worker pod has a memory-backed `/tmp`. It is fast relative to a network filesystem and suitable for intermediate files your job produces and consumes within the same pod. It is **not** shared between pods, and although policy is to clean it between jobs, it is not guaranteed to be empty at job start. Do not write job outputs here that you need after the job finishes. This space is carved out of the node's RAM, so writing large amounts of data here can cause pages to be swapped to disk and reduce your job's performance. The filesystem is cleared when the pod is replaced.
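One way to follow this guidance is to do intermediate work in a private scratch directory and copy only the final results to `/home`. The sketch below illustrates that pattern under stated assumptions: `run_with_scratch`, the `SCRATCH_ROOT` override, and the file names are hypothetical, and the two `echo` lines stand in for a real workload. `SLURM_JOB_ID` is the job ID variable that Slurm sets inside a job.

```bash
#!/usr/bin/env bash
# Sketch: work in pod-local scratch, keep only final results on shared storage.
# `run_with_scratch`, SCRATCH_ROOT, and the file names are hypothetical.
run_with_scratch() {
  local out_dir="$1"   # persistent destination, e.g. a directory under /home/$USER
  local scratch
  # A private directory avoids collisions with leftovers from earlier jobs.
  scratch=$(mktemp -d "${SCRATCH_ROOT:-/tmp}/job-${SLURM_JOB_ID:-manual}-XXXXXX") || return 1

  # Placeholder workload: intermediates and the final result land in scratch.
  echo "intermediate data" > "$scratch/work.tmp"
  echo "final result" > "$scratch/result.txt"

  # Copy only what must survive the job to shared storage.
  mkdir -p "$out_dir" && cp "$scratch/result.txt" "$out_dir/"
  local status=$?

  # Release the RAM-backed space immediately instead of waiting for cleanup.
  rm -rf "$scratch"
  return "$status"
}
```

For example, `run_with_scratch "/home/$USER/results"` leaves `result.txt` in that directory and nothing behind in `/tmp`.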

#### `/run/tmp` and `/dev/shm`: tmpfs for system resources

`/run/tmp` and `/dev/shm` are memory-backed filesystems (`tmpfs`) mounted on each worker. They are similar to `/tmp` but smaller, and are reserved for application use. `/run/tmp` is used internally by Pyxis/Enroot for container image staging (`/run/tmp/enroot-data`, `/run/tmp/enroot-runtime`). `/dev/shm` is used by PyTorch for inter-process communication.
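Because all three pod-local paths draw on the node's RAM, it can be useful to check how much room each has before a job writes to them. A minimal sketch, assuming a `df` that accepts `-h` and `-P` together (GNU coreutils does); `scratch_space` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Sketch: report size and free space for the pod-local paths named above.
# `scratch_space` is a hypothetical helper name (an assumption).
scratch_space() {
  local path
  for path in /tmp /run/tmp /dev/shm; do
    if [[ -d "$path" ]]; then
      # -h: human-readable sizes; -P: one output line per filesystem.
      # tail -n 1 drops the header so each path yields a single line.
      printf '%s\t%s\n' "$path" "$(df -hP "$path" | tail -n 1)"
    else
      printf '%s\tnot present on this host\n' "$path"
    fi
  done
}
```

Run it from within a job so the numbers reflect the worker pod rather than the login pod.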

***

> **General rule:** Write outputs you need to keep to `/home`. Use `/tmp` for scratch that only lives for the duration of the job. Treat all pod-local paths as ephemeral.

