PyTorch Quickstart

Estimated time: 2 minutes, 3 minutes with buffer


Using ROCm Devices

PyTorch is officially supported by AMD for ROCm, and should be plug-and-play once set up correctly.

Learn more about installing PyTorch with ROCm here

AMD GPU devices are configured and accessed in exactly the same way as NVIDIA GPU devices. Any workflow that sets the PyTorch device as follows will work out of the box, provided PyTorch can detect your GPUs:

torch.device("cuda")
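As a rough illustration of what "the same way" means in practice, the sketch below picks the "cuda" device when PyTorch reports a GPU and falls back to the CPU otherwise. The helper name pick_device_name and the fallback behavior are illustrative choices, not part of the original guide.

```python
import importlib.util


def pick_device_name(gpu_available: bool) -> str:
    """Return "cuda" when a GPU is visible to PyTorch, otherwise "cpu"."""
    return "cuda" if gpu_available else "cpu"


def main() -> None:
    # Skip the demo cleanly when PyTorch is not installed in this environment.
    if importlib.util.find_spec("torch") is None:
        print("PyTorch is not installed in this environment")
        return
    import torch

    # On ROCm builds, AMD GPUs are still addressed through the "cuda" device type.
    device = torch.device(pick_device_name(torch.cuda.is_available()))
    x = torch.ones(2, 3, device=device)
    print(f"tensor lives on: {x.device}")


if __name__ == "__main__":
    main()
```

Because the device string is the same on both vendors, code written this way should not need a separate AMD code path.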

Debugging

To test whether your system is configured to use PyTorch with GPU acceleration, create a new file for a few debugging commands:

mkdir pytorch-hello-world
cd pytorch-hello-world
nano debug.py

The following code prints a boolean indicating whether PyTorch detects your GPUs:

import torch
print(torch.cuda.is_available())

Now, go ahead and run your file using:

python3 debug.py

If this does not print True, there are a couple of things to check.
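Before working through those checks one by one, it can help to print everything PyTorch knows about the GPUs in a single pass. The following is a minimal sketch of a fuller debug.py; the function name gpu_report and the dictionary keys are hypothetical, while torch.version.hip is read defensively because it is only populated on ROCm builds.

```python
def gpu_report() -> dict:
    """Collect basic facts about the PyTorch install and the GPUs it can see."""
    try:
        import torch
    except ImportError:
        return {"torch_installed": False}

    available = torch.cuda.is_available()
    count = torch.cuda.device_count() if available else 0
    return {
        "torch_installed": True,
        "torch_version": torch.__version__,
        # None on builds that were not compiled against ROCm/HIP.
        "hip_version": getattr(torch.version, "hip", None),
        "gpu_available": available,
        "gpu_count": count,
        "gpu_names": [torch.cuda.get_device_name(i) for i in range(count)],
    }


if __name__ == "__main__":
    for key, value in gpu_report().items():
        print(f"{key}: {value}")
```

A report with gpu_available set to False and hip_version set to None points toward the PyTorch build; a populated hip_version with no GPUs points toward the ROCm setup.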

PyTorch Setup

One reason the check above may fail is that the installed PyTorch build does not support ROCm. To check, add the following line to your debugging file:

print(torch.__version__)

You should get an output similar to:

[torch_version]a0+git[hash]

Or:

[torch_version].dev[date]+rocm[rocm_version]

If the output does not indicate a ROCm-enabled PyTorch build, reinstall PyTorch with a ROCm build. One way to do this is the following (adjust the ROCm version in the URL to match your installation):

pip install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/rocm6.1/
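The version check above can also be automated. The sketch below classifies a build from its version string and, when available, from the value of torch.version.hip. The helper name is_rocm_build is hypothetical, and the version strings in the usage lines are made-up examples of the two formats shown earlier. A source build tagged a0+git[hash] cannot be classified from the string alone, so the helper relies on the HIP version in that case.

```python
from typing import Optional


def is_rocm_build(version: str, hip_version: Optional[str] = None) -> bool:
    """Guess whether a PyTorch build targets ROCm.

    version:     the value of torch.__version__
    hip_version: the value of torch.version.hip, if known
    """
    # A non-empty HIP version is the strongest signal of a ROCm build.
    if hip_version:
        return True
    # Otherwise inspect the local version label after the "+".
    _, _, local = version.partition("+")
    return local.startswith("rocm")


if __name__ == "__main__":
    # Illustrative strings only; real values come from torch.__version__.
    print(is_rocm_build("2.5.0.dev20240601+rocm6.1"))
    print(is_rocm_build("2.3.0+cu121"))
    print(is_rocm_build("2.3.0a0+gitabcdef", hip_version="6.1.0"))
```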

Checking ROCm Setup

To ensure ROCm is properly configured, run the following command:

rocm-smi

The output should be similar to (depending on your number of devices):

========================================= ROCm System Management Interface =========================================
=================================================== Concise Info ===================================================
Device  [Model : Revision]    Temp        Power     Partitions      SCLK    MCLK    Fan  Perf  PwrCap  VRAM%  GPU%  
        Name (20 chars)       (Junction)  (Socket)  (Mem, Compute)                                                  
====================================================================================================================
0       [0x74a1 : 0x00]       45.0°C      142.0W    NPS1, SPX       132Mhz  900Mhz  0%   auto  750.0W    0%   0%    
        AMD Instinct MI300X                                                                                         
1       [0x74a1 : 0x00]       42.0°C      135.0W    NPS1, SPX       132Mhz  900Mhz  0%   auto  750.0W    0%   0%    
        AMD Instinct MI300X                                                                                         
2       [0x74a1 : 0x00]       42.0°C      137.0W    NPS1, SPX       132Mhz  900Mhz  0%   auto  750.0W    0%   0%    
        AMD Instinct MI300X                                                                                         
3       [0x74a1 : 0x00]       48.0°C      141.0W    NPS1, SPX       138Mhz  900Mhz  0%   auto  750.0W    0%   0%    
        AMD Instinct MI300X                                                                                         
4       [0x74a1 : 0x00]       46.0°C      142.0W    NPS1, SPX       132Mhz  900Mhz  0%   auto  750.0W    0%   0%    
        AMD Instinct MI300X                                                                                         
5       [0x74a1 : 0x00]       40.0°C      137.0W    NPS1, SPX       132Mhz  900Mhz  0%   auto  750.0W    0%   0%    
        AMD Instinct MI300X                                                                                         
6       [0x74a1 : 0x00]       47.0°C      142.0W    NPS1, SPX       132Mhz  900Mhz  0%   auto  750.0W    0%   0%    
        AMD Instinct MI300X                                                                                         
7       [0x74a1 : 0x00]       42.0°C      132.0W    NPS1, SPX       132Mhz  900Mhz  0%   auto  750.0W    0%   0%    
        AMD Instinct MI300X                                                                                         
====================================================================================================================
=============================================== End of ROCm SMI Log ================================================

If it is not, ROCm is not properly installed. More commonly, however, the problem shows up when running the following command:

rocminfo

The output should begin like this:

ROCk module version 6.7.0 is loaded
=====================    
HSA System Attributes    
=====================    
....

If this command fails, the most likely causes are that the GPU devices are not properly mounted or that your user is not part of the render group.
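Both causes can be checked from Python on Linux. The sketch below assumes the GPU device nodes are /dev/kfd and /dev/dri, which is the usual ROCm layout but is not stated in this guide, and uses hypothetical helper names. It only reports what it finds and changes nothing on the system.

```python
import grp
import os


def missing_device_nodes(paths=("/dev/kfd", "/dev/dri")) -> list:
    """Return the device paths that do not exist on this machine."""
    return [p for p in paths if not os.path.exists(p)]


def user_in_group(group_name: str) -> bool:
    """Return True if the current process belongs to the named group."""
    try:
        gid = grp.getgrnam(group_name).gr_gid
    except KeyError:
        # The group does not exist on this system.
        return False
    return gid in os.getgroups() or gid == os.getegid()


if __name__ == "__main__":
    missing = missing_device_nodes()
    print("missing device nodes:", missing if missing else "none")
    print("member of render group:", user_in_group("render"))
```

Note that a group change made with usermod typically takes effect only after logging out and back in, so a freshly added user may still show as not being a member.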


Teardown

Navigate back to your base directory and remove your pytorch-hello-world folder:

cd ~
rm -rf pytorch-hello-world/
