Hugging Face Quickstart

Estimated time: 7 minutes (9 minutes with buffer).


Hugging Face is an AI/ML platform for the entire model pipeline. For this quickstart, we'll walk you through accelerated inference using a pretrained model.


Learn more about Hugging Face here.


Installing Dependencies

Because PyTorch with ROCm comes preloaded on your device, you will not need to install that dependency. However, you will still need the Transformers library to run our quickstart script. Install transformers using the following command:

pip install transformers

This should take no more than a few minutes.
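If you want to confirm the install succeeded, and that your ROCm build of PyTorch can see the GPU, you can optionally run:

python -c "import transformers; print(transformers.__version__)"
python -c "import torch; print(torch.cuda.is_available())"

The second command should print True on a working ROCm setup.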


Creating and Running the Inference Script

Next, create a new directory for your script and navigate into it:

mkdir hf-hello-world
cd hf-hello-world

Then, create a new script:

nano hello-world.py

Paste the following code into the file, then save and exit:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
import time

# Load model without quantization
tokenizer = AutoTokenizer.from_pretrained("facebook/opt-350m")
model = AutoModelForCausalLM.from_pretrained("facebook/opt-350m")

# Move model to GPU
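# (ROCm builds of PyTorch expose the GPU through the "cuda" device string)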
model = model.to("cuda")

# Warm up the model (the first GPU run includes one-time setup overhead)
print("Warming up model...")
input_text = "Hello, my name is"
inputs = tokenizer(input_text, return_tensors="pt").to("cuda")
warmup = model.generate(**inputs, max_new_tokens=20)

print("Preparing text...")
input_text = "According to all known laws of aviation, there is no way that a bee should be able to fly. Its wings are too small to get its fat little body off the ground. The bee, of course, flies anyway because"
inputs = tokenizer(input_text, return_tensors="pt").to("cuda")

print("Starting inference...")
start = time.time()
outputs = model.generate(
    **inputs,
    max_new_tokens=50,
    do_sample=True,
    temperature=0.7,
    top_k=50,
    top_p=0.95,
    no_repeat_ngram_size=2
)
t = time.time() - start
print(f"Inference time: {t:.2f} seconds")

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
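As written, the script reports raw latency for the generate call. If you would rather see throughput, one optional extension (a sketch that reuses the inputs, outputs, and t variables the script already defines) is to append:

# Optional: report generated tokens per second (new tokens only)
new_tokens = outputs.shape[-1] - inputs["input_ids"].shape[-1]
print(f"Throughput: {new_tokens / t:.1f} tokens/s")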

Once you have saved the script, run it with:
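python hello-world.py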

This runs a small model on a single GPU, but feel free to swap in a different model and prompt, then map it to the appropriate device(s), as sketched below. The script prints its progress messages, the inference time, and finally the generated continuation of the prompt.
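For example, here is a minimal sketch of swapping in a larger checkpoint. The checkpoint name is just an illustration, and device_map="auto" assumes you have also installed the accelerate package (pip install accelerate):

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "facebook/opt-1.3b"  # example: any causal LM checkpoint on the Hub
tokenizer = AutoTokenizer.from_pretrained(model_id)
# device_map="auto" places the weights on the available GPU(s) automatically,
# instead of a manual .to("cuda")
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")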


Teardown

Navigate back to your base directory and remove your hf-hello-world folder:
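cd ..
rm -rf hf-hello-world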
