Demo: Load Models with transformers

Now that you understand how to choose the right model, let’s load one and generate some text with Hugging Face’s transformers library.

This simple demo shows how to:

  • Load a pre-trained LLM

  • Tokenize a prompt

  • Generate a response

  • Print the output


Step 1: Import the Libraries

Create a Python file (e.g., demo.py) or open a notebook, then:

from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline

Step 2: Pick a Model

Here are a few popular open‑source choices you can plug in:

Model      Example Identifier                Notes
GPT‑2      gpt2                              Lightweight, good for a quick demo.
Llama‑2    meta-llama/Llama-2-7b-chat-hf     Strong chat model; needs a large GPU.
Bloom      bigscience/bloom-560m             Good multilingual option.
Vicuna     lmsys/vicuna-7b-v1.5              Community fine-tuned chat model.


Step 3: Load the Model & Tokenizer

Replace MODEL_NAME with your choice:
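
A minimal sketch, using gpt2 as a placeholder identifier; any model from the table in Step 2 works the same way:

MODEL_NAME = "gpt2"  # swap in any identifier from the table in Step 2

# Download (or load from the local cache) the tokenizer and model weights
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)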

Note: For models like Llama‑2 or Vicuna, you may need to accept the license on the model's Hugging Face page and authenticate with an access token.


Step 4: Create a Simple Generation Pipeline
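
One way to set this up is with the text-generation pipeline, reusing the model and tokenizer loaded in Step 3; a minimal sketch (the variable name generator is just a convention here):

# Wrap the model and tokenizer in a ready-to-use text-generation pipeline
generator = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
)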


Step 5: Run Your First Prompt
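
A sketch of a first prompt; the prompt text and generation parameters below are only examples, so adjust them freely:

prompt = "Explain what a large language model is in one sentence."

outputs = generator(
    prompt,
    max_new_tokens=60,   # cap on how many new tokens to generate
    do_sample=True,      # sample instead of always picking the most likely token
    temperature=0.7,     # lower = more focused, higher = more creative
)

print(outputs[0]["generated_text"])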


Sample Output
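
The exact text changes on every run because sampling is enabled; with the small gpt2 model you might see something loosely along these lines (illustrative only):

Explain what a large language model is in one sentence. A large language model is a program trained on huge amounts of text to predict the next word, which lets it generate new text of its own.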


🗝️ How It Works

  • AutoTokenizer converts text to tokens the model understands.

  • AutoModelForCausalLM loads a pre-trained language model for text generation.

  • pipeline() wraps the model for easy input/output.

  • Parameters such as max_new_tokens (or max_length), temperature, and do_sample control the length and creativity of the output; the sketch below shows them in action.
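
To make these bullets concrete, here is a rough sketch of what the pipeline does under the hood, calling the tokenizer and model from Step 3 directly (the prompt and parameter values are arbitrary):

import torch

# Tokenize the prompt into input IDs the model understands
inputs = tokenizer("The future of AI is", return_tensors="pt")

# Generate a continuation directly from the model
with torch.no_grad():
    output_ids = model.generate(
        **inputs,
        max_new_tokens=40,
        do_sample=True,
        temperature=0.8,
    )

# Decode the generated token IDs back into text
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))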


Tips

  • Larger models such as Llama‑2‑13B need a capable GPU; alternatively, look for quantized versions (search for “GPTQ” or “4-bit” models).

  • Always check model documentation for hardware requirements and license conditions.

  • For chat models, you can test a conversation loop or format prompts with the tokenizer's apply_chat_template() method when the model provides a chat template; see the sketch below.
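
For instance, a chat-style prompt could be built like this, assuming the chosen model ships a chat template (gpt2 does not, so this sketch targets chat models such as Llama‑2‑chat or Vicuna):

messages = [
    {"role": "user", "content": "What is a token?"},
]

# Render the conversation in the exact format the model was trained on
chat_prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)

print(generator(chat_prompt, max_new_tokens=100)[0]["generated_text"])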


➡️ Next: You’ll learn how to fine-tune or adapt your base model to better follow instructions!
