Demo: Load Models with transformers
Now that you understand how to choose the right model, let’s load one and generate some text with Hugging Face’s transformers library.
This simple demo shows how to:
Load a pre-trained LLM
Tokenize a prompt
Generate a response
Print the output
✅ Step 1: Import the Libraries
Create a Python file (e.g., demo.py) or open a notebook, then:
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline
✅ Step 2: Pick a Model
Here are a few popular open‑source choices you can plug in:
GPT‑2 (gpt2): lightweight, good for demos.
Llama‑2 (meta-llama/Llama-2-7b-chat-hf): strong chat model, needs a big GPU.
Bloom (bigscience/bloom-560m): good multilingual option.
Vicuna (lmsys/vicuna-7b-v1.5): community fine-tuned chat model.
✅ Step 3: Load the Model & Tokenizer
Replace MODEL_NAME with your choice:
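Here is a minimal sketch, using gpt2 as the placeholder (MODEL_NAME is just an example variable name; for gated models, pass your Hugging Face token or run huggingface-cli login first):

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

MODEL_NAME = "gpt2"  # any model ID from the table above works the same way

# Download (or load from the local cache) the tokenizer and model weights
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
```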
Note: For models like Llama‑2 or Vicuna, you may need to accept a license on their Hugging Face page and use a token.
✅ Step 4: Create a Simple Generation Pipeline
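One straightforward option is the text-generation pipeline; this sketch reuses the model and tokenizer objects from Step 3:

```python
from transformers import pipeline

# Wrap the loaded model and tokenizer in a text-generation pipeline
generator = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
)
```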
✅ Step 5: Run Your First Prompt
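A sketch of a first prompt, reusing the generator from Step 4 (the prompt text and parameter values are only examples to tweak):

```python
prompt = "Explain what a large language model is in one paragraph."

# Generate one completion; these parameter values are just a starting point
outputs = generator(
    prompt,
    max_length=100,         # total tokens (prompt + completion)
    temperature=0.7,        # lower = more focused, higher = more random
    do_sample=True,         # sample instead of always picking the top token
    num_return_sequences=1,
)

print(outputs[0]["generated_text"])
```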
✅ Sample Output
Because do_sample=True, the exact text changes from run to run, but you should see your prompt echoed back and continued with a few sentences of generated text.
🗝️ How It Works
AutoTokenizer converts text to tokens the model understands.
AutoModelForCausalLM loads a pre-trained language model for text generation.
pipeline() wraps the model for easy input/output.
Parameters like max_length, temperature, and do_sample control the length and creativity of the output.
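To see those pieces without the pipeline wrapper, here is a rough equivalent that calls the tokenizer and model from Step 3 directly:

```python
# Tokenize the prompt into tensors the model understands
inputs = tokenizer("Explain what a large language model is.", return_tensors="pt")

# Generate new token IDs, then decode them back into text
output_ids = model.generate(
    **inputs,
    max_length=100,
    temperature=0.7,
    do_sample=True,
)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```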
✅ Tips
Larger models like Llama‑2‑13B need a serious GPU, or you can use quantized versions (search for “GPTQ” or “4-bit” models); a rough 4-bit loading sketch is shown after these tips.
Always check model documentation for hardware requirements and license conditions.
For chat models, you can test a conversation loop or use the tokenizer’s chat template (tokenizer.apply_chat_template) if the model provides one; see the sketch below.
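As a rough illustration of the quantization tip above, 4-bit loading can look like this (this sketch assumes the bitsandbytes and accelerate packages are installed, a CUDA GPU is available, and the Llama‑2 model ID is just an example of a gated checkpoint):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# Load the weights in 4-bit precision so a 7B+ model fits on a smaller GPU
bnb_config = BitsAndBytesConfig(load_in_4bit=True)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-chat-hf",   # example gated model: accept the license and log in first
    quantization_config=bnb_config,
    device_map="auto",                 # let accelerate place layers on available devices
)
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-chat-hf")
```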
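And a rough sketch of the chat-template tip, assuming a recent transformers version and a chat model whose tokenizer actually ships a chat template (plain gpt2 does not); the message content is just an example:

```python
# Build a chat-style prompt from the tokenizer's built-in template
messages = [
    {"role": "user", "content": "Give me three tips for learning Python."},
]

prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,              # return a formatted string instead of token IDs
    add_generation_prompt=True,  # append the cue for the assistant's reply
)

print(generator(prompt, max_length=200, do_sample=True)[0]["generated_text"])
```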
➡️ Next: You’ll learn how to fine-tune or adapt your base model to better follow instructions!