Demo: Load Models with transformers

Now that you understand how to choose the right model, let’s load one and generate some text with Hugging Face’s transformers library.

This simple demo shows how to:

  • Load a pre-trained LLM

  • Tokenize a prompt

  • Generate a response

  • Print the output


Step 1: Import the Libraries

Create a Python file (e.g., demo.py) or open a notebook, then:

from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline

Step 2: Pick a Model

Here are a few popular open‑source choices you can plug in:

Model      Example Identifier                Notes
GPT‑2      gpt2                              Lightweight, good for a quick demo.
Llama‑2    meta-llama/Llama-2-7b-chat-hf     Strong chat model; needs a large GPU.
Bloom      bigscience/bloom-560m             Good multilingual option.
Vicuna     lmsys/vicuna-7b-v1.5              Community fine-tuned chat model.


Step 3: Load the Model & Tokenizer

Replace MODEL_NAME with your choice:
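
A minimal sketch, using gpt2 as a placeholder identifier; any model from the table in Step 2 works the same way:

MODEL_NAME = "gpt2"  # swap in any identifier from the table in Step 2

# Download (or load from the local cache) the tokenizer and model weights
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)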

Note: For models like Llama‑2 or Vicuna, you may need to accept the license on the model's Hugging Face page and authenticate with an access token.


Step 4: Create a Simple Generation Pipeline
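
One way to set this up is with the text-generation pipeline, reusing the model and tokenizer loaded in Step 3; a minimal sketch (the variable name generator is just a convention here):

# Wrap the model and tokenizer in a ready-to-use text-generation pipeline
generator = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
)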


Step 5: Run Your First Prompt
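
A sketch of a first prompt; the prompt text and generation parameters below are only examples, so adjust them freely:

prompt = "Explain what a large language model is in one sentence."

outputs = generator(
    prompt,
    max_new_tokens=60,   # cap on how many new tokens to generate
    do_sample=True,      # sample instead of always picking the most likely token
    temperature=0.7,     # lower = more focused, higher = more creative
)

print(outputs[0]["generated_text"])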


Sample Output
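
The exact text changes on every run because sampling is enabled; with the small gpt2 model you might see something loosely along these lines (illustrative only):

Explain what a large language model is in one sentence. A large language model is a program trained on huge amounts of text to predict the next word, which lets it generate new text of its own.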


🗝️ How It Works

  • AutoTokenizer converts text to tokens the model understands.

  • AutoModelForCausalLM loads a pre-trained language model for text generation.

  • pipeline() wraps the model for easy input/output.

  • Parameters such as max_new_tokens (or max_length), temperature, and do_sample control the length and creativity of the output; the sketch below shows them in action.
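
To make these bullets concrete, here is a rough sketch of what the pipeline does under the hood, calling the tokenizer and model from Step 3 directly (the prompt and parameter values are arbitrary):

import torch

# Tokenize the prompt into input IDs the model understands
inputs = tokenizer("The future of AI is", return_tensors="pt")

# Generate a continuation directly from the model
with torch.no_grad():
    output_ids = model.generate(
        **inputs,
        max_new_tokens=40,
        do_sample=True,
        temperature=0.8,
    )

# Decode the generated token IDs back into text
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))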


Tips

  • Larger models such as Llama‑2‑13B need a capable GPU; alternatively, look for quantized versions (search for “GPTQ” or “4-bit” models).

  • Always check model documentation for hardware requirements and license conditions.

  • For chat models, you can test a conversation loop or format prompts with the tokenizer's apply_chat_template() method when the model provides a chat template; see the sketch below.
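
For instance, a chat-style prompt could be built like this, assuming the chosen model ships a chat template (gpt2 does not, so this sketch targets chat models such as Llama‑2‑chat or Vicuna):

messages = [
    {"role": "user", "content": "What is a token?"},
]

# Render the conversation in the exact format the model was trained on
chat_prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)

print(generator(chat_prompt, max_new_tokens=100)[0]["generated_text"])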


➡️ Next: You’ll learn how to fine-tune or adapt your base model to better follow instructions!
