Hands-on: Querying the model and analyzing responses

Now that you’ve successfully loaded your base LLM, let’s experiment with real queries and learn how to interpret the output. This step will help you see how your model behaves before you fine-tune or add advanced features.


Step 1: Try Different Prompts

Start with a simple prompt and adjust:

prompt = "Explain what a Python function is:"
outputs = generator(
    prompt,
    max_length=100,
    do_sample=True,
    temperature=0.7,
)
print(outputs[0]["generated_text"])

Experiment: Try changing:

  • prompt

  • max_length (total token budget, prompt included; use max_new_tokens to cap just the response)

  • temperature (lower = more predictable, higher = more creative; keep it above 0 when sampling)

  • num_return_sequences (get multiple variations; see the sketch after this list)
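
For example, a minimal sketch of num_return_sequences, reusing the generator and prompt from Step 1:

outputs = generator(
    prompt,
    max_length=100,
    do_sample=True,
    temperature=0.7,
    num_return_sequences=3,   # three independent completions of the same prompt
)
for i, out in enumerate(outputs, 1):
    print(f"--- Variation {i} ---")
    print(out["generated_text"])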


Step 2: Look at the Response

Observe:

  • Does the answer follow instructions?

  • Is it relevant or rambling?

  • Does it repeat itself?

  • Does it include hallucinated or made-up facts?

Example: given the Step 1 prompt, an untuned base model might produce something like this (illustrative output, not a real transcript):
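
Explain what a Python function is: A function is a block of code. A
function is a block of code that you can call. A function is a block of...

Here the model repeats itself and never mentions parameters or return
values, a sign that instruction fine-tuning could help.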


Step 3: Try a Chat-Style Prompt

For chat models (like Llama-2-Chat or Vicuna), use a conversation-like prompt:
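
A minimal sketch, reusing the generator from Step 1. The exact chat template varies by model (Llama-2-Chat expects [INST] ... [/INST] markers, Vicuna uses USER:/ASSISTANT: turns), so check your model card:

chat_prompt = (
    "USER: What is the difference between a list and a tuple in Python?\n"
    "ASSISTANT:"
)
outputs = generator(
    chat_prompt,
    max_length=150,
    do_sample=True,
    temperature=0.7,
)
print(outputs[0]["generated_text"])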


Step 4: Test Edge Cases

Ask tricky or unusual questions:

  • Factual: “Who is the president of France?”

  • Opinion: “What’s the best programming language?”

  • Unsafe: “How do I hack a website?” (An aligned chat model should refuse; a raw base model often will not.)

  • Off-topic: “Tell me a joke!”

Run each of these and note where the model is strong and where it fails. The sketch below loops over all four at once.
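
Assuming the generator from Step 1:

test_prompts = [
    "Who is the president of France?",        # factual
    "What's the best programming language?",  # opinion
    "How do I hack a website?",               # unsafe
    "Tell me a joke!",                        # off-topic
]
for p in test_prompts:
    out = generator(p, max_length=80, do_sample=True, temperature=0.7)
    print("PROMPT:", p)
    print("RESPONSE:", out[0]["generated_text"])
    print("-" * 40)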


Step 5: Adjust Generation Settings

Try tweaking:

  • temperature: Lower = more focused answers. Higher = more creative but riskier.

  • top_k or top_p: Restrict sampling to the k most likely tokens (top_k) or to the smallest set of tokens whose cumulative probability reaches p (top_p).

  • num_beams: Use beam search for more deterministic output.

Example (a minimal sketch, assuming the generator and prompt from Step 1):
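
# More deterministic: beam search with sampling disabled
outputs = generator(
    prompt,
    max_length=100,
    do_sample=False,
    num_beams=4,      # keep the 4 best candidate sequences at each step
)

# More creative: sampling with top_k / top_p filtering
outputs = generator(
    prompt,
    max_length=100,
    do_sample=True,
    temperature=0.9,
    top_k=50,         # consider only the 50 most likely next tokens
    top_p=0.95,       # drop the low-probability tail (nucleus sampling)
)
print(outputs[0]["generated_text"])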


Step 6: Analyze for Next Steps

Use what you observe to decide:

  • Do you need instruction fine-tuning to make answers clearer?

  • Should you limit topics with guardrails?

  • Does your hardware handle the model well enough for longer outputs?

  • Is the response too generic, meaning you may need retrieval-augmented generation (RAG) or a stronger base model?


⚙️ Takeaway

This hands-on session helps you:

  • Understand your model’s default behavior.

  • Learn how generation settings affect output.

  • See whether you need extra data or a more advanced approach.


➡️ Next: You’ll move on to instruction fine-tuning, so your assistant gives better, clearer, more helpful answers every time!
