Showcase: how retrieval results show up in chat

Let’s see how your assistant uses retrieval to produce better, more grounded answers — and how you can show that to users in your chat UI.

A key benefit of RAG is transparency:

  • Users see that answers come from real, trusted sources — not just guesses.

  • Seeing those sources builds trust and makes unexpected replies easier to debug.

  • It demonstrates that your assistant is connected to up-to-date or private knowledge.


What Happens Under the Hood

When the user asks a question:

1️⃣ The assistant embeds the question.

2️⃣ It searches the FAISS index for the most relevant text chunks.

3️⃣ It injects those chunks into the prompt.

4️⃣ The LLM generates a final answer grounded in those retrieved snippets.
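
For reference, the snippets in this section assume the pieces built in earlier steps: a sentence-embedding model (embedder), your chunked documents (docs), a FAISS index over their embeddings (index), and a text-generation pipeline (generator). Here is a minimal setup sketch, with placeholder documents and model choices that may differ from yours:

import numpy as np
import faiss
from sentence_transformers import SentenceTransformer
from transformers import pipeline

# Chunked documents from your ingestion step (placeholder content)
docs = [
    "To reset the device, hold the power button for ten seconds.",
    "Replace the battery by sliding off the back cover.",
]

# Embed every chunk and build a FAISS index over the vectors
embedder = SentenceTransformer("all-MiniLM-L6-v2")
embeddings = embedder.encode(docs)
index = faiss.IndexFlatL2(embeddings.shape[1])
index.add(np.array(embeddings, dtype="float32"))

# Any local text-generation model can stand in as the LLM
generator = pipeline("text-generation", model="gpt2")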


Show the Retrieved Snippets to Users

A good practice: 👉 Display which passages were retrieved along with the generated answer. This makes the assistant’s reasoning visible, like sources in a search engine.


Example: Extend Your Gradio Chat

Add the retrieved context to the output.

def answer(user_query):
    # ➜ Embed the question and retrieve the top 3 most similar chunks
    query_embedding = embedder.encode([user_query])
    D, I = index.search(np.array(query_embedding), k=3)  # distances, indices
    retrieved_chunks = [docs[idx] for idx in I[0]]

    # ➜ Combine the chunks into a single context block
    context = "\n\n".join(retrieved_chunks)

    # ➜ Build the grounded prompt
    prompt = f"""You are a helpful assistant. Use the context below to answer the question.

Context:
{context}

Question: {user_query}

Answer:"""

    # ➜ Generate; max_new_tokens caps only the answer length, and
    # return_full_text=False keeps the prompt out of the output
    response = generator(
        prompt,
        max_new_tokens=256,
        do_sample=True,
        temperature=0.3,
        return_full_text=False,
    )

    # ➜ Return both answer + context for transparency
    answer_text = response[0]["generated_text"]

    return f"### Answer\n{answer_text.strip()}\n\n---\n### Retrieved Context\n{context.strip()}"

How It Appears in the Chat

User:

“How do I reset my device?”

Assistant:

(the generated answer, followed by a “Retrieved Context” section listing the three passages it was grounded in)

Benefits of Showing Context

✔️ Users can verify where the answer came from.

✔️ They can catch outdated or irrelevant passages.

✔️ It helps you debug bad results: did it find the wrong chunk? Do you need better chunking or embeddings?


For bigger projects, you can:

  • Show which file each chunk came from (see the sketch after this list).

  • Add citations or links back to the original docs.

  • Let users click to view the full document.
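
For example, here is a sketch of the first idea: record each chunk’s source file at ingestion time and label retrieved chunks with it (chunk_file is a hypothetical helper standing in for whatever splitting logic you use):

# Track the source file of each chunk at ingestion time
docs = []
sources = []
for path in ["manual.md", "faq.md"]:      # your document files
    for chunk in chunk_file(path):        # hypothetical chunking helper
        docs.append(chunk)
        sources.append(path)

# At query time, prefix each retrieved chunk with its origin
retrieved_chunks = [f"[{sources[idx]}] {docs[idx]}" for idx in I[0]]
context = "\n\n".join(retrieved_chunks)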


What This Looks Like in Gradio

If you’re using gr.Chatbot, you can return the combined Markdown string from answer() as a single chat message; the answer and retrieved context render together, separated by the divider:
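
A minimal sketch, assuming the answer() function above and Gradio’s tuple-style chat history (details vary between Gradio versions):

import gradio as gr

def respond(user_query, history):
    # answer() already returns Markdown with both the answer and the context
    history = history + [(user_query, answer(user_query))]
    return "", history

with gr.Blocks() as demo:
    chatbot = gr.Chatbot()
    msg = gr.Textbox(label="Your question")
    msg.submit(respond, inputs=[msg, chatbot], outputs=[msg, chatbot])

demo.launch()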

Or split into Answer and Context boxes:
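
A sketch of that variant, reusing the same retrieval and generation but sending the two parts to separate output components (assumes the embedder, index, docs, and generator objects from earlier):

import gradio as gr

def answer_split(user_query):
    # Same pipeline as answer(), but returning answer and context separately
    query_embedding = embedder.encode([user_query])
    D, I = index.search(np.array(query_embedding), k=3)
    context = "\n\n".join(docs[idx] for idx in I[0])
    prompt = f"""You are a helpful assistant. Use the context below to answer the question.

Context:
{context}

Question: {user_query}

Answer:"""
    response = generator(
        prompt,
        max_new_tokens=256,
        do_sample=True,
        temperature=0.3,
        return_full_text=False,
    )
    return response[0]["generated_text"].strip(), context.strip()

with gr.Blocks() as demo:
    query = gr.Textbox(label="Your question")
    ask = gr.Button("Ask")
    answer_box = gr.Textbox(label="Answer", lines=4)
    context_box = gr.Textbox(label="Retrieved Context", lines=8)
    ask.click(answer_split, inputs=query, outputs=[answer_box, context_box])

demo.launch()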


🗝️ Key Takeaway

A RAG assistant isn’t a black box — showing the retrieved context helps everyone trust, improve, and expand your knowledge base.


➡️ Next: You’ll learn how to upload your final model and share your working assistant with real users!
