Showcase: how retrieval results show up in chat

Let’s see how your assistant uses retrieval to produce better, more grounded answers — and how you can show that to users in your chat UI.

A key benefit of RAG is transparency:

  • Users see that answers come from real, trusted sources — not just guesses.

  • Seeing those sources builds trust and makes unexpected replies easier to debug.

  • It demonstrates that your assistant is connected to up-to-date or private knowledge.


What Happens Under the Hood

When the user asks a question:

1️⃣ The assistant embeds the question.

2️⃣ It searches the FAISS index for the most relevant text chunks.

3️⃣ It injects those chunks into the prompt.

4️⃣ The LLM generates a final answer grounded in those retrieved snippets.
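
For reference, the snippets in this section assume the pieces built in earlier steps: a sentence-embedding model (embedder), your chunked documents (docs), a FAISS index over their embeddings (index), and a text-generation pipeline (generator). Here is a minimal setup sketch, with placeholder documents and model choices that may differ from yours:

import numpy as np
import faiss
from sentence_transformers import SentenceTransformer
from transformers import pipeline

# Chunked documents from your ingestion step (placeholder content)
docs = [
    "To reset the device, hold the power button for ten seconds.",
    "Replace the battery by sliding off the back cover.",
]

# Embed every chunk and build a FAISS index over the vectors
embedder = SentenceTransformer("all-MiniLM-L6-v2")
embeddings = embedder.encode(docs)
index = faiss.IndexFlatL2(embeddings.shape[1])
index.add(np.array(embeddings, dtype="float32"))

# Any local text-generation model can stand in as the LLM
generator = pipeline("text-generation", model="gpt2")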


Show the Retrieved Snippets to Users

A good practice: 👉 Display which passages were retrieved along with the generated answer. This makes the assistant’s reasoning visible, like sources in a search engine.


Example: Extend Your Gradio Chat

Add the retrieved context to the output.

def answer(user_query):
    # ➜ Embed the question and retrieve the top 3 most similar chunks
    query_embedding = embedder.encode([user_query])
    D, I = index.search(np.array(query_embedding), k=3)  # distances, indices
    retrieved_chunks = [docs[idx] for idx in I[0]]

    # ➜ Combine the chunks into a single context block
    context = "\n\n".join(retrieved_chunks)

    # ➜ Build the grounded prompt
    prompt = f"""You are a helpful assistant. Use the context below to answer the question.

Context:
{context}

Question: {user_query}

Answer:"""

    # ➜ Generate; max_new_tokens caps only the answer length, and
    # return_full_text=False keeps the prompt out of the output
    response = generator(
        prompt,
        max_new_tokens=256,
        do_sample=True,
        temperature=0.3,
        return_full_text=False,
    )

    # ➜ Return both answer + context for transparency
    answer_text = response[0]["generated_text"]

    return f"### Answer\n{answer_text.strip()}\n\n---\n### Retrieved Context\n{context.strip()}"

How It Appears in the Chat

User:

“How do I reset my device?”

Assistant:

(the generated answer, followed by a “Retrieved Context” section listing the three passages it was grounded in)

Benefits of Showing Context

✔️ Users can verify where the answer came from.

✔️ They can catch outdated or irrelevant passages.

✔️ It helps you debug bad results: did it find the wrong chunk? Do you need better chunking or embeddings?


For bigger projects, you can:

  • Show which file each chunk came from (see the sketch after this list).

  • Add citations or links back to the original docs.

  • Let users click to view the full document.
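
For example, here is a sketch of the first idea: record each chunk’s source file at ingestion time and label retrieved chunks with it (chunk_file is a hypothetical helper standing in for whatever splitting logic you use):

# Track the source file of each chunk at ingestion time
docs = []
sources = []
for path in ["manual.md", "faq.md"]:      # your document files
    for chunk in chunk_file(path):        # hypothetical chunking helper
        docs.append(chunk)
        sources.append(path)

# At query time, prefix each retrieved chunk with its origin
retrieved_chunks = [f"[{sources[idx]}] {docs[idx]}" for idx in I[0]]
context = "\n\n".join(retrieved_chunks)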


What This Looks Like in Gradio

If you’re using gr.Chatbot, you can return the combined Markdown string from answer() as a single chat message; the answer and retrieved context render together, separated by the divider:
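
A minimal sketch, assuming the answer() function above and Gradio’s tuple-style chat history (details vary between Gradio versions):

import gradio as gr

def respond(user_query, history):
    # answer() already returns Markdown with both the answer and the context
    history = history + [(user_query, answer(user_query))]
    return "", history

with gr.Blocks() as demo:
    chatbot = gr.Chatbot()
    msg = gr.Textbox(label="Your question")
    msg.submit(respond, inputs=[msg, chatbot], outputs=[msg, chatbot])

demo.launch()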

Or split into Answer and Context boxes:
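
A sketch of that variant, reusing the same retrieval and generation but sending the two parts to separate output components (assumes the embedder, index, docs, and generator objects from earlier):

import gradio as gr

def answer_split(user_query):
    # Same pipeline as answer(), but returning answer and context separately
    query_embedding = embedder.encode([user_query])
    D, I = index.search(np.array(query_embedding), k=3)
    context = "\n\n".join(docs[idx] for idx in I[0])
    prompt = f"""You are a helpful assistant. Use the context below to answer the question.

Context:
{context}

Question: {user_query}

Answer:"""
    response = generator(
        prompt,
        max_new_tokens=256,
        do_sample=True,
        temperature=0.3,
        return_full_text=False,
    )
    return response[0]["generated_text"].strip(), context.strip()

with gr.Blocks() as demo:
    query = gr.Textbox(label="Your question")
    ask = gr.Button("Ask")
    answer_box = gr.Textbox(label="Answer", lines=4)
    context_box = gr.Textbox(label="Retrieved Context", lines=8)
    ask.click(answer_split, inputs=query, outputs=[answer_box, context_box])

demo.launch()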


🗝️ Key Takeaway

A RAG assistant isn’t a black box — showing the retrieved context helps everyone trust, improve, and expand your knowledge base.


➡️ Next: You’ll learn how to upload your final model and share your working assistant with real users!
