Showcase: how retrieval results show up in chat
Let’s see how your assistant uses retrieval to produce better, more grounded answers — and how you can show that to users in your chat UI.
A key benefit of RAG is transparency:
Users see that answers come from real, trusted sources — not just guesses.
It builds trust and helps debug unexpected replies.
It demonstrates that your assistant is connected to up-to-date or private knowledge.
✅ What Happens Under the Hood
When the user asks a question:
1️⃣ The assistant embeds the question.
2️⃣ It searches the FAISS index for the most relevant text chunks.
3️⃣ It injects those chunks into the prompt.
4️⃣ The LLM generates a final answer grounded in the retrieved snippets.
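The four steps can be sketched end to end on toy data. This is a minimal sketch that uses NumPy cosine similarity as a stand-in for the FAISS search, and a bag-of-words `embed` function as a stand-in for the real sentence-transformer embedder (both are illustrative assumptions, not the production setup):

```python
import numpy as np

# Toy corpus (hypothetical chunks; a real app would load your own documents).
docs = [
    "Hold the power button for 10 seconds to reset the device.",
    "The warranty covers manufacturing defects for two years.",
    "Charge the battery fully before first use.",
]

def embed(texts):
    # Bag-of-words counts over the corpus vocabulary -- purely illustrative,
    # standing in for embedder.encode() from sentence-transformers.
    vocab = sorted({w for d in docs for w in d.lower().split()})
    return np.array(
        [[t.lower().split().count(w) for w in vocab] for t in texts],
        dtype=float,
    )

doc_vecs = embed(docs)

def retrieve(query, k=2):
    # Cosine similarity against every chunk (FAISS does this at scale).
    q = embed([query])[0]
    sims = doc_vecs @ q / (
        np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(q) + 1e-9
    )
    top = np.argsort(-sims)[:k]
    return [docs[i] for i in top]

chunks = retrieve("How do I reset my device?")
# The reset instruction ranks first because it shares the word "reset".
```

The retrieved `chunks` would then be joined into a context string and injected into the prompt, exactly as the full example below does.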
✅ Show the Retrieved Snippets to Users
A good practice: 👉 Display which passages were retrieved along with the generated answer. This makes the assistant’s reasoning visible — like sources in a search engine.
✅ Example: Extend Your Gradio Chat
Add the retrieved context to the output.
import numpy as np

def answer(user_query):
    # ➜ Embed the query and retrieve the top-3 chunks
    query_embedding = embedder.encode([user_query])
    D, I = index.search(np.array(query_embedding), k=3)
    retrieved_chunks = [docs[idx] for idx in I[0]]

    # ➜ Combine the chunks into a single context string
    context = "\n\n".join(retrieved_chunks)

    # ➜ Build the prompt
    prompt = f"""You are a helpful assistant. Use the context below to answer the question.

Context:
{context}

Question: {user_query}

Answer:"""

    # ➜ Generate
    response = generator(
        prompt,
        max_new_tokens=256,      # cap the answer length without counting the prompt
        do_sample=True,
        temperature=0.3,
        return_full_text=False,  # return only the answer, not the echoed prompt
    )

    # ➜ Return both answer and context for transparency
    answer_text = response[0]["generated_text"]
    return f"### Answer\n{answer_text.strip()}\n\n---\n### Retrieved Context\n{context.strip()}"

✅ How It Appears in the Chat
User:
“How do I reset my device?”
Assistant:
(The reply shows the generated answer first, followed by a “Retrieved Context” section listing the passages it was based on — exactly the format returned by the function above.)
✅ Benefits of Showing Context
✔️ Users can verify where the answer came from.
✔️ They can catch outdated or irrelevant passages.
✔️ It helps you debug bad results: did it find the wrong chunk? Do you need better chunking or embeddings?
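For that last point, it helps to log the distance scores next to each retrieved chunk. A minimal sketch, assuming the `D` (distances) and `I` (indices) arrays returned by `index.search` and the `docs` list from the function above; `retrieval_report` is a hypothetical helper name:

```python
import numpy as np

def retrieval_report(D, I, docs):
    """Format each retrieved chunk with its distance score for debugging.

    Lower L2 distances mean closer matches; a suspiciously large distance
    on the top hit often signals a chunking or embedding problem.
    """
    lines = []
    for rank, (dist, idx) in enumerate(zip(D[0], I[0]), start=1):
        lines.append(f"#{rank} (distance {dist:.3f}): {docs[idx][:80]}")
    return "\n".join(lines)

# Toy example with fake search results:
D = np.array([[0.12, 0.85, 1.40]])
I = np.array([[2, 0, 1]])
docs = [
    "Warranty terms...",
    "Charging guide...",
    "Hold the power button to reset...",
]
print(retrieval_report(D, I, docs))
```

Printing this report during development makes it obvious when the wrong chunk is winning the search.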
✅ Bonus: Highlight or Link Sources
For bigger projects, you can:
Show which file each chunk came from.
Add citations or links back to the original docs.
Let users click to view the full document.
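One simple way to do this is to keep a parallel list of source filenames alongside the chunks. A minimal sketch, assuming hypothetical filenames and a `format_with_citations` helper (not a library API):

```python
# Store each chunk together with the file it came from (hypothetical data).
docs = [
    "Hold the power button for 10 seconds to reset the device.",
    "The warranty covers manufacturing defects for two years.",
]
sources = ["user_manual.pdf", "warranty.pdf"]

def format_with_citations(retrieved_indices):
    """Render retrieved chunks with numbered citations back to their files."""
    parts = []
    for n, idx in enumerate(retrieved_indices, start=1):
        parts.append(f"[{n}] {docs[idx]} (source: {sources[idx]})")
    return "\n".join(parts)

print(format_with_citations([0]))
# → [1] Hold the power button for 10 seconds to reset the device. (source: user_manual.pdf)
```

In a web UI, the filename could become a link or a click-to-expand view of the full document.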
✅ What This Looks Like in Gradio
If you’re using gr.Chatbot, you can keep the combined markdown reply as a single message, or split the assistant’s response into separate Answer and Context boxes.
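Since `answer()` above joins the answer and the context with a `---` divider, a small helper can split them back apart to populate separate UI boxes. A minimal sketch (`split_response` is an assumed helper name, not a Gradio API):

```python
def split_response(markdown_reply):
    """Split the combined reply from answer() into (answer, context) parts."""
    answer_part, _, context_part = markdown_reply.partition("\n---\n")
    return answer_part.strip(), context_part.strip()

reply = (
    "### Answer\nHold the power button.\n\n"
    "---\n"
    "### Retrieved Context\nManual, page 3."
)
ans, ctx = split_response(reply)
```

In a gr.Blocks layout you could then route `ans` into one gr.Markdown component and `ctx` into another, instead of concatenating them inside a single gr.Chatbot message.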
🗝️ Key Takeaway
A RAG assistant isn’t a black box — showing the retrieved context helps everyone trust, improve, and expand your knowledge base.
➡️ Next: You’ll learn how to upload your final model and share your working assistant with real users!