Demo: Maintaining context through longer sessions
Now that you understand why context matters and how to budget tokens with sliding windows and retrieval, let's see those ideas in action with a simple multi-turn chat demo.
This demo shows you how to:
Keep a rolling conversation history
Manage what stays in the prompt
Optionally summarize or retrieve older info
Make your assistant feel coherent over multiple turns
✅ Goal
Build a simple Gradio chat (or Python CLI) that:
1️⃣ Stores each turn in a session history
2️⃣ Adds the history to the prompt for every new question
3️⃣ Automatically trims or summarizes old turns when they exceed your token budget
✅ Step 1️⃣ — Basic Session Store
Use a Python list to track messages:
```python
# Example format: [("User", "text"), ("Assistant", "text")]
session_history = []
```

✅ Step 2️⃣ — Add a Message
When the user submits a new question:
Add the question to the list
Build a prompt that includes the system prompt + conversation so far
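In code, using the `session_history` list from Step 1 (the stored turns here are illustrative):

```python
# Rolling history, with one earlier exchange already stored.
session_history = [
    ("User", "What is Python?"),
    ("Assistant", "Python is a general-purpose programming language."),
]

# A new question arrives: append it before building the next prompt.
question = "Who created it?"
session_history.append(("User", question))
```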
✅ Step 3️⃣ — Example Prompt Builder
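A minimal sketch of such a builder. The `SYSTEM_PROMPT` wording and the character budget (a rough stand-in for a real token count, since ~4 characters per token is a common rule of thumb) are assumptions:

```python
SYSTEM_PROMPT = "You are a helpful assistant."  # illustrative system prompt

def build_prompt(session_history, max_chars=4000):
    """Join the system prompt with recent turns, dropping the oldest turns
    first until the transcript fits the budget (characters as a token proxy)."""
    turns = [f"{role}: {text}" for role, text in session_history]
    while turns and sum(len(t) + 1 for t in turns) > max_chars:
        turns.pop(0)  # trim oldest first
    return "\n".join([SYSTEM_PROMPT, *turns, "Assistant:"])
```

Swap the character check for a real tokenizer (e.g. your model's own) when you need exact budgeting.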
✅ Step 4️⃣ — Run the LLM
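A model-agnostic sketch: `generate` stands for whatever client you use (OpenAI, Ollama, llama.cpp, ...), injected as a plain prompt-to-text callable so the turn logic stays the same everywhere:

```python
def run_turn(session_history, question, generate):
    """One full turn: record the question, prompt the model with the whole
    history, then record and return the answer.

    `generate` is any prompt -> text callable wrapping your LLM client.
    """
    session_history.append(("User", question))
    prompt = "\n".join(f"{role}: {text}" for role, text in session_history)
    answer = generate(prompt)
    session_history.append(("Assistant", answer))
    return answer
```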
✅ Step 5️⃣ — Add to Gradio Chatbot
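A sketch of wiring the turn logic into a `gr.ChatInterface`. Gradio is imported lazily inside `build_ui` so the chat logic itself has no UI dependency, and the `generate` placeholder is where your real model call goes:

```python
session_history = []

def generate(prompt):
    # Placeholder model call; swap in your real LLM client here.
    return "(model reply)"

def respond(message, history):
    """Gradio ChatInterface callback: user message in, assistant reply out."""
    session_history.append(("User", message))
    prompt = "\n".join(f"{role}: {text}" for role, text in session_history)
    answer = generate(prompt)
    session_history.append(("Assistant", answer))
    return answer

def build_ui():
    import gradio as gr  # imported here so the logic above runs without Gradio
    return gr.ChatInterface(respond, title="Context demo")

# To launch the chat UI: build_ui().launch()
```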
✅ Optional: Add Summarization
When the chat gets too long:
Use your LLM to summarize old turns.
Replace old details with a short summary.
Example:
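One way to sketch the summarization step (the prompt wording is illustrative, and `generate` is any prompt-to-text callable wrapping your LLM):

```python
def summarize_old_turns(old_turns, generate):
    """Ask the LLM to compress old turns into one short summary string."""
    transcript = "\n".join(f"{role}: {text}" for role, text in old_turns)
    prompt = (
        "Summarize this conversation in 2-3 sentences, keeping all names, "
        "facts, and decisions:\n\n" + transcript
    )
    return generate(prompt)
```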
Then store:
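The summarized turns then get replaced with a single compact entry. Here the sample history, the `summary` text, and the `KEEP_RECENT` cutoff are all illustrative:

```python
# Eight earlier turns, already condensed by the LLM into `summary`.
session_history = [
    ("User", f"question {i}") if i % 2 == 0 else ("Assistant", f"answer {i}")
    for i in range(8)
]
summary = "The user asked several questions about Python's history."

KEEP_RECENT = 4  # keep the last 4 turns verbatim (illustrative choice)
session_history = [
    ("System", f"Summary of earlier conversation: {summary}"),
    *session_history[-KEEP_RECENT:],
]
```

The old details are gone from the prompt, but their gist still rides along in every future turn.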
✅ See It in Action
Ask “What is Python?”
Then: “Who created it?”
Then: “When was it released?”
Each follow-up draws on the previous turns, so the assistant knows what “it” refers to and stays relevant!
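That sequence can be run end to end with a canned stand-in for the model, so the loop works anywhere:

```python
def canned_model(prompt):
    # Stand-in for a real LLM call; a real model would use the history in `prompt`.
    return "Python was created by Guido van Rossum and first released in 1991."

session_history = []
for question in ["What is Python?", "Who created it?", "When was it released?"]:
    session_history.append(("User", question))
    prompt = "\n".join(f"{role}: {text}" for role, text in session_history)
    session_history.append(("Assistant", canned_model(prompt)))
```

Because the full history rides along in `prompt`, a real model can resolve “it” in the follow-ups.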
✅ Key Benefits
✔️ Simple rolling window = fast, works for most casual chats.
✔️ Summaries = squeeze more context into your token budget.
✔️ Session list = easily expandable: store it in a DB, link it to a user ID, or persist it between calls.
🗝️ Key Takeaway
A smart assistant remembers what you said — within the limits of its context. Managing this well makes your AI feel more natural, helpful, and trustworthy for longer conversations.
➡️ Next: Learn how to add tool use or plugins so your assistant can do real tasks, not just answer in text!