Demo: Maintaining context through longer sessions

Now that you understand why context matters and how to budget tokens with sliding windows and retrieval, let's see it in action with a simple multi-turn chat demo.

This demo shows you how to:

  • Keep a rolling conversation history

  • Manage what stays in the prompt

  • Optionally summarize or retrieve older info

  • Make your assistant feel coherent over multiple turns


Goal

Build a simple Gradio chat (or Python CLI) that:

1️⃣ Stores each turn in a session history
2️⃣ Adds that history to the prompt for every new question
3️⃣ Automatically trims or summarizes old turns when they exceed your token budget


Step 1️⃣ — Basic Session Store

Use a Python list to track messages:

# Example format: [("User", "text"), ("Assistant", "text")]
session_history = []

Step 2️⃣ — Add a Message

When the user submits a new question:

  • Add the question to the list

  • Build a prompt that includes the system prompt + conversation so far


Step 3️⃣ — Example Prompt Builder

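A minimal sketch of a prompt builder, assuming the `(role, text)` history format from Step 1. The system prompt text, the `build_prompt` name, and the `max_turns` sliding-window size are all illustrative choices, not a fixed API:

```python
SYSTEM_PROMPT = "You are a helpful assistant."

def build_prompt(session_history, user_question, max_turns=10):
    """Combine the system prompt, recent turns, and the new question."""
    recent = session_history[-max_turns:]  # simple sliding window over the history
    lines = [SYSTEM_PROMPT]
    for role, text in recent:
        lines.append(f"{role}: {text}")
    lines.append(f"User: {user_question}")
    lines.append("Assistant:")          # cue the model to answer next
    return "\n".join(lines)
```

Because the window only keeps the last `max_turns` messages, the prompt stays inside your token budget even as the session grows.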

Step 4️⃣ — Run the LLM

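One way to sketch this step, with `call_llm` standing in as a placeholder for your real model client (an OpenAI request, a local llama.cpp binding, etc.); the `chat_turn` helper name is also hypothetical:

```python
def call_llm(prompt: str) -> str:
    # Placeholder: swap in a real completion request for your model here.
    return "Python is a general-purpose programming language."

def chat_turn(session_history, user_question):
    """Build the prompt, query the model, and record both turns."""
    lines = ["You are a helpful assistant."]
    lines += [f"{role}: {text}" for role, text in session_history]
    lines += [f"User: {user_question}", "Assistant:"]
    answer = call_llm("\n".join(lines))
    # Append both sides of the exchange so the next turn sees them.
    session_history.append(("User", user_question))
    session_history.append(("Assistant", answer))
    return answer
```

Storing the assistant's reply alongside the question is what lets follow-ups like "Who created it?" resolve correctly on the next turn.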

Step 5️⃣ — Add to Gradio Chatbot


Optional: Add Summarization

When the chat gets too long:

  • Use your LLM to summarize old turns.

  • Replace old details with a short summary.

For example, ask the model to condense the earliest turns into a short summary, then store that summary in the session history in place of the turns it replaces.

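A minimal sketch of that idea; `summarize_history`, `keep_last`, and the `call_llm` stub are illustrative names, and in practice `call_llm` would be a real summarization request to your model:

```python
def call_llm(prompt: str) -> str:
    # Placeholder: replace with a real summarization call to your model.
    return "Earlier, the user asked general questions about Python."

def summarize_history(session_history, keep_last=4):
    """Collapse all but the last `keep_last` messages into one summary turn."""
    if len(session_history) <= keep_last:
        return session_history
    old, recent = session_history[:-keep_last], session_history[-keep_last:]
    transcript = "\n".join(f"{role}: {text}" for role, text in old)
    summary = call_llm("Summarize this conversation in 2-3 sentences:\n" + transcript)
    # One short synthetic turn now stands in for all the older messages.
    return [("Summary", summary)] + recent
```

The trade-off is fidelity for space: details in old turns are lost, but the summary preserves the gist at a fraction of the token cost.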

See It in Action

  • Ask “What is Python?”

  • Then: “Who created it?”

  • Then: “When was it released?”

Each follow-up uses previous answers to stay relevant!


Key Benefits

✔️ Simple rolling window: fast, and works for most casual chats.
✔️ Summaries: squeeze more context into your token budget.
✔️ Session list: easily expandable, so you can store it in a DB, link it to a user ID, or persist it between calls.


🗝️ Key Takeaway

A smart assistant remembers what you said — within the limits of its context. Managing this well makes your AI feel more natural, helpful, and trustworthy for longer conversations.


➡️ Next: Learn how to add tool use and plugins so your assistant can do real tasks, not just answer in text!
