Choose deployment: a CLI, Web UI (gradio), or integration into a chatbot platform

Once your RAG pipeline works and your assistant can generate relevant answers from custom knowledge, it’s time to make it accessible to real users.

You have three common ways to deploy it:

✅ Option 1: Command Line Interface (CLI)

When to use:

✔️ Fast, simple, no web server needed
✔️ Great for testing, local prototyping, or developer-only tools

Basic Pattern:

Create a Python script that:

Takes user input (input())
Runs the retrieval + generation steps
Prints the answer

Example:

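import numpy as np

# Assumes `embedder`, `index`, `docs`, and `generator` were created in the earlier steps.
while True:
    user_query = input("You: ").strip()
    if user_query.lower() in ["exit", "quit"]:
        break
    if not user_query:
        continue  # Skip empty input

    # Embed ➜ Retrieve ➜ Build prompt ➜ Generate
    query_embedding = embedder.encode([user_query])
    D, I = index.search(np.array(query_embedding), k=3)
    # A FAISS index pads results with -1 when fewer than k matches exist
    retrieved_chunks = [docs[idx] for idx in I[0] if idx != -1]

    context = "\n\n".join(retrieved_chunks)
    prompt = f"Context:\n{context}\n\nQuestion: {user_query}\nAnswer:"

    response = generator(
        prompt,
        max_new_tokens=512,  # Bounds the answer only; max_length can also count prompt tokens
        do_sample=True,
        temperature=0.3,
    )

    print("Assistant:", response[0]["generated_text"])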
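
The embed, retrieve, build prompt, generate sequence above is the same for every deployment option, so one way to avoid copying it is to factor it into a single function. The following is a minimal sketch, not the original author's code: `build_prompt`, `answer_query`, and the `retrieve` and `generate` callables are hypothetical names, and the two `fake_*` functions are stand-ins so the sketch runs without any model loaded.

```python
from typing import Callable, List


def build_prompt(chunks: List[str], question: str) -> str:
    """Assemble the same Context/Question/Answer prompt used in the example above."""
    context = "\n\n".join(chunks)
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"


def answer_query(
    question: str,
    retrieve: Callable[[str], List[str]],
    generate: Callable[[str], str],
) -> str:
    """Run retrieval and generation for one question.

    `retrieve` and `generate` are hypothetical wrappers around your own
    embedder/index and generator. Passing them in lets a CLI, a web UI,
    and an HTTP endpoint share this one function.
    """
    chunks = retrieve(question)
    prompt = build_prompt(chunks, question)
    return generate(prompt).strip()


# Stand-in components (assumptions for illustration only).
def fake_retrieve(question: str) -> List[str]:
    return ["Paris is the capital of France.", "France is in Europe."]


def fake_generate(prompt: str) -> str:
    return " Paris "


demo_answer = answer_query("What is the capital of France?", fake_retrieve, fake_generate)
print(demo_answer)
```

In a real script you would replace the stand-ins with small wrappers around your embedder, index, and generator, then call `answer_query` from the input loop.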

✅ Option 2: Web UI with `gradio`

When to use:

✔️ Friendly web interface
✔️ Easy to share with non-technical users
✔️ Supports chat history, images, voice, uploads

Basic Pattern:

1️⃣ Install Gradio:

pip install gradio

2️⃣ Create an interface:

import gradio as gr
import numpy as np

# Assumes `embedder`, `index`, `docs`, and `generator` were created in the earlier steps.
def answer(user_query):
    query_embedding = embedder.encode([user_query])
    D, I = index.search(np.array(query_embedding), k=3)
    # A FAISS index pads results with -1 when fewer than k matches exist
    retrieved_chunks = [docs[idx] for idx in I[0] if idx != -1]
    context = "\n\n".join(retrieved_chunks)

    prompt = f"Context:\n{context}\n\nQuestion: {user_query}\nAnswer:"
    response = generator(
        prompt,
        max_new_tokens=512,  # Bounds the answer only; max_length can also count prompt tokens
        do_sample=True,
        temperature=0.3,
    )
    return response[0]["generated_text"]

gr.Interface(
    fn=answer,
    inputs="text",
    outputs="text",
    title="My AI Assistant",
    description="Ask me questions! I’ll find relevant info from my knowledge base.",
).launch()

✔️ This gives you an instant local web app, by default at http://localhost:7860.
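
For the chat-style experience mentioned under "When to use", Gradio also provides `gr.ChatInterface`, which calls your function with the new message and the conversation history. The sketch below is a hedged illustration: the `answer` stub stands in for the real RAG function, `chat_fn` and `launch_chat_ui` are hypothetical names, and the history is deliberately ignored because its format differs between Gradio versions.

```python
def answer(user_query: str) -> str:
    """Stub standing in for the RAG `answer` function (assumption: replace it
    with your real retrieval + generation code)."""
    return f"(stub answer for: {user_query})"


def chat_fn(message, history):
    """Callback in the (message, history) shape that gr.ChatInterface uses.

    This sketch ignores `history` and answers each message independently.
    """
    message = message.strip()
    if not message:
        return "Please enter a question."
    return answer(message)


def launch_chat_ui():
    # Imported here so the functions above can be tested without Gradio installed.
    import gradio as gr

    gr.ChatInterface(fn=chat_fn, title="My AI Assistant").launch()
```

Call `launch_chat_ui()` at the end of your script to start the app. Feeding earlier turns back into the prompt is possible, but it is left out here to keep the sketch small.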

✅ Option 3: Integration into a Chatbot Platform

When to use:

✔️ Deploy to messaging apps, websites, customer support, or internal tools
✔️ Works with bot frameworks and messaging platforms such as:

Botpress
Rasa
Slack bots
WhatsApp API
Discord bots

Basic Pattern:

Connect your backend logic (Python) to the messaging platform’s API.
Handle incoming messages ➜ run embed ➜ retrieve ➜ generate ➜ return the reply.
Use frameworks like FastAPI or Flask to expose an HTTP endpoint.

Example: Simple FastAPI endpoint

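import numpy as np
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class Query(BaseModel):
    message: str

# Assumes `embedder`, `index`, `docs`, and `generator` were created in the earlier steps.
# Declared with plain `def` so FastAPI runs the blocking model calls in a worker thread
# instead of stalling the event loop.
@app.post("/chat")
def chat(query: Query):
    user_query = query.message
    query_embedding = embedder.encode([user_query])
    D, I = index.search(np.array(query_embedding), k=3)
    # A FAISS index pads results with -1 when fewer than k matches exist
    retrieved_chunks = [docs[idx] for idx in I[0] if idx != -1]
    context = "\n\n".join(retrieved_chunks)
    prompt = f"Context:\n{context}\n\nQuestion: {user_query}\nAnswer:"

    response = generator(
        prompt,
        max_new_tokens=512,  # Bounds the answer only; max_length can also count prompt tokens
        do_sample=True,
        temperature=0.3,
    )
    return {"response": response[0]["generated_text"]}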

Your bot platform calls this /chat endpoint when a user sends a message.
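
To make that call concrete, here is a minimal sketch of a client posting a message to the endpoint using only the Python standard library. The URL `http://localhost:8000/chat` is an assumption (a server running locally on port 8000), and `build_chat_request` and `ask` are hypothetical helper names. A real bot platform would send an equivalent JSON request from its own webhook or connector.

```python
import json
import urllib.request

CHAT_URL = "http://localhost:8000/chat"  # Assumption: server running locally on port 8000


def build_chat_request(message: str, url: str = CHAT_URL) -> urllib.request.Request:
    """Build a POST request whose JSON body matches the `Query` model: {"message": ...}."""
    body = json.dumps({"message": message}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(message: str, url: str = CHAT_URL, timeout: float = 60.0) -> str:
    """Send the message and return the "response" field from the endpoint's JSON reply."""
    request = build_chat_request(message, url)
    with urllib.request.urlopen(request, timeout=timeout) as reply:
        return json.loads(reply.read().decode("utf-8"))["response"]
```

With the server running, `print(ask("What does the handbook say about refunds?"))` would print the assistant's reply. The generous timeout is a guess to allow for slow generation, so adjust it for your model.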

✅ How to Choose

| Deployment | Best For | Pros | Cons |
| --- | --- | --- | --- |
| CLI | Local testing | Fast, easy | Not user-friendly for non-technical users |
| Gradio Web UI | Prototypes, demos, teaching | Zero-config web app | Not suited for heavy production load |
| Chatbot Integration | Real deployment | Multi-user, cross-platform | Needs backend, hosting, and auth |

🗝️ Key Takeaway

Your RAG assistant is only useful if real people can interact with it. Pick the simplest deployment for your audience:

CLI for solo use
Gradio for demos
Full integration for production

➡️ Next: Let’s test your UI and see how your assistant performs live!
