Build a mini knowledge base from local docs

Now you know why you need retrieval and which tools to use — let’s build a simple knowledge base (KB) your assistant can search in real time.

A mini KB means:

  • A small collection of your own files (PDFs, text, markdown, or notes).

  • Chunked into passages.

  • Embedded as vectors for similarity search.

This is the heart of Retrieval-Augmented Generation (RAG).


Step 1️⃣ – Gather Local Files

Decide what your assistant should know:

  • Help docs

  • Study notes

  • Internal FAQs

  • Markdown files

  • Local PDFs

👉 Example: a docs/ folder with:

```
docs/
 ├─ faq.txt
 ├─ guide.md
 └─ product_manual.pdf
```

Step 2️⃣ – Load & Chunk Text

Use Python to read your files and split them into small, meaningful chunks (~100–300 words each), so that the passages you retrieve later fit comfortably inside the LLM’s context window.

Example:
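
A minimal sketch, assuming your docs/ folder holds plain .txt and .md files and splitting purely on word count (smarter splitters keep sentences or paragraphs intact):

```python
from pathlib import Path

def chunk_text(text, max_words=200):
    # Split on whitespace and group words into chunks of ~max_words each.
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]

chunks = []
for path in Path("docs").rglob("*"):
    if path.suffix in {".txt", ".md"}:
        chunks.extend(chunk_text(path.read_text(encoding="utf-8")))

print(f"Loaded {len(chunks)} chunks")
```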

👉 Tip: For PDFs, extract the text first with PyMuPDF or pdfminer.

Example:
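
A rough sketch using PyMuPDF (imported as `fitz`); pdfminer.six works similarly. It reuses the `chunk_text()` helper from the snippet above:

```python
import fitz  # PyMuPDF: pip install pymupdf

def pdf_to_text(path):
    # Concatenate the plain text of every page in the PDF.
    with fitz.open(path) as doc:
        return "\n".join(page.get_text() for page in doc)

# Chunk the extracted PDF text just like the .txt and .md files.
chunks.extend(chunk_text(pdf_to_text("docs/product_manual.pdf")))
```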


Step 3️⃣ – Embed Each Chunk

Turn each text chunk into a vector using a sentence embedding model.

Example with sentence-transformers:
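
A minimal sketch; `all-MiniLM-L6-v2` is just one small, fast example model (it produces 384-dimensional vectors), and `chunks` is the list built in Step 2:

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # example model choice

# One vector per chunk, returned as a NumPy array of shape (num_chunks, 384).
embeddings = model.encode(chunks, convert_to_numpy=True)
print(embeddings.shape)
```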


Step 4️⃣ – Store in FAISS

Save these vectors in a FAISS index for fast retrieval.
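
For example (a flat, exact-search index is plenty for a few thousand chunks; `embeddings` comes from Step 3):

```python
import faiss
import numpy as np

# FAISS expects float32 vectors.
vectors = np.asarray(embeddings, dtype="float32")

index = faiss.IndexFlatL2(vectors.shape[1])  # exact L2 (Euclidean) search
index.add(vectors)

print(index.ntotal, "vectors stored")
```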


Step 5️⃣ – Query Your Knowledge Base

When a user asks a question: 1️⃣ Embed the query ➜ 2️⃣ Find top N chunks ➜ 3️⃣ Pass them + the query to your LLM.

Example:
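
A minimal sketch tying the pieces together; `model`, `index`, and `chunks` come from the earlier steps, and the sample question is just an illustration:

```python
def search(query, top_n=3):
    # 1) Embed the query with the same model used for the chunks.
    query_vec = model.encode([query], convert_to_numpy=True).astype("float32")
    # 2) Find the top N most similar chunks in the FAISS index.
    _, indices = index.search(query_vec, top_n)
    return [chunks[i] for i in indices[0]]

question = "How do I reset my password?"  # example user question
context = "\n\n".join(search(question))

# 3) Pass the retrieved context plus the question to your LLM of choice.
prompt = (
    "Answer using only the context below.\n\n"
    f"Context:\n{context}\n\n"
    f"Question: {question}"
)
print(prompt)
```

Send `prompt` to whichever chat model you are using; the retrieved chunks are what ground its answer in your own docs.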


How It Works Together

✅ Now your assistant:

  • Embeds user questions

  • Finds relevant info from your local docs

  • Adds that info to the prompt

  • Generates grounded, up-to-date answers


🗝️ Key Takeaway

This mini knowledge base gives your assistant a custom memory of your own files — bridging the gap between the frozen base LLM and your real-world needs.


➡️ Next: You’ll learn how to connect this retrieval step to your chat interface — so your assistant can quote relevant info on demand!
