11. EnsembleRetriever with Convex Combination (CC)

Adding Convex Combination (CC) to the EnsembleRetriever

# Update the package before proceeding
# !pip install -qU langchain-teddynote

from dotenv import load_dotenv

load_dotenv()

 True 

Setup for the experiment

from langchain.retrievers import EnsembleRetriever as OriginalEnsembleRetriever
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import PDFPlumberLoader
from langchain_community.vectorstores import FAISS
from langchain_openai import OpenAIEmbeddings
from langchain_teddynote.retrievers import KiwiBM25Retriever

# Load the document
loader = PDFPlumberLoader("data/Digital Government Innovation Promotion Plan.pdf")

# Split the document: use a small chunk size for testing
text_splitter = RecursiveCharacterTextSplitter(chunk_size=100, chunk_overlap=0)
split_documents = loader.load_and_split(text_splitter)

# Create the embedding model
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

# Create the FAISS retriever
faiss = FAISS.from_documents(
    documents=split_documents, embedding=embeddings
).as_retriever(search_kwargs={"k": 5})

# Create the KiwiBM25Retriever (Korean morphological analyzer + BM25 algorithm)
# BM25 scores documents lexically, so no embedding model is passed here
bm25 = KiwiBM25Retriever.from_documents(documents=split_documents)
bm25.k = 5

# LangChain version of EnsembleRetriever
original_ensemble_retriever = OriginalEnsembleRetriever(retrievers=[faiss, bm25])

Creating EnsembleRetrievers with CC and RRF
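
To make the difference between the two fusion methods concrete, here is a minimal, library-independent sketch. It is not the code of `langchain_teddynote` or LangChain: the helper names `rrf_fuse`, `cc_fuse`, and `min_max` are hypothetical, the scores are made up, and the use of min-max normalization for CC is an assumption (a library may normalize differently). The constant `c=60` mirrors the default used by LangChain's `EnsembleRetriever`.

```python
from collections import defaultdict


def rrf_fuse(score_maps, weights=None, c=60):
    """Weighted Reciprocal Rank Fusion: only the rank of each document matters."""
    weights = weights or [1.0 / len(score_maps)] * len(score_maps)
    fused = defaultdict(float)
    for scores, weight in zip(score_maps, weights):
        ranking = sorted(scores, key=scores.get, reverse=True)  # best first
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] += weight / (c + rank)
    return sorted(fused, key=fused.get, reverse=True)


def min_max(scores):
    """Rescale one retriever's scores to [0, 1] so they become comparable."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:  # all scores equal: avoid division by zero
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}


def cc_fuse(score_maps, weights=None):
    """Convex Combination: weighted sum of normalized scores (weights sum to 1)."""
    weights = weights or [1.0 / len(score_maps)] * len(score_maps)
    fused = defaultdict(float)
    for scores, weight in zip(score_maps, weights):
        for doc_id, s in min_max(scores).items():
            fused[doc_id] += weight * s
    return sorted(fused, key=fused.get, reverse=True)


# Made-up scores for four documents (higher is better in both maps).
dense_scores = {"a": 0.90, "b": 0.88, "c": 0.30, "d": 0.10}  # e.g. cosine similarity
bm25_scores = {"c": 8.0, "b": 7.5, "d": 3.0, "a": 1.0}  # e.g. BM25 scores

rrf_order = rrf_fuse([dense_scores, bm25_scores])
cc_order = cc_fuse([dense_scores, bm25_scores])
print(rrf_order)  # ['c', 'b', 'a', 'd']
print(cc_order)   # ['b', 'c', 'a', 'd']
```

In this toy data, document `b` scores close to the top with both retrievers, which CC rewards. RRF sees only positions, so `c` (ranks 3 and 1) edges out `b` (ranks 2 and 2).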

Search result comparison

  • The "Original" and "RRF" search results should be identical. (The RRF variant follows the LangChain implementation as-is.)

  • The "CC" search results may differ from the "RRF" results.
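
The two expectations above can be checked mechanically. The following is a small sketch with hypothetical helpers (`same_order`, `side_by_side`) and a stand-in `Doc` class, so that it runs without LangChain; the toy result lists are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    """Stand-in for a LangChain Document; only page_content is used here."""
    page_content: str


def same_order(docs_a, docs_b):
    """True when both result lists hold the same texts in the same order."""
    return [d.page_content for d in docs_a] == [d.page_content for d in docs_b]


def side_by_side(results):
    """Return one row per rank, mapping each method name to its text at that rank."""
    depth = max(len(docs) for docs in results.values())
    rows = []
    for rank in range(depth):
        row = {}
        for name, docs in results.items():
            row[name] = docs[rank].page_content if rank < len(docs) else None
        rows.append(row)
    return rows


# Invented results for one query, keyed by method name.
results = {
    "Original": [Doc("alpha"), Doc("beta")],
    "RRF": [Doc("alpha"), Doc("beta")],
    "CC": [Doc("beta"), Doc("alpha")],
}

for rank, row in enumerate(side_by_side(results)):
    print(rank, row)

print(same_order(results["Original"], results["RRF"]))  # True
print(same_order(results["RRF"], results["CC"]))        # False
```

With the retrievers from this section, each list would come from a call such as `original_ensemble_retriever.invoke(query)`.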

Compare the search results of the RRF and CC methods, and adopt the one that suits your documents.
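
One rough way to decide between the methods is to measure, over a handful of labelled queries, how often each one returns the expected passage in its top results. The sketch below is only an illustration: `hit_rate` is a hypothetical helper, the "retrievers" are plain dictionaries standing in for real ones, and the queries and texts are invented.

```python
def hit_rate(retrieve, labelled_queries, k=5):
    """Fraction of queries whose top-k texts contain the expected snippet."""
    hits = 0
    for query, expected in labelled_queries:
        top_k = retrieve(query)[:k]
        if any(expected in text for text in top_k):
            hits += 1
    return hits / len(labelled_queries)


# Invented ranked texts per query for two fusion methods.
fake_rrf = {"q1": ["x budget", "y plan"], "q2": ["z cloud", "w data"]}
fake_cc = {"q1": ["y plan", "x budget"], "q2": ["w data", "v other"]}

# Each pair is (query, snippet that a correct result should contain).
labelled = [("q1", "budget"), ("q2", "cloud")]

rrf_rate = hit_rate(fake_rrf.get, labelled, k=2)
cc_rate = hit_rate(fake_cc.get, labelled, k=2)
print(rrf_rate)  # 1.0
print(cc_rate)   # 0.5
```

For real retrievers, `retrieve` could be a small function that calls the retriever's `invoke(query)` and returns the `page_content` of each document.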
