11. EnsembleRetriever with Convex Combination (CC)
Adding Convex Combination (CC) to the EnsembleRetriever
Note: An explanation of the differences between the algorithm methods, published by AutoRAG
Uncomment the line below and update the package before proceeding.
# Run this after uncommenting to update the package
# !pip install -qU langchain-teddynote
from dotenv import load_dotenv
load_dotenv()
True
Experiment setup
from langchain.retrievers import EnsembleRetriever as OriginalEnsembleRetriever
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import PDFPlumberLoader
from langchain_community.vectorstores import FAISS
from langchain_openai import OpenAIEmbeddings
from langchain_teddynote.retrievers import KiwiBM25Retriever
# Load the document
loader = PDFPlumberLoader("data/Digital Government Innovation Promotion Plan.pdf")
# Split the document: use a small chunk size for testing
text_splitter = RecursiveCharacterTextSplitter(chunk_size=100, chunk_overlap=0)
split_documents = loader.load_and_split(text_splitter)
# Create the embedding model
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
# Create the FAISS retriever
faiss = FAISS.from_documents(
    documents=split_documents, embedding=embeddings
).as_retriever(search_kwargs={"k": 5})
# Create the KiwiBM25Retriever (Korean morphological analyzer + BM25 algorithm)
bm25 = KiwiBM25Retriever.from_documents(documents=split_documents, embedding=embeddings)
bm25.k = 5
# LangChain's built-in EnsembleRetriever
original_ensemble_retriever = OriginalEnsembleRetriever(retrievers=[faiss, bm25])
Creating EnsembleRetrievers with CC and RRF
Search result comparison
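To compare retrievers side by side, it helps to run the same query through each one and print the ranked snippets. The helper below is a minimal sketch of that idea, not the tutorial's own code: `compare_retrievers` and its `width` parameter are hypothetical names introduced here, and the only assumption about each retriever is that it exposes LangChain's standard `invoke(query)` method and returns documents with a `page_content` attribute.

```python
from typing import Any, Dict, List


def compare_retrievers(
    retrievers: Dict[str, Any], query: str, width: int = 60
) -> Dict[str, List[str]]:
    """Run one query through every retriever and print the ranked snippets.

    `retrievers` maps a display label to a retriever object. Each object is
    assumed to provide `invoke(query)` returning documents that carry a
    `page_content` string (the usual LangChain retriever interface).
    """
    results: Dict[str, List[str]] = {}
    for name, retriever in retrievers.items():
        docs = retriever.invoke(query)
        # Keep only the first `width` characters so rows stay readable.
        snippets = [doc.page_content[:width] for doc in docs]
        results[name] = snippets
        print(f"[{name}]")
        for rank, snippet in enumerate(snippets, start=1):
            print(f"  {rank}. {snippet}")
    return results
```

With the setup above, a call could look like `compare_retrievers({"Original": original_ensemble_retriever}, "your query here")`; further ensemble retrievers can be added to the dictionary under their own labels.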
Search results
"Original" and "RRF" should show no difference, because LangChain's EnsembleRetriever already implements RRF.
Search results
"CC" results may differ from "RRF" results.
Compare the search results of the RRF and CC methods, and adopt whichever suits your documents better.
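Why the two methods can disagree is easier to see on toy data. The following is a minimal pure-Python sketch, not the code of LangChain or langchain-teddynote: `rrf_fuse`, `min_max`, `cc_fuse`, and `ranked` are illustrative names, the scores are made up, and min-max scaling is an assumed normalization for CC (real implementations may normalize differently). RRF uses only rank positions, with the constant `c = 60` that LangChain's EnsembleRetriever uses by default, while CC takes a weighted sum of normalized scores, so score gaps matter.

```python
from typing import Dict, List


def rrf_fuse(rankings: List[List[str]], weights: List[float], c: int = 60) -> Dict[str, float]:
    """Weighted Reciprocal Rank Fusion: each list adds weight / (rank + c)."""
    fused: Dict[str, float] = {}
    for ranking, weight in zip(rankings, weights):
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] = fused.get(doc_id, 0.0) + weight / (rank + c)
    return fused


def min_max(scores: Dict[str, float]) -> Dict[str, float]:
    """Scale scores to [0, 1]; assumed normalization for this sketch."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}


def cc_fuse(score_lists: List[Dict[str, float]], weights: List[float]) -> Dict[str, float]:
    """Convex Combination: weighted sum of normalized scores (weights sum to 1)."""
    fused: Dict[str, float] = {}
    for scores, weight in zip(score_lists, weights):
        for doc_id, s in min_max(scores).items():
            fused[doc_id] = fused.get(doc_id, 0.0) + weight * s
    return fused


def ranked(fused: Dict[str, float]) -> List[str]:
    """Return document ids ordered from highest to lowest fused score."""
    return sorted(fused, key=fused.get, reverse=True)


# Made-up scores: "a" wins the dense search by a wide margin and is only
# narrowly behind the leaders in the sparse (BM25-style) search.
dense = {"a": 0.95, "b": 0.40, "c": 0.35, "d": 0.30}
sparse = {"b": 10.0, "c": 9.9, "a": 9.8, "d": 2.0}
weights = [0.5, 0.5]

# RRF only sees the order of each list, not the score gaps.
rankings = [sorted(s, key=s.get, reverse=True) for s in (dense, sparse)]

print("RRF:", ranked(rrf_fuse(rankings, weights)))       # RRF: ['b', 'a', 'c', 'd']
print("CC: ", ranked(cc_fuse([dense, sparse], weights)))  # CC:  ['a', 'b', 'c', 'd']
```

In this toy case RRF puts "b" first because its rank positions (2nd and 1st) beat those of "a" (1st and 3rd), whereas CC puts "a" first because its large lead in the dense scores outweighs its small deficit in the sparse scores. Which behavior is preferable depends on how informative the raw scores are for your documents.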