EnsembleRetriever is a LangChain feature that combines multiple retrievers to provide stronger search results. By drawing on the strengths of different search algorithms, it can achieve better performance than a single algorithm.
Main features
1. Integrating multiple retrievers: takes different types of retrievers as input and combines their results.
2. Reranking results: uses the Reciprocal Rank Fusion algorithm to rerank the combined results.
3. Hybrid search: mainly used to combine a sparse retriever (e.g. BM25) with a dense retriever (e.g. embedding similarity).
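As a rough illustration of the reranking step, the sketch below implements a weighted form of Reciprocal Rank Fusion in plain Python. The function name `weighted_rrf`, the sample rankings, and the constant `c = 60` (a commonly used value) are assumptions made for this example; it is not LangChain's own code.

```python
from collections import defaultdict


def weighted_rrf(ranked_lists, weights, c=60):
    """Fuse several ranked lists of document ids into one ranking.

    Each document receives weight / (rank + c) from every list it appears in,
    where rank starts at 1. Documents are returned by descending fused score.
    """
    if len(ranked_lists) != len(weights):
        raise ValueError("ranked_lists and weights must have the same length")

    scores = defaultdict(float)
    for ranking, weight in zip(ranked_lists, weights):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += weight / (rank + c)

    # sorted() is stable, so tied documents keep their first-seen order.
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical rankings from a sparse and a dense retriever.
sparse_ranking = ["doc_a", "doc_b", "doc_c"]
dense_ranking = ["doc_b", "doc_d", "doc_a"]

fused = weighted_rrf([sparse_ranking, dense_ranking], weights=[0.5, 0.5])
print(fused)  # ['doc_b', 'doc_a', 'doc_d', 'doc_c']
```

Documents that rank well in both lists (here `doc_b` and `doc_a`) rise above documents that appear in only one list, which is the effect the fusion step is meant to produce.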
Advantages
- Sparse retriever: effective for keyword-based search
- Dense retriever: effective for search based on semantic similarity
Thanks to these complementary characteristics, EnsembleRetriever can provide improved performance across a variety of search scenarios.
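To make the keyword-matching behavior of a sparse retriever more concrete, here is a minimal sketch of BM25 scoring in plain Python. It assumes whitespace tokenization, the common defaults `k1 = 1.5` and `b = 0.75`, and a non-negative IDF variant; the function name `bm25_scores` is hypothetical, and real libraries may differ in tokenization and IDF details.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Return one BM25 score per document for the given query."""
    tokenized = [doc.split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs

    # Document frequency: number of documents containing each term.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))

    scores = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        score = 0.0
        for term in query.split():
            tf = term_freq[term]
            if tf == 0:
                continue  # Only exact token matches contribute.
            df = doc_freq[term]
            idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
            length_norm = 1 - b + b * len(tokens) / avg_len
            score += idf * tf * (k1 + 1) / (tf + k1 * length_norm)
        scores.append(score)
    return scores


docs = [
    "I like apples",
    "I like apple company",
    "Apple is my favorite company",
]
query = "my favorite fruit is apple"

scores = bm25_scores(query, docs)
best = docs[scores.index(max(scores))]
print(best)  # Apple is my favorite company
```

Because matching is on exact tokens, "I like apples" scores zero for this query even though it is the closest in meaning. This is the kind of gap a dense retriever is expected to cover.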
# Configuration file for managing API keys as environment variables.
from dotenv import load_dotenv
# Load the API key information.
load_dotenv()
True
# Set up LangSmith tracing. https://smith.langchain.com
# !pip install langchain-teddynote
from langchain_teddynote import logging
# Enter a project name.
logging.langsmith("CH11-Retriever")
Initialize EnsembleRetriever to combine a BM25Retriever with a FAISS retriever. Each retriever is assigned a weight.
Search for relevant documents by calling the ensemble_retriever object's invoke() method (get_relevant_documents() in older LangChain versions).
Changing the Config at runtime
You can change the properties of a retriever even at runtime. This is possible using the ConfigurableField class.
- The weights parameter is defined as a ConfigurableField object.
- The field's ID is set to "ensemble_weights".
When searching, specify the search settings through the config parameter.
Setting the ensemble_weights option to [1, 0] places all of the weight on the BM25 retriever's search results.
This time, all of the weight is placed on the FAISS retriever's search results.
from langchain.retrievers import BM25Retriever, EnsembleRetriever
from langchain.vectorstores import FAISS
from langchain_openai import OpenAIEmbeddings
# list of sample documents
doc_list = [
    "I like apples",
    "I like apple company",
    "I like apple's iphone",
    "Apple is my favorite company",
    "I like apple's ipad",
    "I like apple's macbook",
]
# Initialize the bm25 retriever and the faiss retriever.
bm25_retriever = BM25Retriever.from_texts(
    doc_list,
)
bm25_retriever.k = 1  # Set the number of BM25Retriever search results to 1.
embedding = OpenAIEmbeddings()  # Use OpenAI embeddings.
faiss_vectorstore = FAISS.from_texts(
    doc_list,
    embedding,
)
faiss_retriever = faiss_vectorstore.as_retriever(search_kwargs={"k": 1})
# Initialize the ensemble retriever.
ensemble_retriever = EnsembleRetriever(
    retrievers=[bm25_retriever, faiss_retriever],
    weights=[0.7, 0.3],
)
# Retrieve the search result documents.
query = "my favorite fruit is apple"
ensemble_result = ensemble_retriever.invoke(query)
bm25_result = bm25_retriever.invoke(query)
faiss_result = faiss_retriever.invoke(query)
# Print the retrieved documents.
print("[Ensemble Retriever]")
for doc in ensemble_result:
    print(f"Content: {doc.page_content}")
print()
print("[BM25 Retriever]")
for doc in bm25_result:
    print(f"Content: {doc.page_content}")
print()
print("[FAISS Retriever]")
for doc in faiss_result:
    print(f"Content: {doc.page_content}")
print()
[Ensemble Retriever]
Content: Apple is my favorite company
Content: I like apples
[BM25 Retriever]
Content: Apple is my favorite company
[FAISS Retriever]
Content: I like apples
# Retrieve the search result documents.
query = "Apple company makes my favorite iphone"
ensemble_result = ensemble_retriever.invoke(query)
bm25_result = bm25_retriever.invoke(query)
faiss_result = faiss_retriever.invoke(query)
# Print the retrieved documents.
print("[Ensemble Retriever]")
for doc in ensemble_result:
    print(f"Content: {doc.page_content}")
print()
print("[BM25 Retriever]")
for doc in bm25_result:
    print(f"Content: {doc.page_content}")
print()
print("[FAISS Retriever]")
for doc in faiss_result:
    print(f"Content: {doc.page_content}")
print()
[Ensemble Retriever]
Content: Apple is my favorite company
Content: I like apple's iphone
[BM25 Retriever]
Content: Apple is my favorite company
[FAISS Retriever]
Content: I like apple's iphone
from langchain_core.runnables import ConfigurableField
ensemble_retriever = EnsembleRetriever(
    # Set the list of retrievers. Here, bm25_retriever and faiss_retriever are used.
    retrievers=[bm25_retriever, faiss_retriever],
).configurable_fields(
    weights=ConfigurableField(
        # Set a unique identifier for the parameter.
        id="ensemble_weights",
        # Set the name of the parameter.
        name="Ensemble Weights",
        # Write a description of the parameter.
        description="Ensemble Weights",
    )
)
config = {"configurable": {"ensemble_weights": [1, 0]}}
# Specify the search settings through the config parameter.
docs = ensemble_retriever.invoke("my favorite fruit is apple", config=config)
docs  # Display the documents returned by the search.
[Document(page_content='Apple is my favorite company'), Document(page_content='I like apples')]
config = {"configurable": {"ensemble_weights": [0, 1]}}
# Specify the search settings through the config parameter.
docs = ensemble_retriever.invoke("my favorite fruit is apple", config=config)
docs  # Display the documents returned by the search.
[Document(page_content='I like apples'), Document(page_content='Apple is my favorite company')]
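The two orderings above are consistent with how a weighted rank fusion behaves when one of the weights is zero. The sketch below assumes each retriever returns only its top document (as with k = 1 in this section) and uses a hypothetical `fuse` helper with an assumed constant of 60. It illustrates the idea and is not LangChain's implementation.

```python
def fuse(ranked_lists, weights, c=60):
    """Weighted reciprocal rank fusion over lists of document texts."""
    scores = {}
    for ranking, weight in zip(ranked_lists, weights):
        for rank, text in enumerate(ranking, start=1):
            scores[text] = scores.get(text, 0.0) + weight / (rank + c)
    # Highest fused score first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Top-1 results taken from the outputs shown in this section.
bm25_top = ["Apple is my favorite company"]
faiss_top = ["I like apples"]

for weights in ([1, 0], [0, 1], [0.7, 0.3]):
    print(weights)
    for text, score in fuse([bm25_top, faiss_top], weights):
        print(f"  {score:.4f}  {text}")
```

With [1, 0] the FAISS document receives a score of zero and sorts last; with [0, 1] the order reverses. Under these assumptions, an intermediate setting such as [0.7, 0.3] keeps both documents but favors the BM25 result.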