03. EnsembleRetriever

EnsembleRetriever is a LangChain feature that combines multiple retrievers to produce stronger search results. By drawing on the strengths of different search algorithms, it can achieve better performance than any single algorithm.

Main features

1. Integrating multiple retrievers: accepts different types of retrievers as input and combines their results.
2. Re-ranking results: uses the Reciprocal Rank Fusion algorithm to re-rank the combined results.
3. Hybrid search: typically combines a sparse retriever (e.g., BM25) with a dense retriever (e.g., embedding similarity).
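As a rough illustration of the re-ranking step, the following is a minimal sketch of weighted Reciprocal Rank Fusion in plain Python. It is not LangChain's own code: the function name weighted_rrf, the document IDs, and the rank constant c=60 (a commonly used RRF default) are assumptions made for this example. Each document earns weight / (rank + c) from every ranking it appears in, and the summed scores determine the final order.

```python
from collections import defaultdict


def weighted_rrf(ranked_lists, weights, c=60):
    """Fuse several rankings into one using weighted reciprocal-rank scores."""
    scores = defaultdict(float)
    for ranking, weight in zip(ranked_lists, weights):
        # Ranks start at 1, so the top document of a list earns weight / (1 + c).
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] += weight / (rank + c)
    # Highest fused score first.
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical rankings from a sparse and a dense retriever.
sparse_ranking = ["doc_a", "doc_b", "doc_c"]
dense_ranking = ["doc_b", "doc_d", "doc_a"]

fused = weighted_rrf([sparse_ranking, dense_ranking], weights=[0.5, 0.5])
print(fused)
```

In this example doc_b ends up first because it ranks highly in both lists, even though it is not the top result of the sparse ranking.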

Advantages

- Sparse retriever: effective for keyword-based search
- Dense retriever: effective for search based on semantic similarity

Because of these complementary characteristics, EnsembleRetriever can provide improved performance across a variety of search scenarios.

For more details, see the official LangChain documentation.


# Configuration file for managing API keys as environment variables.
from dotenv import load_dotenv

# Load the API key information.
load_dotenv()


True


# Set up LangSmith tracing. https://smith.langchain.com
# !pip install langchain-teddynote
from langchain_teddynote import logging

# Enter a project name.
logging.langsmith("CH11-Retriever")


 Start tracking LangSmith. 
[Project name] 
CH11-Retriever

Initialize an EnsembleRetriever to combine a BM25Retriever with a FAISS retriever. Each retriever is assigned a weight.


from langchain.retrievers import BM25Retriever, EnsembleRetriever
from langchain.vectorstores import FAISS
from langchain_openai import OpenAIEmbeddings

# list of sample documents
doc_list = [
    "I like apples",
    "I like apple company",
    "I like apple's iphone",
    "Apple is my favorite company",
    "I like apple's ipad",
    "I like apple's macbook",
]


# Initialize the bm25 retriever and the faiss retriever.
bm25_retriever = BM25Retriever.from_texts(
    doc_list,
)
bm25_retriever.k = 1  # Set the number of BM25Retriever search results to 1.

embedding = OpenAIEmbeddings()  # Use OpenAI embeddings.
faiss_vectorstore = FAISS.from_texts(
    doc_list,
    embedding,
)
faiss_retriever = faiss_vectorstore.as_retriever(search_kwargs={"k": 1})

# Initialize the ensemble retriever.
ensemble_retriever = EnsembleRetriever(
    retrievers=[bm25_retriever, faiss_retriever],
    weights=[0.7, 0.3],
)

Search for relevant documents by calling the ensemble_retriever object's invoke() method, which replaces the older get_relevant_documents() call.


# Retrieve the search result documents.
query = "my favorite fruit is apple"
ensemble_result = ensemble_retriever.invoke(query)
bm25_result = bm25_retriever.invoke(query)
faiss_result = faiss_retriever.invoke(query)

# Print the retrieved documents.
print("[Ensemble Retriever]")
for doc in ensemble_result:
    print(f"Content: {doc.page_content}")
    print()

print("[BM25 Retriever]")
for doc in bm25_result:
    print(f"Content: {doc.page_content}")
    print()

print("[FAISS Retriever]")
for doc in faiss_result:
    print(f"Content: {doc.page_content}")
    print()


[Ensemble Retriever] 
Content: Apple is my favorite company 

Content: I like apples 

[BM25 Retriever] 
Content: Apple is my favorite company 

[FAISS Retriever] 
Content: I like apples
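One way to see why the BM25 retriever returns "Apple is my favorite company" for this query is to count shared keywords. The sketch below is a deliberate simplification and not BM25 itself: it assumes plain, case-sensitive whitespace tokenization and scores each document only by how many distinct query tokens it contains. The helper name keyword_overlap is hypothetical.

```python
doc_list = [
    "I like apples",
    "I like apple company",
    "I like apple's iphone",
    "Apple is my favorite company",
    "I like apple's ipad",
    "I like apple's macbook",
]


def keyword_overlap(query, doc):
    """Count the distinct whitespace-separated tokens shared by query and doc."""
    return len(set(query.split()) & set(doc.split()))


query = "my favorite fruit is apple"

# Show the overlap count for every sample document.
for doc in doc_list:
    print(keyword_overlap(query, doc), doc)

# The document sharing the most tokens with the query.
best = max(doc_list, key=lambda d: keyword_overlap(query, d))
print("Best keyword match:", best)
```

Under this simplified scoring, "I like apples" shares no token with the query (the token "apples" differs from "apple"), which suggests why only the embedding-based retriever surfaces it.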


# Retrieve the search result documents.
query = "Apple company makes my favorite iphone"
ensemble_result = ensemble_retriever.invoke(query)
bm25_result = bm25_retriever.invoke(query)
faiss_result = faiss_retriever.invoke(query)

# Print the retrieved documents.
print("[Ensemble Retriever]")
for doc in ensemble_result:
    print(f"Content: {doc.page_content}")
    print()

print("[BM25 Retriever]")
for doc in bm25_result:
    print(f"Content: {doc.page_content}")
    print()

print("[FAISS Retriever]")
for doc in faiss_result:
    print(f"Content: {doc.page_content}")
    print()


[Ensemble Retriever] 
Content: Apple is my favorite company 

Content: I like apple's iphone 

[BM25 Retriever] 
Content: Apple is my favorite company 

[FAISS Retriever] 
Content: I like apple's iphone

Changing the Config at Runtime

You can change a retriever's properties even at runtime by using the ConfigurableField class.

- The weights parameter is defined as a ConfigurableField object.
- The field's ID is set to "ensemble_weights".


from langchain_core.runnables import ConfigurableField


ensemble_retriever = EnsembleRetriever(
    # Set the list of retrievers. Here, bm25_retriever and faiss_retriever are used.
    retrievers=[bm25_retriever, faiss_retriever],
).configurable_fields(
    weights=ConfigurableField(
        # Sets a unique identifier for the search parameter.
        id="ensemble_weights",
        # Sets the name of the search parameter.
        name="Ensemble Weights",
        # Write a description for your search parameters.
        description="Ensemble Weights",
    )
)

When searching, specify the search settings through the config parameter.
Setting the ensemble_weights option to [1, 0] puts all of the weight on the BM25 retriever's results.


config = {"configurable": {"ensemble_weights": [1, 0]}}

# Specify the search settings using the config parameter.
docs = ensemble_retriever.invoke("my favorite fruit is apple", config=config)
docs  # Prints the docs that are the search results.


[Document(page_content='Apple is my favorite company'), Document(page_content='I like apples')]

This time, set the weights so that all of the weight goes to the FAISS retriever's results.


config = {"configurable": {"ensemble_weights": [0, 1]}}

# Specify the search settings using the config parameter.
docs = ensemble_retriever.invoke("my favorite fruit is apple", config=config)
docs  # Prints the docs that are the search results.


[Document(page_content='I like apples'), Document(page_content='Apple is my favorite company')]
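Note that both outputs still contain two documents: judging from these results, a weight of 0 does not filter a retriever's documents out, it only moves them down the ranking. The following sketch reproduces that ordering with a weighted reciprocal-rank score in plain Python, assuming each retriever returns the single top document seen earlier and a rank constant of 60. The helper name fused_order is hypothetical, and this is an illustration rather than LangChain's own code.

```python
def fused_order(ranked_lists, weights, c=60):
    """Return documents ordered by their weighted reciprocal-rank score."""
    scores = {}
    for ranking, weight in zip(ranked_lists, weights):
        for rank, doc in enumerate(ranking, start=1):
            # A zero weight still registers the document, with a score of 0.
            scores[doc] = scores.get(doc, 0.0) + weight / (rank + c)
    return sorted(scores, key=scores.get, reverse=True)


# Top-1 result of each retriever for "my favorite fruit is apple".
bm25_top = ["Apple is my favorite company"]
faiss_top = ["I like apples"]

results = {}
for weights in ([1, 0], [0, 1]):
    results[tuple(weights)] = fused_order([bm25_top, faiss_top], weights)
    print(weights, results[tuple(weights)])
```

With weights [1, 0] the BM25 document comes first, and with [0, 1] the FAISS document comes first, matching the order of the two outputs.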


Last updated 7 months ago
