01. VectorStore-backed Retriever
A VectorStore-backed retriever is a retriever that uses a vector store to search for documents.
It queries the texts in the vector store using the search methods the vector store implements, such as similarity search and MMR.
Run the code below to create the VectorStore.
# Load API keys from a .env configuration file and manage them as environment variables.
from dotenv import load_dotenv
# Load the API key information.
load_dotenv()
True
# Set up LangSmith tracing. https://smith.langchain.com
# !pip install langchain-teddynote
from langchain_teddynote import logging
# Enter a project name.
logging.langsmith("CH11-Retriever")
Initializing a VectorStoreRetriever from a VectorStore (as_retriever)
The as_retriever method initializes and returns a VectorStoreRetriever based on the VectorStore object. It lets you set various search options to perform document searches tailored to your needs.
Parameters
- **kwargs: Keyword arguments passed to the search function
  - search_type: Search type ("similarity", "mmr", "similarity_score_threshold")
  - search_kwargs: Additional search options
    - k: Number of documents to return (default: 4)
    - score_threshold: Minimum similarity threshold for the similarity_score_threshold search
    - fetch_k: Number of documents to pass to the MMR algorithm (default: 20)
    - lambda_mult: Diversity control of MMR results (between 0 and 1, default: 0.5)
    - filter: Filtering based on document metadata
Return value
VectorStoreRetriever: Initialized VectorStoreRetriever object
Notes
- Various search strategies can be implemented (similarity, MMR, threshold-based)
- The MMR (Maximal Marginal Relevance) algorithm lets you control the diversity of search results
- Metadata filtering lets you retrieve only documents that meet specific conditions
- Tags can be added to the retriever via the tags parameter
Caution
- search_type and search_kwargs must be combined appropriately
- When using MMR, the fetch_k and k values need to be balanced
- If score_threshold is set too high, the search may return no results
- When using filter, you need to know the metadata structure of the dataset precisely
- The closer lambda_mult is to 0, the higher the diversity; the closer it is to 1, the higher the similarity
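The cautions above can be made concrete with a small pre-flight check. This is a minimal sketch and not part of LangChain: check_search_options is a hypothetical helper, the 0 to 1 range for score_threshold is an assumption about normalised relevance scores, and the rule that fetch_k should be at least k is an assumption drawn from the MMR caution rather than something the library is known to enforce.

```python
# Hypothetical helper (not a LangChain API): sanity-check the options
# before passing them to as_retriever(search_type=..., search_kwargs=...).
ALLOWED_SEARCH_TYPES = ("similarity", "mmr", "similarity_score_threshold")


def check_search_options(search_type, search_kwargs):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if search_type not in ALLOWED_SEARCH_TYPES:
        problems.append(f"unknown search_type: {search_type!r}")

    if search_type == "similarity_score_threshold":
        threshold = search_kwargs.get("score_threshold")
        # A missing or non-numeric threshold cannot be compared to scores.
        if not isinstance(threshold, (int, float)) or isinstance(threshold, bool):
            problems.append("score_threshold is required for threshold search")
        elif not 0.0 <= threshold <= 1.0:
            problems.append("score_threshold should lie between 0 and 1")

    if search_type == "mmr":
        k = search_kwargs.get("k", 4)                # default listed above
        fetch_k = search_kwargs.get("fetch_k", 20)   # default listed above
        lambda_mult = search_kwargs.get("lambda_mult", 0.5)
        if fetch_k < k:
            # MMR can only pick from the fetch_k candidates it is given.
            problems.append("fetch_k should be at least k")
        if not 0.0 <= lambda_mult <= 1.0:
            problems.append("lambda_mult should lie between 0 and 1")

    return problems
```

Running the check before building the retriever surfaces a mismatched combination early, instead of as an empty or surprising result list later.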
Retriever invoke()
The invoke method is the retriever's main entry point, used to retrieve relevant documents. It calls the retriever synchronously and returns the documents relevant to a given query.
Parameters
- input: Search query string
- config: Retriever configuration (Optional[RunnableConfig])
- **kwargs: Additional arguments to pass to the retriever
Return value
List[Document]: List of related documents
Maximal Marginal Relevance (MMR)
MMR (Maximal Marginal Relevance) is one way to avoid duplication among the documents retrieved when searching for items related to a query.
Instead of simply retrieving only the most relevant items, MMR considers both a document's relevance to the query and how different it is from the documents already selected.
Set the search_type parameter to "mmr" to use the MMR (Maximal Marginal Relevance) search algorithm.
- k: Number of documents to return (default: 4)
- fetch_k: Number of documents to pass to the MMR algorithm (default: 20)
- lambda_mult: Diversity control of MMR results (0 to 1, default: 0.5; 0: maximum diversity, 1: relevance to the query only)
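The selection rule described above can be sketched in plain Python. This is an illustrative implementation of the commonly used MMR formulation, not LangChain's internal code; the function names mmr_select and cosine and the toy vectors are assumptions made for the example.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr_select(query_vec, doc_vecs, k=4, fetch_k=20, lambda_mult=0.5):
    """Return the indices of up to k documents chosen by MMR."""
    # Step 1: keep the fetch_k candidates most similar to the query.
    relevance = [cosine(query_vec, v) for v in doc_vecs]
    candidates = sorted(
        range(len(doc_vecs)), key=lambda i: relevance[i], reverse=True
    )[:fetch_k]

    selected = []
    while candidates and len(selected) < k:

        def mmr_score(i):
            # Redundancy: similarity to the closest already-selected document.
            redundancy = max(
                (cosine(doc_vecs[i], doc_vecs[j]) for j in selected), default=0.0
            )
            # lambda_mult = 1 ranks by relevance only; lower values
            # penalise redundancy more and so increase diversity.
            return lambda_mult * relevance[i] - (1 - lambda_mult) * redundancy

        best = max(candidates, key=mmr_score)
        selected.append(best)
        candidates.remove(best)
    return selected


query = [1.0, 0.0]
docs = [[1.0, 0.1], [1.0, 0.12], [0.6, -0.8]]  # docs 0 and 1 are near-duplicates
print(mmr_select(query, docs, k=2, lambda_mult=1.0))  # [0, 1]
print(mmr_select(query, docs, k=2, lambda_mult=0.5))  # [0, 2]
```

With lambda_mult at 1.0 the two near-duplicate vectors are both returned; at 0.5 the second slot goes to the less similar but more distinct document.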
Similarity score threshold search (similarity_score_threshold)
You can set a similarity score threshold and use a search method that returns only documents whose score is at or above that threshold.
Setting the threshold appropriately filters out less relevant documents and keeps only the most similar ones.
- Set the search_type parameter to "similarity_score_threshold" to perform a search based on the similarity score threshold.
- In the search_kwargs parameter, pass {"score_threshold": 0.8} to set the similarity score threshold to 0.8. This means only documents with a similarity score of 0.8 or higher are returned.
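The effect of the threshold can be illustrated without a vector store. The sketch below assumes the scores are already normalised to the 0 to 1 range; filter_by_threshold is a hypothetical helper and the (document, score) pairs are made up for illustration.

```python
def filter_by_threshold(scored_docs, score_threshold=0.8):
    """Keep (text, score) pairs whose score is at or above the threshold."""
    return [(text, score) for text, score in scored_docs if score >= score_threshold]


# Made-up (document, relevance score) pairs for illustration only.
scored = [("doc A", 0.91), ("doc B", 0.80), ("doc C", 0.42)]

print(filter_by_threshold(scored, 0.8))   # [('doc A', 0.91), ('doc B', 0.8)]
print(filter_by_threshold(scored, 0.95))  # []
```

The second call shows why an overly high threshold can leave the retriever with no results at all.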
top_k setting
You can specify keyword arguments (kwargs) such as k to use when searching.
The k parameter is the number of top results to return from the search.
- Set the k parameter in search_kwargs to 1 to return a single document as the search result.
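What k does to a ranked result list can be shown in a few lines. This is a toy sketch rather than the library's code path: top_k is a hypothetical helper and the scored pairs are invented.

```python
import heapq


def top_k(scored_docs, k=4):
    """Return the k highest-scoring (text, score) pairs, best first."""
    return heapq.nlargest(k, scored_docs, key=lambda pair: pair[1])


# Made-up (document, similarity score) pairs for illustration only.
scored = [("doc A", 0.42), ("doc B", 0.91), ("doc C", 0.80)]

print(top_k(scored, k=1))  # [('doc B', 0.91)]
```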
Dynamic settings (Configurable)
- To adjust search settings dynamically, use ConfigurableField.
- ConfigurableField sets the unique identifier, name, and description of a search parameter.
- To adjust the search settings, specify them through the config parameter.
- The search settings are stored under the configurable key of the dictionary passed to the config parameter.
- The search settings are passed along with the search query and can be adjusted dynamically for each query.
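The way per-query settings travel under the configurable key can be mimicked with plain dictionaries. The sketch below is a toy model of the idea and does not use LangChain: resolve_search_settings, DEFAULT_SEARCH, and the field identifiers "search_type" and "search_kwargs" are assumptions chosen for the example.

```python
# Defaults the retriever would use when a call supplies no overrides.
DEFAULT_SEARCH = {"search_type": "similarity", "search_kwargs": {"k": 4}}


def resolve_search_settings(config=None):
    """Apply overrides found under config["configurable"] to the defaults."""
    overrides = (config or {}).get("configurable", {})
    settings = {
        "search_type": DEFAULT_SEARCH["search_type"],
        "search_kwargs": dict(DEFAULT_SEARCH["search_kwargs"]),
    }
    # Each configurable field is replaced as a whole when overridden.
    for field in ("search_type", "search_kwargs"):
        if field in overrides:
            settings[field] = overrides[field]
    return settings


# One call keeps the defaults, the next switches to MMR for that query only.
print(resolve_search_settings())
print(
    resolve_search_settings(
        {
            "configurable": {
                "search_type": "mmr",
                "search_kwargs": {"k": 2, "fetch_k": 10, "lambda_mult": 0.6},
            }
        }
    )
)
```

Because the overrides ride along with each call, the same retriever object can serve queries that need different search strategies.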
Below is an example with dynamic search settings.
Models with separate query and passage embeddings, such as Upstage Embeddings
The default retriever uses the same embedding model for queries and documents.
However, there are cases where different embedding models are used for queries and documents.
In these cases, the query is embedded using the query embedding model, and the document is embedded using the document embedding model.
This allows you to use different embedding models for queries and documents.
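The split between a query encoder and a document encoder can be illustrated with a toy model. The sketch below is not the Upstage API: ToyDualEmbeddings, its tiny vocabulary, and both encoding rules are invented for the example. Only the method names embed_documents and embed_query follow the two-method shape of LangChain's Embeddings interface.

```python
import math

VOCAB = ["embedding", "vector", "retriever", "store", "search"]  # toy vocabulary


class ToyDualEmbeddings:
    """Toy stand-in for a model with separate query and passage encoders."""

    def embed_documents(self, texts):
        # "Passage" encoder: raw term counts over the shared vocabulary.
        vectors = []
        for text in texts:
            tokens = text.lower().split()
            vectors.append([float(tokens.count(word)) for word in VOCAB])
        return vectors

    def embed_query(self, text):
        # "Query" encoder: binary presence, so repeated words count once.
        tokens = set(text.lower().split())
        return [1.0 if word in tokens else 0.0 for word in VOCAB]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


passages = ["vector store search", "retriever over a vector store", "embedding model"]
model = ToyDualEmbeddings()
passage_vecs = model.embed_documents(passages)  # done once, at indexing time
query_vec = model.embed_query("embedding")      # done per query, at search time

# Both encoders write into the same vector space, so the vectors are comparable.
best = max(range(len(passages)), key=lambda i: cosine(query_vec, passage_vecs[i]))
print(passages[best])  # embedding model
```

The key design point is that the two encoders may differ internally but must produce vectors in the same space, otherwise similarity between a query vector and a passage vector is meaningless.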
Below is an example of creating an Upstage embedding for queries and converting query sentences to vectors to perform vector similarity searches.