01. VectorStore-backed Retriever

A VectorStore-backed retriever is a retriever that searches for documents using a vector store.

It queries the text in the vector store using the search methods the vector store implements, such as similarity search and MMR.

Run the code below to create the VectorStore.

# Configuration file for managing API keys as environment variables.
from dotenv import load_dotenv

# Load the API key information
load_dotenv()

True

# Set up LangSmith tracing. https://smith.langchain.com
# !pip install langchain-teddynote
from langchain_teddynote import logging

# Enter a project name.
logging.langsmith("CH11-Retriever")

Initializing a VectorStoreRetriever from a VectorStore (as_retriever)

The as_retriever method initializes and returns a VectorStoreRetriever based on the VectorStore object. It lets you set various search options so that document searches fit your needs.

Parameters

  • **kwargs : Keyword arguments to pass to the search function

  • search_type : Search type ("similarity", "mmr", "similarity_score_threshold")

  • search_kwargs : Additional search options

    • k : Number of documents to return (default: 4)

    • score_threshold : minimum similarity threshold for similarity_score_threshold search

    • fetch_k : Number of documents to pass to MMR algorithm (default: 20)

    • lambda_mult : Diversity regulation of MMR results (between 0-1, default: 0.5)

    • filter : Document metadata based filtering

Return value

  • VectorStoreRetriever : Initialized VectorStoreRetriever object

Reference

  • Various search strategies can be implemented (similarity, MMR, threshold based)

  • MMR (Maximal Marginal Relevance) algorithm allows you to regulate the diversity of search results

  • Metadata filtering allows only documents with specific conditions to be retrieved

  • Tags can be added to the retriever through the tags parameter

Caution

  • search_type and search_kwargs must be combined appropriately

  • When using MMR, the fetch_k and k values need to be balanced

  • Setting score_threshold too high may return no search results

  • When using filter, you need an accurate understanding of the dataset's metadata structure

  • lambda_mult : The closer the value is to 0, the higher the diversity; the closer to 1, the higher the similarity

Retriever invoke()

The invoke method is the Retriever's main entry point and is used to retrieve related documents. It calls the Retriever synchronously and returns the documents relevant to the given query.

Parameters

  • input : Search query string

  • config : Retriever configuration (Optional[RunnableConfig])

  • **kwargs : Additional arguments to pass to the Retriever

Return value

  • List[Document] : List of related documents

Maximal Marginal Relevance (MMR)

MMR (Maximal Marginal Relevance) is a way to avoid duplication among the documents retrieved for a query.

Instead of simply retrieving only the most relevant items, MMR considers two things at once: how relevant a document is to the query, and how different it is from the documents already selected.

  • Set the search_type parameter to "mmr" to use the MMR (Maximal Marginal Relevance) search algorithm.

  • k : Number of documents to return (default: 4)

  • fetch_k : Number of documents to pass to MMR algorithm (default: 20)

  • lambda_mult : Diversity control of MMR results (0~1, default: 0.5; 0: maximum diversity, 1: maximum similarity to the query)
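To make the roles of k, fetch_k, and lambda_mult concrete, here is a minimal pure-Python sketch of greedy MMR selection. It is an illustration, not LangChain's implementation: the functions cosine and mmr and the example vectors are hypothetical. It follows the convention that lambda_mult near 1 favors similarity to the query and lambda_mult near 0 favors diversity.

```python
import math


def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr(query_vec, doc_vecs, k=4, fetch_k=20, lambda_mult=0.5):
    """Return the indices of up to k documents chosen by greedy MMR."""
    # Step 1: keep only the fetch_k candidates most similar to the query.
    by_similarity = sorted(
        range(len(doc_vecs)),
        key=lambda i: cosine(query_vec, doc_vecs[i]),
        reverse=True,
    )
    candidates = by_similarity[:fetch_k]

    # Step 2: repeatedly pick the candidate with the best trade-off between
    # relevance to the query and redundancy with what is already selected.
    selected = []
    while candidates and len(selected) < k:

        def score(i):
            relevance = cosine(query_vec, doc_vecs[i])
            redundancy = max(
                (cosine(doc_vecs[i], doc_vecs[j]) for j in selected), default=0.0
            )
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy

        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


query = [1.0, 0.0]
docs = [
    [1.0, 0.0],   # 0: identical to the query
    [0.99, 0.1],  # 1: near-duplicate of document 0
    [0.6, 0.8],   # 2: less relevant, but different from document 0
]

print(mmr(query, docs, k=2, lambda_mult=1.0))  # [0, 1]: similarity only
print(mmr(query, docs, k=2, lambda_mult=0.3))  # [0, 2]: near-duplicate is skipped
```

With lambda_mult=1.0 the near-duplicate is returned because only similarity counts. Lowering lambda_mult penalizes it for overlapping with the first pick.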

Similarity score threshold search (similarity_score_threshold)

You can set a similarity score threshold so that the search returns only documents whose score is at or above that threshold.

Setting the threshold appropriately filters out less relevant documents and keeps only the most similar ones.

  • Set the search_type parameter to "similarity_score_threshold" to perform a search based on the similarity score threshold.

  • Pass {"score_threshold": 0.8} in the search_kwargs parameter to set the similarity score threshold to 0.8. This means that only documents with a similarity score of 0.8 or higher are returned.
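The filtering step can be illustrated with a small pure-Python sketch. The function search_with_threshold and the example vectors are hypothetical, and plain cosine similarity is assumed as the score. In LangChain the exact relevance score depends on the vector store in use.

```python
import math


def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search_with_threshold(query_vec, docs, score_threshold=0.8, k=4):
    """Return up to k (score, text) pairs whose score is >= score_threshold."""
    scored = [(cosine(query_vec, vec), text) for text, vec in docs]
    kept = [pair for pair in scored if pair[0] >= score_threshold]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return kept[:k]


docs = [
    ("doc about embeddings", [1.0, 0.1]),
    ("doc about tokens", [0.5, 0.9]),
    ("unrelated doc", [0.0, 1.0]),
]
query = [1.0, 0.0]

# Only the first document scores above 0.8.
results = search_with_threshold(query, docs, score_threshold=0.8)
print([text for _, text in results])

# A threshold that is too high leaves no results at all.
print(search_with_threshold(query, docs, score_threshold=0.999))
```

The second call shows why the threshold needs tuning: a value higher than every document's score returns an empty list.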

top_k setting

You can specify keyword arguments (kwargs) such as k to use when searching.

The k parameter is the number of top results to return from the search.

  • Set the k parameter in search_kwargs to 1 to return a single document as the search result.

Dynamic settings (Configurable)

  • Use ConfigurableField to adjust search settings dynamically.

  • ConfigurableField sets the unique identifier, name, and description of a search parameter.

  • To adjust the search settings, specify them through the config parameter.

  • The search settings are stored under the configurable key of the dictionary passed to the config parameter.

  • The search settings are passed along with the search query and can be adjusted dynamically for each query.

Below is an example with dynamic search settings.

Separate query and passage embedding models, such as Upstage embeddings

The default retriever uses the same embedding model for queries and documents.

However, there are cases where different embedding models are used for queries and documents.

In these cases, the query is embedded using the query embedding model, and the document is embedded using the document embedding model.

This allows you to use different embedding models for queries and documents.

Below is an example that creates an Upstage embedding model for queries, converts the query sentence into a vector, and performs a vector similarity search.
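The same flow can be sketched in pure Python without an Upstage API key. Everything here is a hypothetical stand-in: passage_embed and query_embed play the roles of the passage and query embedding models, and the vocabulary, synonym map, and passages are made up for illustration. The point is only that documents are indexed with one model, the query is embedded with another, and both must produce vectors in the same space.

```python
import math
import re

# Hypothetical shared vector space: one dimension per vocabulary word.
VOCAB = ["embedding", "vector", "token", "search"]
# Hypothetical normalization applied only by the query model.
SYNONYMS = {"embeddings": "embedding", "lookup": "search"}


def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())


def passage_embed(text):
    """Hypothetical passage model: counts vocabulary words in the passage."""
    words = tokenize(text)
    return [float(words.count(w)) for w in VOCAB]


def query_embed(text):
    """Hypothetical query model: normalizes synonyms, then marks word presence."""
    words = {SYNONYMS.get(w, w) for w in tokenize(text)}
    return [1.0 if w in words else 0.0 for w in VOCAB]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


passages = [
    "An embedding turns text into a vector. Each embedding has a fixed length.",
    "A token is a small piece of text.",
    "Similarity search compares one vector with another vector.",
]

# Index the documents with the passage model.
index = [(text, passage_embed(text)) for text in passages]

# Embed the query with the query model, then search by vector.
query_vec = query_embed("How do embeddings work?")
best_text, _ = max(index, key=lambda item: cosine(query_vec, item[1]))
print(best_text)
```

With a real provider, the two functions would be replaced by the provider's query and passage models, and the search by vector would be done by the vector store.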
