09. TimeWeightedVectorStoreRetriever
TimeWeightedVectorStoreRetriever is a retriever that combines semantic similarity with decay over time. It takes both the "freshness" and the "relevance" of a document or piece of data into account when returning results.
The score is computed as:
semantic_similarity + (1.0 - decay_rate) ^ hours_passed
Here, semantic_similarity is the semantic similarity between documents or data, decay_rate is the rate at which the score decreases over time, and hours_passed is the time (in hours) that has elapsed since the object was last accessed.
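The formula can be checked by hand with a few lines of plain Python. This is a minimal sketch of the arithmetic only, not LangChain's implementation; the function name combined_score and the input numbers are made up for illustration.

```python
def combined_score(semantic_similarity: float, decay_rate: float, hours_passed: float) -> float:
    """Return semantic_similarity + (1.0 - decay_rate) ** hours_passed."""
    recency = (1.0 - decay_rate) ** hours_passed
    return semantic_similarity + recency


# A document with similarity 0.8 that was last accessed 24 hours ago.
slow_decay = combined_score(0.8, decay_rate=0.01, hours_passed=24)
fast_decay = combined_score(0.8, decay_rate=0.5, hours_passed=24)

print(f"decay_rate=0.01: {slow_decay:.4f}")
print(f"decay_rate=0.5:  {fast_decay:.4f}")
```

With the slower decay the recency term still contributes noticeably after a day, while with the faster decay the score is essentially the similarity alone.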
The key feature of this approach is that it evaluates the "freshness" of information based on when an object was last accessed. In other words, frequently accessed objects keep a high score over time, which makes frequently used or important information more likely to appear at the top of the search results. This yields dynamic search results that account for both recency and relevance.
In particular, hours_passed refers to the time elapsed since the object was last accessed, not since it was created in the retriever. In other words, frequently accessed objects remain "fresh".
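The difference between "time since creation" and "time since last access" can be illustrated with a small sketch. MemoryItem and recency below are hypothetical stand-ins written for this example, not LangChain classes or functions, and the dates are arbitrary.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class MemoryItem:
    # Hypothetical stand-in for a stored document; not a LangChain class.
    text: str
    created_at: datetime
    last_accessed_at: datetime


def recency(item: MemoryItem, now: datetime, decay_rate: float) -> float:
    """Recency term, measured from the last access rather than from creation."""
    hours_passed = (now - item.last_accessed_at).total_seconds() / 3600
    return (1.0 - decay_rate) ** hours_passed


created = datetime(2024, 1, 1, 0, 0)
now = created + timedelta(hours=48)

# Both items were created at the same time, but one was read an hour ago.
untouched = MemoryItem("never read again", created, last_accessed_at=created)
revisited = MemoryItem("read recently", created, last_accessed_at=now - timedelta(hours=1))

print(recency(untouched, now, decay_rate=0.05))
print(recency(revisited, now, decay_rate=0.05))
```

Although both items are equally old, the one that was accessed recently keeps a much higher recency term.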
# Configuration for managing API keys as environment variables.
from dotenv import load_dotenv
# Load the API key information.
load_dotenv()
True
# Set up LangSmith tracing. https://smith.langchain.com
# !pip install langchain-teddynote
from langchain_teddynote import logging
# Enter a project name.
logging.langsmith("CH11-Retriever")
Low decay_rate
A low decay_rate (set extremely close to zero here) means that memories are "remembered" longer. A decay_rate of 0 means memories are never forgotten, which makes this retriever equivalent to a plain vector lookup.
Initialize TimeWeightedVectorStoreRetriever with a vector store, set the decay rate (decay_rate) to a very small value, and set the number of vectors to retrieve (k) to 1.
Add simple example data.
Call retriever.invoke() to perform a search.
The document returned first is the most salient one.
Because decay_rate is close to zero, that document is still considered recent.
High decay_rate
With a high decay_rate (e.g. 0.9999...), the recency score converges to zero quickly.
(If you set this value to 1, the recency value is 0, and you again get the same result as a plain vector lookup.)
Initialize the retriever with TimeWeightedVectorStoreRetriever, setting decay_rate to 0.999 to control how quickly the weight decays over time.
Add a new document again.
When retriever.invoke("테디노트") is called, "테디노트 구독 해주실꺼죠? Please!" is returned first. This is because the retriever has mostly forgotten the earlier document that asked readers to subscribe to TeddyNote (테디노트).
Summary of the decay rate (decay_rate)
When decay_rate is set very small (e.g. 0.000001), the decay rate (that is, the rate at which information is forgotten) is very low, so information is rarely forgotten. As a result, there is almost no difference in time weighting between recent and old information, and similarity has the larger effect on the score.
When decay_rate is set close to 1 (e.g. 0.999), the decay rate is very high, so past information is almost entirely forgotten. In this case, the most recent information receives the higher score.
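The two regimes can be compared side by side with a short sketch. The rank helper and the document tuples below are invented for illustration and only mimic the scoring formula; they are not part of LangChain.

```python
def rank(docs, decay_rate):
    """Sort (name, similarity, hours_passed) tuples by the combined score, best first."""
    def score(doc):
        _, similarity, hours_passed = doc
        return similarity + (1.0 - decay_rate) ** hours_passed
    return [name for name, _, _ in sorted(docs, key=score, reverse=True)]


# Made-up numbers: an old but very similar document versus a new, less similar one.
docs = [
    ("old_but_similar", 0.90, 24.0),
    ("new_but_less_similar", 0.60, 0.0),
]

low = rank(docs, decay_rate=0.000001)
high = rank(docs, decay_rate=0.999)
print(low)
print(high)
```

With the very small decay_rate the more similar document ranks first regardless of age, while with decay_rate at 0.999 the freshly accessed document overtakes it.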
Adjusting decay_rate with virtual time
Some LangChain utilities let you mock the time component.
mock_now is a utility function provided by LangChain for mocking the current time.
You can use the mock_now function to test your search results while changing the current time.
This feature can help you find a suitable decay_rate.
[Caution] If you set the mocked time too far in the past, an error may occur when the decay is calculated.
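One plausible way such an error can arise is sketched below in plain Python. This is an assumption for illustration: the source does not say which error is raised, and recency_at is a hypothetical helper that takes the "current" time as an argument instead of using LangChain's mock_now. If the virtual time lies far before the last access, hours_passed becomes a large negative number and the power overflows.

```python
from datetime import datetime, timedelta


def recency_at(now: datetime, last_accessed_at: datetime, decay_rate: float) -> float:
    """Recency term evaluated at an explicitly supplied 'current' time."""
    hours_passed = (now - last_accessed_at).total_seconds() / 3600
    return (1.0 - decay_rate) ** hours_passed


last_accessed_at = datetime(2024, 6, 1, 12, 0)

# A virtual 'now' one day after the last access works as expected.
one_day_later = recency_at(last_accessed_at + timedelta(days=1), last_accessed_at, decay_rate=0.999)
print(one_day_later)

# A virtual 'now' years before the last access makes hours_passed hugely negative.
far_past = last_accessed_at - timedelta(days=365 * 10)
try:
    recency_at(far_past, last_accessed_at, decay_rate=0.999)
    overflowed = False
except OverflowError:
    overflowed = True
print("overflowed:", overflowed)
```

Keeping the virtual time at or after the documents' last access time avoids this situation in the sketch.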