01. OpenAIEmbeddings
Document embedding is the process of converting the content of a document into numerical vectors.
This process allows you to quantify the meaning of documents and use them in various natural language processing tasks. Representative pre-trained language models include BERT and GPT; these models capture contextual information to encode the meaning of a document.
Document embedding feeds the tokenized document into the model and averages the resulting token vectors to produce a single vector for the entire document. This vector can be used for document classification, sentiment analysis, and calculating the similarity between documents.
Setup
First install langchain-openai and set the required environment variables.
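For example, in a Jupyter-style notebook the packages can be installed like this (python-dotenv is assumed here only because the environment-variable code below uses it):

%pip install -qU langchain-openai python-dotenv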
# Configuration file for managing API keys as environment variables
from dotenv import load_dotenv

# Load the API key information
load_dotenv()
True

List of supported models
| MODEL | PAGES PER DOLLAR | PERFORMANCE ON MTEB EVAL | MAX INPUT |
| --- | --- | --- | --- |
| text-embedding-3-small | 62,500 | 62.3% | 8191 |
| text-embedding-3-large | 9,615 | 64.6% | 8191 |
| text-embedding-ada-002 | 12,500 | 61.0% | 8191 |
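The examples that follow assume an embedding object and a sample text have been created. A minimal sketch, where the model choice and sample sentence are illustrative (any model from the table above works):

from langchain_openai import OpenAIEmbeddings

# Create the embedding object; text-embedding-3-small is one of the models listed above
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

# Sample text to embed (illustrative)
text = "Embedding models convert text into numerical vectors."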
Query embedding
embeddings.embed_query(text) is a function that converts the given text into an embedding vector.
This function can be used to map text into a vector space in order to find semantically similar texts or to calculate the similarity between texts.
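A minimal sketch of this call, assuming the embeddings object and sample text created above:

# Convert the query text into an embedding vector (a list of floats)
query_result = embeddings.embed_query(text)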
query_result[:5] slices the first five elements of the query_result list.
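For instance, continuing from the query_result above:

# Inspect the first five values of the embedding vector
print(query_result[:5])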
Document embedding
Embed text documents with the embeddings.embed_documents() function.
Pass a single document to the embedding function as a one-element list, [text], and assign the embedding vectors returned by the call to the doc_result variable.
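A sketch of this step, reusing the embeddings object and sample text from above:

# Embed a single document by passing it as a one-element list
doc_result = embeddings.embed_documents([text])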
doc_result[0][:5] slices the first five values of the first element of the doc_result list.
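For example, continuing from doc_result above:

# Inspect the first five values of the first document's embedding vector
print(doc_result[0][:5])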
Specifying dimensions
The text-embedding-3 model classes allow you to specify the size of the embedding that is returned.
For example, text-embedding-3-small returns a 1536-dimensional embedding by default.
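A quick way to confirm this, assuming the embeddings object created earlier with text-embedding-3-small:

# The default embedding for text-embedding-3-small has 1536 values
print(len(embeddings.embed_documents([text])[0]))  # 1536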
Adjusting dimensions
By passing dimensions=1024, however, the size of the embedding can be reduced to 1024 dimensions.
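A minimal sketch of this adjustment (the variable name embeddings_1024 is only for illustration):

from langchain_openai import OpenAIEmbeddings

# Request 1024-dimensional embeddings instead of the default 1536
embeddings_1024 = OpenAIEmbeddings(model="text-embedding-3-small", dimensions=1024)

# The resulting vector now has 1024 values
print(len(embeddings_1024.embed_documents([text])[0]))  # 1024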
Similarity calculation
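One common way to compare embedded texts is cosine similarity. The sketch below embeds a few illustrative sentences and compares them pairwise using scikit-learn's cosine_similarity (an assumed dependency, not part of langchain-openai itself):

from sklearn.metrics.pairwise import cosine_similarity

# A few illustrative sentences to compare
sentences = [
    "Hello, nice to meet you.",
    "Hi, pleased to meet you.",
    "The weather is cold today.",
]

# Embed all sentences at once
embedded_sentences = embeddings.embed_documents(sentences)

# Pairwise cosine similarity between the embedding vectors
similarity_matrix = cosine_similarity(embedded_sentences, embedded_sentences)
print(similarity_matrix)

Higher values indicate more semantically similar texts, so the first two sentences should score closer to each other than to the third.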