01. OpenAIEmbeddings

Document embedding is the process of converting the content of a document into numerical vectors.

This process allows you to quantify the meaning of documents and use them for various natural language processing tasks. Representative pre-trained language models include BERT and GPT; these models capture contextual information to encode the meaning of a document.

Document embedding feeds the tokenized document into the model and averages the resulting token vectors to produce a single vector for the entire document. This vector can then be used for document classification, sentiment analysis, and computing similarity between documents.

Settings

First install langchain-openai and set the required environment variables.
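A minimal setup sketch is shown below. The package names are the actual distributions used in this tutorial; the .env file name and the OPENAI_API_KEY variable follow the convention expected by load_dotenv and the OpenAI client.

pip install -U langchain-openai python-dotenv

# .env (placed in the project root)
OPENAI_API_KEY=<your-api-key>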


# Configuration file for managing the API key as an environment variable
from dotenv import load_dotenv

# Load the API key information
load_dotenv()


True

List of supported models

| Model                  | Pages per dollar | Performance on MTEB eval | Max input |
| ---------------------- | ---------------- | ------------------------ | --------- |
| text-embedding-3-small | 62,500           | 62.3%                    | 8191      |
| text-embedding-3-large | 9,615            | 64.6%                    | 8191      |
| text-embedding-ada-002 | 12,500           | 61.0%                    | 8191      |

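The code cell for this step is not preserved. Below is a minimal sketch that creates the embedding object used in the rest of this tutorial; the choice of text-embedding-3-small and the sample text are assumptions.

from langchain_openai import OpenAIEmbeddings

# Create an embedding object with one of the supported models listed above
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

# Sample text used in the examples below (hypothetical)
text = "Embedding converts text into a numerical vector."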

Query embedding

embeddings.embed_query(text) is a function that converts the given text into an embedding vector.

This function can be used to map text to a vector space to find semantically similar text or to calculate similarities between texts.
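The original code cell is not preserved; a minimal sketch, reusing the embeddings object and text variable defined above:

# Embed the query text into a single vector
query_result = embeddings.embed_query(text)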


query_result[:5] slices query_result to select the first 5 elements of the list.
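For example (the actual float values will vary from run to run):

# Print the first 5 dimensions of the query embedding
print(query_result[:5])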


Document embedding

Use the embeddings.embed_documents() function to embed text documents.

  • Pass [text] to the function to embed a single document as a one-element list, as sketched below.

  • Assign the embedding vectors returned by the call to the doc_result variable.
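The original code cell is not preserved; a minimal sketch under the same assumptions as above:

# Embed a list containing a single document
doc_result = embeddings.embed_documents([text])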


doc_result[0][:5] slices the first element of the doc_result list and selects its first 5 elements.
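For example:

# Print the first 5 dimensions of the first document's embedding
print(doc_result[0][:5])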


Specifying dimensions

The text-embedding-3 model family allows you to specify the size of the embedding that is returned.

For example, text-embedding-3-small returns 1536-dimensional embeddings by default.
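The original cell is missing; a sketch that checks the default dimensionality using the doc_result computed above:

# text-embedding-3-small returns 1536-dimensional vectors by default
print(len(doc_result[0]))  # 1536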


Adjusting dimensions

By passing dimensions=1024, however, the size of the embedding can be reduced to 1024.
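The original cell is not preserved; a sketch of how the dimensions parameter might be passed (the variable name embeddings_1024 is hypothetical):

# Create an embedding object that returns 1024-dimensional vectors
embeddings_1024 = OpenAIEmbeddings(model="text-embedding-3-small", dimensions=1024)

# Verify the reduced dimensionality
print(len(embeddings_1024.embed_documents([text])[0]))  # 1024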


Similarity calculation
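The code for this section is not preserved. Below is a minimal sketch of computing cosine similarity between embedded sentences; the example sentences and the cosine_similarity helper are assumptions, not the original code.

import numpy as np

sentences = [
    "Hi, nice to meet you.",
    "It is nice to see you again.",
    "The weather today is gloomy.",
]

# Embed every sentence with the 1024-dimensional embedder defined above
embedded_sentences = embeddings_1024.embed_documents(sentences)

def cosine_similarity(a, b):
    # Cosine similarity between two embedding vectors
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Print the pairwise similarity for every pair of sentences
for i, a in enumerate(embedded_sentences):
    for j, b in enumerate(embedded_sentences):
        if i < j:
            print(f"[similarity {cosine_similarity(a, b):.4f}] {sentences[i]} <=> {sentences[j]}")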

