CH08 Embedding
Embedding is the third stage of a Retrieval-Augmented Generation (RAG) system. It converts the document units created in the document-splitting stage into a numerical form that machines can understand. This step is one of the key parts of a RAG system: by representing the meaning of each document as a vector (an array of numbers), it makes it possible to calculate similarity when retrieving the document fragments/paragraphs (chunks) stored in the DB for a question (query) entered by the user.
The need for embedding
Understanding meaning: Natural language is complex, and the same text can carry a variety of meanings. Embedding transforms text into a numerical form so that computers can better process the content and meaning of documents.
Information search enhancement: Converting text to numerical vectors is essential for calculating similarity between documents. This makes it easier to search for related documents or to find the document that best matches a question.
Example
Embedding: converting sentences into numerical representations

Paragraph 1: [0.1, 0.5, 0.9, ..., 0.1, 0.2]
Paragraph 2: [0.7, 0.1, 0.3, ..., 0.5, 0.6]
Paragraph 3: [0.9, 0.4, 0.5, ..., 0.4, 0.3]
Question: "What is the average annual growth rate of the AI software market predicted by market research firm IDC?"
[0.1, 0.5, 0.9, ..., 0.2, 0.4]
Similarity calculation example
Paragraph 1: 80% -> Selected!
Paragraph 2: 30%
Paragraph 3: 25%
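The similarity calculation above can be sketched in a few lines of Python. This is a minimal illustration, not the method of any particular RAG library: it assumes cosine similarity as the similarity measure (a common choice, but not one the text specifies), and it uses hypothetical 4-dimensional vectors, since the vectors in the example are abbreviated and real embedding models produce hundreds or thousands of dimensions. The resulting scores will therefore differ from the illustrative percentages shown above; only the ranking step is the point.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical toy embeddings (assumed values, not output from a real model).
chunks = {
    "Paragraph 1": [0.1, 0.5, 0.9, 0.2],
    "Paragraph 2": [0.7, 0.1, 0.3, 0.6],
    "Paragraph 3": [0.9, 0.4, 0.5, 0.3],
}
query = [0.1, 0.5, 0.9, 0.4]  # hypothetical embedding of the user's question

# Score every stored chunk against the query, then pick the closest one.
scores = {name: cosine_similarity(query, vec) for name, vec in chunks.items()}
best = max(scores, key=scores.get)

for name, score in sorted(scores.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {score:.2f}")
print("Selected:", best)
```

In a real pipeline, the chunk vectors would come from an embedding model and be stored in a vector DB, which performs this ranking at scale.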