06. GPT4All Embeddings

GPT4All is a free-to-use chatbot that runs locally and is designed with personal-information protection in mind.

It requires neither a GPU nor an internet connection, and it offers popular models such as GPT4All Falcon and Wizard, as well as its own models.

This notebook describes how to use GPT4All embeddings with LangChain.

Installing the GPT4All Python Bindings

Run the following command to install the GPT4All Python bindings:

%pip install --upgrade --quiet  gpt4all > /dev/null
  • Import the GPT4AllEmbeddings class from the langchain_community.embeddings module.

GPT4AllEmbeddings is a class that embeds text data into vectors using the GPT4All model. It implements the embedding interface of the LangChain framework, so it can be used with the various features of LangChain.

from langchain_community.embeddings import GPT4AllEmbeddings

GPT4All uses a CPU-optimized, contrastively trained sentence transformer to create high-quality embeddings for text documents of arbitrary length. These embeddings are similar in quality to OpenAI embeddings on many tasks.

Create an instance of the GPT4AllEmbeddings class.

  • GPT4AllEmbeddings is an embedding model that converts text data into vectors using the GPT4All model.

  • In this code, a GPT4AllEmbeddings instance is assigned to the variable gpt4all_embd.

  • Afterwards, gpt4all_embd can be used to convert text data into vectors, as shown in the sketch below.

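A minimal sketch of this step. Depending on your langchain_community and gpt4all versions, GPT4AllEmbeddings may also accept (or require) an explicit model_name, so treat the no-argument call as an assumption.

from langchain_community.embeddings import GPT4AllEmbeddings

# Create the embedding model. On first use the GPT4All bindings download
# the default CPU-friendly embedding model if it is not cached locally.
# Some versions require an explicit model name instead, e.g.:
#   GPT4AllEmbeddings(model_name="all-MiniLM-L6-v2.gguf2.f16.gguf")
gpt4all_embd = GPT4AllEmbeddings()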

  • text "It is a sample sentence to test for embedding" in the variable. R assigns a string.

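A one-line sketch matching the description above:

text = "It is a sample sentence to test for embedding"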

Embed the Textual Data

The process of embedding text data is as follows.

First, the text data is tokenized and converted into numeric form.

At this stage, a pre-trained tokenizer splits the text into token units, and each token is mapped to a unique integer.

Next, the tokenized data is fed into an embedding layer, which converts it into dense, high-dimensional vectors.

In this process, each token is represented by a vector of real values that captures the meaning and context of that token.

Finally, the embedded vectors can be used for various natural language processing tasks.

For example, they can serve as input for tasks such as document classification, sentiment analysis, and machine translation, improving the performance of the model.

This text embedding process plays a very important role in natural language processing and is essential for effectively processing and analyzing large amounts of text data.
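As a toy illustration of this pipeline (this is not GPT4All's actual tokenizer or embedding layer; the vocabulary and vectors below are made-up placeholders), the sketch maps tokens to integer IDs and then looks up a dense vector for each ID:

import random

# Toy vocabulary: a pre-built mapping from token to a unique integer ID.
vocab = {"it": 0, "is": 1, "a": 2, "sample": 3, "sentence": 4}

# Toy embedding table: one dense vector per token ID. Real models learn
# these values during training; here they are random placeholders.
random.seed(0)
dim = 4
embedding_table = [[random.uniform(-1.0, 1.0) for _ in range(dim)] for _ in vocab]

sentence = "it is a sample sentence"
token_ids = [vocab[tok] for tok in sentence.split()]     # tokenization
token_vectors = [embedding_table[i] for i in token_ids]  # embedding lookup

print(token_ids)         # [0, 1, 2, 3, 4]
print(token_vectors[0])  # the 4-dimensional vector for "it"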

The embed_query method of the gpt4all_embd object is used to embed the given text (text).

  • The text to be embedded is stored in the variable text.

  • The gpt4all_embd object performs text embedding using the GPT4All model.

  • The embed_query method converts the given text into vector form and returns it.

  • The embedding result is stored in the variable query_result; see the sketch after this list.

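A minimal sketch of this step, continuing from the gpt4all_embd and text defined above:

# embed_query takes a single string and returns its embedding
# as a plain Python list of floats.
query_result = gpt4all_embd.embed_query(text)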

The embed_documents function allows you to embed multiple pieces of text.

You can also visualize your data by mapping these embeddings to Nomic's Atlas (https://docs.nomic.ai/index.html).

Check the size of the embedding dimension.

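A sketch of the dimension check; the exact size depends on the underlying embedding model (for example, 384 for the MiniLM-based default):

# The length of the returned vector is the embedding dimensionality.
print(len(query_result))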

The embed_documents method of the gpt4all_embd object is used to embed the text documents.

  • text is wrapped in a list and passed to the embed_documents method as an argument.

  • The embed_documents method computes and returns the embedding vectors of the documents.

  • The returned embedding vectors are stored in the variable doc_result; see the sketch after this list.

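A sketch of this step, again continuing from the objects above:

# embed_documents takes a list of strings and returns one
# embedding vector (a list of floats) per document.
doc_result = gpt4all_embd.embed_documents([text])

print(len(doc_result))     # number of documents embedded (1 here)
print(len(doc_result[0]))  # dimensionality of each document vector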
