01. Character text split (CharacterTextSplitter)

CharacterTextSplitter

This is the simplest splitting method.

By default, it splits text on the "\n\n" separator and measures the size of each chunk by its number of characters.

  1. How the text is split: on a single separator string

  2. How the chunk size is measured: by number of characters
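To make these two points concrete, here is a minimal plain-Python sketch. It only illustrates the idea and is not the library's implementation; the sample string is made up for illustration.

```python
# Illustrative only: plain Python, not CharacterTextSplitter itself.
text = "First paragraph.\n\nSecond paragraph.\n\nThird."  # made-up sample

# 1. Split on a single separator string.
pieces = text.split("\n\n")

# 2. Measure each piece by its number of characters.
sizes = [len(piece) for piece in pieces]

print(pieces)  # ['First paragraph.', 'Second paragraph.', 'Third.']
print(sizes)   # [16, 17, 6]
```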


%pip install -qU langchain-text-splitters

  • Open the ./data/appendix-keywords.txt file and read its contents.

  • Store the contents in the file variable.


# Open data/appendix-keywords.txt and bind the file object to f.
with open("./data/appendix-keywords.txt") as f:
    file = f.read()  # Read the file contents and store them in the file variable.

Print part of the contents read from the file.


# Print the first 500 characters read from the file.
print(file[:500])


CharacterTextSplitter divides the text into chunks. It is configured with these parameters:

  • separator: the string to split on. The default is "\n\n".

  • chunk_size: set to 250 to cap each chunk at 250 characters. A single piece that contains no separator is not split further, so it can still exceed this size.

  • chunk_overlap: set to 50 so that adjacent chunks may overlap by up to 50 characters.

  • length_function: set to len, the function used to compute the length of a text.

  • is_separator_regex: set to False so that the separator is treated as a plain string rather than a regular expression.


  • Use text_splitter to split the file text into documents.

  • texts[0] is the first document in the resulting list.


Here is an example of passing metadata along with the documents.

Note that the metadata is carried along with each split document.

  • The create_documents method takes a list of texts and a list of metadata as arguments.


Split text using the split_text() method.

  • text_splitter.split_text(file)[0] splits the file text with text_splitter and returns the first of the resulting text fragments.

