01. Character text split (CharacterTextSplitter)
CharacterTextSplitter
This is the simplest method.
By default, it splits text on the "\n\n" separator and measures chunk size by the number of characters.
How the text is split: by a single character separator
How the chunk size is measured: by number of characters
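To make the two rules above concrete, here is a minimal pure-Python sketch of the idea: split on a separator, then pack the pieces into chunks whose size is measured with len. The function name split_by_separator is hypothetical. This is a simplified illustration, not the library's implementation, and it leaves out chunk overlap.

```python
def split_by_separator(text: str, separator: str = "\n\n", chunk_size: int = 250) -> list[str]:
    """Split on `separator`, then greedily pack pieces into chunks of at most `chunk_size` characters."""
    # Split on the separator and drop empty pieces.
    pieces = [p for p in text.split(separator) if p]
    chunks = []
    current = ""
    for piece in pieces:
        # Try to append the next piece to the chunk being built.
        candidate = piece if not current else current + separator + piece
        if len(candidate) <= chunk_size or not current:
            # It fits, or it is the first piece of a new chunk (kept whole even if oversized).
            current = candidate
        else:
            # It does not fit: close the current chunk and start a new one.
            chunks.append(current)
            current = piece
    if current:
        chunks.append(current)
    return chunks


example_chunks = split_by_separator("aaaa\n\nbbbb\n\ncccc", chunk_size=10)
print(example_chunks)
```

In this sketch the first two pieces fit together in 10 characters, so the third piece starts a new chunk. A single piece longer than chunk_size is never cut, because the only split points are the separators.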
%pip install -qU langchain-text-splitters
Open the ./data/appendix-keywords.txt file, read its contents, and save them to the file variable.
# Open data/appendix-keywords.txt and create a file object called f.
with open("./data/appendix-keywords.txt") as f:
    file = f.read()  # Read the file contents and store them in the file variable.

Print part of the content read from the file.

# Print the first 500 characters read from the file.
print(file[:500])
Divide the text into chunks using CharacterTextSplitter.
separator: the delimiter used to split the text. The default is "\n\n".
chunk_size: set to 250 to limit each chunk to at most 250 characters.
chunk_overlap: set to 50 so that adjacent chunks may overlap by up to 50 characters.
length_function: set to len, the function used to calculate the length of a text.
is_separator_regex: set to False so the separator is treated as a plain string rather than a regular expression.
Use text_splitter to split the file text into documents.
Print the first document in the resulting list (texts[0]).
Here is an example of passing metadata along with the documents.
Note that the metadata is carried over to every chunk split from its document.
The create_documents method takes a list of texts and a list of metadata as arguments.
Split the text using the split_text() method.
text_splitter.split_text(file)[0] splits the file text with text_splitter and returns the first of the resulting text fragments.