05. Code Splitting (Python, Markdown, Java, C++, C#, Go, JS, LaTeX, etc.)
Split code
CodeTextSplitter lets you split code written in a variety of programming languages.
To do this, import the Language enum and specify the corresponding programming language.
%pip install -qU langchain-text-splitters
The following is an example of splitting text using RecursiveCharacterTextSplitter.
Import Language and RecursiveCharacterTextSplitter from the langchain_text_splitters module. RecursiveCharacterTextSplitter is a text splitter that recursively splits text at the character level.
from langchain_text_splitters import (
    Language,
    RecursiveCharacterTextSplitter,
)
Get a complete list of supported languages.
# Get a full list of supported languages
[e.value for e in Language]
The get_separators_for_language method of the RecursiveCharacterTextSplitter class shows which separators are used for a particular language.
In this example, the Language.PYTHON enum value is passed as the argument to check the separators used for Python.
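With LangChain installed, the lookup itself is RecursiveCharacterTextSplitter.get_separators_for_language(Language.PYTHON). To show what the returned list means without requiring LangChain, the stdlib-only sketch below hard-codes a separator list that is assumed to match what recent langchain-text-splitters versions return for Python (check it against your installed version). The helper first_matching_separator is hypothetical; it only illustrates that separators are tried in priority order, from class definitions down to single characters.

```python
# Assumed copy of the Python separators, highest priority first.
# Illustrative only; compare with get_separators_for_language(Language.PYTHON).
PYTHON_SEPARATORS = [
    "\nclass ",   # class definitions
    "\ndef ",     # top-level functions
    "\n\tdef ",   # tab-indented methods
    "\n\n",       # blank lines
    "\n",         # single newlines
    " ",          # spaces
    "",           # individual characters (always applicable)
]


def first_matching_separator(text, separators):
    """Hypothetical helper: return the first separator, in priority order,
    that occurs in text. The empty string always matches."""
    for sep in separators:
        if sep == "" or sep in text:
            return sep
    return ""


code = "def a():\n    return 1\n\ndef b():\n    return 2\n"
# "\nclass " is absent, so the next candidate "\ndef " is chosen.
print(repr(first_matching_separator(code, PYTHON_SEPARATORS)))
```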
Python
Here is an example using RecursiveCharacterTextSplitter.
RecursiveCharacterTextSplitter splits Python code into documents. The language parameter is set to Language.PYTHON to specify Python, chunk_size is set to 50 to limit the maximum size of each document, and chunk_overlap is set to 0 so that consecutive documents do not overlap.
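With LangChain, this configuration corresponds to RecursiveCharacterTextSplitter.from_language(language=Language.PYTHON, chunk_size=50, chunk_overlap=0). The stdlib-only sketch below approximates the idea with a hypothetical recursive_split function; it is a simplification, not LangChain's implementation. It picks the highest-priority separator present, splits on it while keeping the separator at the start of the following piece, greedily merges pieces up to chunk_size, and recurses with the remaining separators into any piece that is still too long. It never overlaps chunks, which matches chunk_overlap=0. The separator list is an assumed copy of the Python separators.

```python
# Assumed Python separators, highest priority first (illustrative copy).
PYTHON_SEPARATORS = ["\nclass ", "\ndef ", "\n\tdef ", "\n\n", "\n", " ", ""]


def recursive_split(text, separators, chunk_size):
    """Simplified recursive character splitting without chunk overlap."""
    # 1. Pick the highest-priority separator that occurs in the text.
    sep, remaining = "", []
    for i, candidate in enumerate(separators):
        if candidate == "" or candidate in text:
            sep, remaining = candidate, separators[i + 1:]
            break

    # 2. Split, keeping each separator at the start of the piece after it.
    if sep == "":
        pieces = list(text)
    else:
        parts = text.split(sep)
        pieces = [parts[0]] + [sep + p for p in parts[1:]]

    # 3. Greedily merge pieces; recurse into pieces that are still too long.
    chunks, current = [], ""
    for piece in pieces:
        if len(piece) > chunk_size:
            if current:
                chunks.append(current)
                current = ""
            chunks.extend(recursive_split(piece, remaining, chunk_size))
        elif len(current) + len(piece) <= chunk_size:
            current += piece
        else:
            chunks.append(current)
            current = piece
    if current:
        chunks.append(current)

    # Strip surrounding whitespace and drop empty chunks.
    return [c.strip() for c in chunks if c.strip()]


PYTHON_CODE = '''def hello_world():
    print("Hello, World!")

# Call the function
hello_world()
'''

for chunk in recursive_split(PYTHON_CODE, PYTHON_SEPARATORS, chunk_size=50):
    print(repr(chunk))
```

On this sample no class or def boundary is preceded by a newline, so the sketch falls back to the blank line and produces two chunks, each under 50 characters.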
Generate the Documents. The created Documents are returned as a list.
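In LangChain, the splitter's create_documents method returns a list of Document objects, each carrying page_content and metadata. The sketch below uses a hypothetical stand-in rather than LangChain's own Document class: a small dataclass plus a create_documents helper that accepts any split function (here a placeholder that splits on blank lines) and gives every chunk its own copy of the source text's metadata. The "source" metadata key and the file name are illustrative assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    """Hypothetical stand-in for LangChain's Document: text plus metadata."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def create_documents(texts, split_fn, metadatas=None):
    """Split each text with split_fn and wrap every chunk in a Document.

    Each chunk receives its own copy of its source text's metadata.
    """
    metadatas = metadatas or [{} for _ in texts]
    docs = []
    for text, meta in zip(texts, metadatas):
        for chunk in split_fn(text):
            docs.append(Document(page_content=chunk, metadata=dict(meta)))
    return docs


docs = create_documents(
    ["def a():\n    return 1\n\ndef b():\n    return 2"],
    split_fn=lambda t: t.split("\n\n"),  # placeholder splitter for the demo
    metadatas=[{"source": "example.py"}],
)
print(docs)
```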
JS
Here is an example using the JS text splitter.
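The JS splitter is created the same way, by passing Language.JS to from_language. The stdlib-only sketch below shows only the first level of separator-based splitting on a small JavaScript snippet. The separator list is an assumed, illustrative copy (verify it with get_separators_for_language(Language.JS)), and split_on_first_separator is a hypothetical helper, not part of LangChain.

```python
import re

# Assumed JS separators, highest priority first (illustrative copy).
JS_SEPARATORS = [
    "\nfunction ", "\nconst ", "\nlet ", "\nvar ", "\nclass ",
    "\nif ", "\nfor ", "\nwhile ", "\nswitch ", "\ncase ", "\ndefault ",
    "\n\n", "\n", " ", "",
]


def split_on_first_separator(text, separators):
    """One level of splitting: cut before every occurrence of the
    highest-priority non-empty separator found in the text."""
    for sep in separators:
        if sep and sep in text:
            # The zero-width lookahead keeps the separator in the next piece.
            pieces = re.split("(?=" + re.escape(sep) + ")", text)
            return [p for p in pieces if p]
    return [text]


JS_CODE = (
    "const a = 1;\n"
    "function f() {\n  return a;\n}\n"
    "function g() {\n  return 2;\n}\n"
)

for piece in split_on_first_separator(JS_CODE, JS_SEPARATORS):
    print(repr(piece))
```

Because "\nfunction " outranks "\nconst ", the snippet is cut at each function declaration, and joining the pieces reproduces the original text exactly.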
TS
Here is an example using the TS text splitter.
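The TS splitter (Language.TS) differs from the JS one mainly in its separator list. The sketch below compares two assumed, illustrative copies of those lists and uses a hypothetical helper, highest_priority_separator, to show that on the same TypeScript snippet the TS list prefers interface boundaries while the JS list, which has no interface separator, falls back to function boundaries.

```python
# Assumed separator lists, highest priority first (illustrative copies).
JS_SEPARATORS = ["\nfunction ", "\nconst ", "\nlet ", "\nvar ", "\nclass ",
                 "\nif ", "\nfor ", "\nwhile ", "\nswitch ", "\ncase ",
                 "\ndefault ", "\n\n", "\n", " ", ""]
TS_SEPARATORS = ["\nenum ", "\ninterface ", "\nnamespace ", "\ntype ",
                 "\nclass ", "\nfunction ", "\nconst ", "\nlet ", "\nvar ",
                 "\nif ", "\nfor ", "\nwhile ", "\nswitch ", "\ncase ",
                 "\ndefault ", "\n\n", "\n", " ", ""]


def highest_priority_separator(text, separators):
    """Return the first non-empty separator that occurs in text, else ""."""
    for sep in separators:
        if sep and sep in text:
            return sep
    return ""


TS_CODE = (
    "interface Point {\n  x: number;\n}\n"
    "interface Size {\n  w: number;\n}\n"
    "function area(s: Size): number {\n  return s.w;\n}\n"
)

print(repr(highest_priority_separator(TS_CODE, JS_SEPARATORS)))  # '\nfunction '
print(repr(highest_priority_separator(TS_CODE, TS_SEPARATORS)))  # '\ninterface '
# Separators that only the TS list contains:
print([s for s in TS_SEPARATORS if s not in JS_SEPARATORS])
```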
Markdown
Here is an example using the Markdown text splitter.
It is an open-source project in a rapidly developing field. Your interest is greatly appreciated 🙏
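For Markdown, headings are the natural top-level boundaries. The stdlib-only sketch below, a hypothetical split_markdown_by_headings helper rather than LangChain's implementation, starts a new section at every ATX heading (# to ######). It also skips lines inside fenced code blocks, so a shell comment such as "# Hopefully this code block isn't split" is not mistaken for a heading. The sample uses ~~~ fences; the helper accepts backtick fences as well.

```python
import re

FENCES = ("`" * 3, "~~~")         # markers that open/close fenced code blocks
HEADING = re.compile(r"#{1,6} ")  # ATX heading: 1 to 6 '#' followed by a space


def split_markdown_by_headings(text):
    """Start a new section at each heading that is outside a code fence."""
    sections, current, in_fence = [], [], False
    for line in text.splitlines():
        if line.startswith(FENCES):
            in_fence = not in_fence  # toggle on opening/closing fence
        elif not in_fence and HEADING.match(line) and current:
            sections.append("\n".join(current).strip())
            current = []
        current.append(line)
    if current:
        sections.append("\n".join(current).strip())
    return [s for s in sections if s]


md_text = (
    "# LangChain\n"
    "\n"
    "Building applications with LLMs\n"
    "\n"
    "## Quick Install\n"
    "\n"
    "~~~bash\n"
    "# Hopefully this code block isn't split\n"
    "pip install langchain\n"
    "~~~\n"
    "\n"
    "An open-source project in a rapidly developing field.\n"
)

for section in split_markdown_by_headings(md_text):
    print(section)
    print("---")
```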
LaTeX
LaTeX is a markup language for writing documents, widely used to express mathematical symbols and formulas.
Here is an example of LaTeX text.
Split the text and print the results.
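For LaTeX, sectioning commands mark the largest structural units. The stdlib-only sketch below, using a hypothetical split_latex_sections helper and an assumed set of commands (\chapter, \section, \subsection, \subsubsection), cuts the source before each such command. It is an illustration of the idea behind Language.LATEX, not LangChain's implementation.

```python
import re

# Split before sectioning commands; the lookahead keeps each command
# at the start of its chunk. The command set is an assumption.
SECTION_START = re.compile(
    r"(?=\\(?:chapter|section|subsection|subsubsection)\{)"
)


def split_latex_sections(text):
    """Cut LaTeX source before every \\chapter, \\section, ... command."""
    return [p.strip() for p in SECTION_START.split(text) if p.strip()]


LATEX_TEXT = r"""\documentclass{article}
\begin{document}
\maketitle

\section{Introduction}
Some introductory text.

\subsection{Background}
Some background text.

\end{document}
"""

for chunk in split_latex_sections(LATEX_TEXT):
    print(chunk)
    print("---")
```

The preamble becomes its own chunk and \end{document} stays attached to the last subsection, a side effect of cutting on text patterns instead of parsing environments.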
HTML
Here is an example using the HTML text splitter.
Split the text and print the results.
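For HTML (Language.HTML), separators are tag openings. The stdlib-only sketch below, with a hypothetical split_html_blocks helper and an assumed set of block-level tags, cuts the markup before each selected opening tag. It illustrates separator-based splitting only and is not LangChain's implementation.

```python
import re

# Split before opening tags of common block-level elements (assumed set).
# \b stops "<p" from matching tags such as "<pre".
BLOCK_START = re.compile(r"(?=<(?:h[1-6]|p|div|ul|ol|li|table)\b)", re.IGNORECASE)


def split_html_blocks(text):
    """Cut HTML before each selected block-level opening tag."""
    return [p.strip() for p in BLOCK_START.split(text) if p.strip()]


HTML_TEXT = (
    "<!DOCTYPE html>\n<html>\n<body>\n"
    "<h1>LangChain</h1>\n"
    "<p>Build applications with LLMs.</p>\n"
    "<div>\n<p>A nested paragraph.</p>\n</div>\n"
    "</body>\n</html>\n"
)

for chunk in split_html_blocks(HTML_TEXT):
    print(repr(chunk))
```

Because the cut is made on text patterns rather than on the parsed document tree, an opening <div> can end up in a different chunk from its closing tag; structure-aware splitting would require an HTML parser.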