05. Code splitting (Python, Markdown, Java, C++, C#, Go, JS, LaTeX, etc.)

Split code

CodeTextSplitter allows you to split code written in various programming languages.

To do this, just import the Language enum and specify the corresponding programming language.


%pip install -qU langchain-text-splitters

This is an example of splitting text using RecursiveCharacterTextSplitter.

  • Import the Language enum and the RecursiveCharacterTextSplitter class from the langchain_text_splitters module.

  • RecursiveCharacterTextSplitter is a text splitter that recursively splits text on character-level separators.


from langchain_text_splitters import (
    Language,
    RecursiveCharacterTextSplitter,
)

Get a complete list of supported languages.


# Get a full list of supported languages
[e.value for e in Language]


You can use the get_separators_for_language method of the RecursiveCharacterTextSplitter class to see the separators used for a particular language.

  • In the example, the Language.PYTHON enum value is passed as the argument to check the separators used for the Python language.


Python

An example using RecursiveCharacterTextSplitter is as follows:

  • Split Python code into document units using RecursiveCharacterTextSplitter.

  • Specify Language.PYTHON in the language parameter to use the Python language.

  • Set chunk_size to 50 to limit the maximum size of each document.

  • Set chunk_overlap to 0 so that adjacent documents do not overlap.


Create Document objects. The created Documents are returned as a list.


JS

Here is an example using the JS text splitter.


TS

Here is an example using the TS text splitter.


Markdown

Here is an example using the Markdown text splitter.


This is an open source project in a rapidly developing field. Your interest is much appreciated 🙏


LaTeX

LaTeX is a markup language for writing documents, widely used to express mathematical symbols and formulas.

Here is an example of LaTeX text.


Split the text and print the results.


HTML

Here is an example using the HTML text splitter:


Split the text and print the results.

