08. Recursive JSON Splitter (RecursiveJsonSplitter)

RecursiveJsonSplitter

This JSON splitter traverses JSON data depth-first and produces smaller JSON chunks.

The splitter tries to keep nested JSON objects intact as far as possible, but splits them when necessary to keep the chunk size between min_chunk_size and max_chunk_size. If a value is a very large string rather than nested JSON, that string is not split.
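The behavior described above can be illustrated with a minimal pure-Python sketch. It is not the library's implementation: the names `json_size`, `set_nested`, and `split_depth_first` are hypothetical, only a maximum size is enforced (no `min_chunk_size`), and the size check is approximate because it ignores the characters added by parent keys.

```python
import json
from typing import Any, Optional


def json_size(data: Any) -> int:
    """Size of a JSON value: number of characters in its serialized form."""
    return len(json.dumps(data))


def set_nested(target: dict, path: list, value: Any) -> None:
    """Store value in target under the given key path, creating parent dicts."""
    for key in path[:-1]:
        target = target.setdefault(key, {})
    target[path[-1]] = value


def split_depth_first(
    data: dict,
    max_chunk_size: int,
    path: Optional[list] = None,
    chunks: Optional[list] = None,
) -> list:
    """Walk data depth-first and distribute its values over several chunks."""
    path = path or []
    chunks = chunks if chunks is not None else [{}]
    for key, value in data.items():
        new_path = path + [key]
        size = json_size({key: value})
        # Start a new chunk when the value does not fit in the current one.
        if chunks[-1] and size > max_chunk_size - json_size(chunks[-1]):
            chunks.append({})
        fits = size <= max_chunk_size - json_size(chunks[-1])
        if fits or not (isinstance(value, dict) and value):
            # Either the nested object fits as a whole, or the value is a
            # leaf (string, list, number) that this sketch never splits.
            set_nested(chunks[-1], new_path, value)
        else:
            # Too large and splittable: descend into the nested object.
            split_depth_first(value, max_chunk_size, new_path, chunks)
    return chunks


# Hypothetical sample data, small enough to inspect by eye.
data = {
    "info": {"title": "API", "version": "1.0"},
    "paths": {"/a": {"get": "list a"}, "/b": {"get": "list b"}},
    "note": "x" * 100,
}
chunks = split_depth_first(data, max_chunk_size=60)
for chunk in chunks:
    print(json_size(chunk), chunk)
```

In this sketch the `info` object stays together because it fits, `paths` is split per route because it does not, and the long `note` string lands in a chunk of its own that still exceeds the limit.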

If you need a hard limit on chunk size, consider running the Recursive Text Splitter on the chunks that this splitter produces.
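As a rough illustration of such a post-processing step, the sketch below caps the length of every chunk string with plain fixed-width slicing. This is a deliberately simple stand-in, not the Recursive Text Splitter itself, and `enforce_max_length` is a hypothetical helper name.

```python
def enforce_max_length(texts: list, max_length: int) -> list:
    """Cut every string into pieces of at most max_length characters."""
    capped = []
    for text in texts:
        for start in range(0, len(text), max_length):
            capped.append(text[start:start + max_length])
    return capped


# Hypothetical chunk strings: one short, one well over the limit.
texts = ['{"a": 1}', '{"note": "' + "x" * 50 + '"}']
capped = enforce_max_length(texts, max_length=20)
print([len(piece) for piece in capped])
```

Note that a piece cut out of a serialized chunk is generally no longer valid JSON on its own, which is the price of a strict size limit.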

Split criteria

  1. How the text is split: by JSON value

  2. How chunk size is measured: by number of characters
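To make the second criterion concrete, the following sketch measures a chunk as the number of characters in its serialized JSON string. The helper `is_within_limit` and the sample chunk are hypothetical, and the limit of 300 is the value used later in this section.

```python
import json


def is_within_limit(chunk: dict, max_chunk_size: int) -> bool:
    """Compare the character count of the serialized chunk with the limit."""
    return len(json.dumps(chunk)) <= max_chunk_size


# Hypothetical sample chunk.
chunk = {"openapi": "3.1.0", "info": {"title": "LangSmith"}}
serialized = json.dumps(chunk)
print(len(serialized), is_within_limit(chunk, 300))
```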


%pip install -qU langchain-text-splitters
  • Use the requests.get() function to fetch JSON data from the "https://api.smith.langchain.com/openapi.json" URL.

  • The fetched JSON data is converted to a Python dictionary with the json() method and stored in the json_data variable.


import requests

# Load the JSON data.
json_data = requests.get("https://api.smith.langchain.com/openapi.json").json()

The following is an example of splitting JSON data with RecursiveJsonSplitter.
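The code for this step is not included here. A minimal sketch, assuming the `langchain-text-splitters` package installed above and the limit of 300 characters mentioned later in this section, could create the splitter as follows. The wrapper `make_splitter` is a hypothetical helper, not part of the library.

```python
from typing import Optional


def make_splitter(max_chunk_size: int = 300, min_chunk_size: Optional[int] = None):
    """Create a RecursiveJsonSplitter with the given chunk size bounds."""
    # Imported inside the function so that defining it does not need the
    # package; calling it requires langchain-text-splitters to be installed.
    from langchain_text_splitters import RecursiveJsonSplitter

    return RecursiveJsonSplitter(
        max_chunk_size=max_chunk_size,
        min_chunk_size=min_chunk_size,
    )
```

Calling `make_splitter()` returns a splitter limited to 300 characters per chunk. Leaving `min_chunk_size` as `None` lets the library choose its own lower bound.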


Use the splitter.split_json() method to split the JSON data recursively.
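A sketch of this call, again assuming `langchain-text-splitters` is installed and a limit of 300 characters, is shown below. The function name `split_json_chunks` is hypothetical.

```python
def split_json_chunks(json_data: dict, max_chunk_size: int = 300) -> list:
    """Split a JSON-like dictionary into a list of smaller dictionaries."""
    # Imported lazily: only calling this function requires the package.
    from langchain_text_splitters import RecursiveJsonSplitter

    splitter = RecursiveJsonSplitter(max_chunk_size=max_chunk_size)
    # split_json returns the chunks as Python dictionaries.
    return splitter.split_json(json_data=json_data)
```

With the `json_data` dictionary loaded earlier, `split_json_chunks(json_data)` would return the chunks as a list of dictionaries.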


  • Use the splitter.create_documents() method to convert the JSON data into documents.

  • Use the splitter.split_text() method to split the JSON data into a list of strings.
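The two calls above might be combined as in the following sketch, which assumes `langchain-text-splitters` is installed. The helper `to_documents_and_texts` is a hypothetical name, while `docs` and `texts` are the variable names used in the rest of this section.

```python
def to_documents_and_texts(json_data: dict, max_chunk_size: int = 300):
    """Return the split result both as documents and as JSON strings."""
    # Imported lazily: only calling this function requires the package.
    from langchain_text_splitters import RecursiveJsonSplitter

    splitter = RecursiveJsonSplitter(max_chunk_size=max_chunk_size)
    # create_documents expects a list of JSON objects and returns documents.
    docs = splitter.create_documents(texts=[json_data])
    # split_text returns every chunk serialized as a JSON string.
    texts = splitter.split_text(json_data=json_data)
    return docs, texts
```

Each returned document carries a chunk in its `page_content`, and each entry of `texts` is the same kind of chunk as a plain string.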


Printing texts[2], one of the larger chunks, shows that it contains a list object.

  • The chunk at index 2 exceeds the size limit of 300 because it contains a list object.

  • This happens because RecursiveJsonSplitter does not split list objects.
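A check like the one described above can be sketched without the library. The `texts` list below is hypothetical sample data standing in for the real split result, built so that the chunk at index 2 contains a long list.

```python
import json

MAX_CHUNK_SIZE = 300

# Hypothetical stand-in for the list returned by splitter.split_text().
texts = [
    json.dumps({"openapi": "3.1.0"}),
    json.dumps({"info": {"title": "Example API", "version": "0.1.0"}}),
    json.dumps({"tags": [f"tag-{i}" for i in range(60)]}),
]

# Print the size of every chunk and mark the ones above the limit.
for index, text in enumerate(texts):
    status = "over limit" if len(text) > MAX_CHUNK_SIZE else "ok"
    print(index, len(text), status)

# Indices of the chunks that exceed the limit.
oversized = [i for i, text in enumerate(texts) if len(text) > MAX_CHUNK_SIZE]
print(oversized)
```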


The chunk at index 2 can be parsed with the json module as follows.
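A minimal sketch of this parsing step is shown below. The string `chunk_text` is a hypothetical example of what such a chunk might look like, not actual output of the splitter.

```python
import json

# Hypothetical chunk string; in the tutorial this would be texts[2].
chunk_text = '{"paths": {"/api/v1/sessions": {"get": {"tags": ["tracer-sessions"]}}}}'

# json.loads turns the JSON string back into Python dictionaries and lists.
chunk = json.loads(chunk_text)
tags = chunk["paths"]["/api/v1/sessions"]["get"]["tags"]
print(type(tags).__name__, tags)
```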


Setting the convert_lists parameter to True converts the lists in the JSON into key:value pairs of the form index:item.
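The effect of this conversion can be illustrated in pure Python. The function `lists_to_dicts` is a hypothetical sketch of the idea, assuming string indices as keys, and is not the library's own code.

```python
from typing import Any


def lists_to_dicts(data: Any) -> Any:
    """Recursively replace every list with a dict of index:item pairs."""
    if isinstance(data, dict):
        return {key: lists_to_dicts(value) for key, value in data.items()}
    if isinstance(data, list):
        return {str(i): lists_to_dicts(item) for i, item in enumerate(data)}
    return data


# Hypothetical sample data containing a list.
sample = {"tags": ["run", "trace"], "info": {"servers": [{"url": "a"}]}}
converted = lists_to_dicts(sample)
print(converted)

# With the library, the conversion is requested through the parameter, e.g.
# splitter.split_json(json_data=json_data, convert_lists=True)
```

Because the lists become dictionaries, the splitter can then divide their items across chunks like any other nested object.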


You can inspect the document at a specific index of the docs list.
