This JSON splitter creates smaller JSON chunks via a depth-first traversal of the JSON data.
The splitter tries to keep nested JSON objects intact as much as possible, but splits them when necessary to keep chunk sizes between min_chunk_size and max_chunk_size. If a value is a very large string rather than a nested JSON object, that string is not split.
If you need a strict limit on chunk size, consider running a RecursiveCharacterTextSplitter on those chunks afterward.
Split criteria
Text splitting method: based on JSON value
Chunk size measurement method: based on number of characters
%pip install -qU langchain-text-splitters
Use the requests.get() function to fetch JSON data from the "https://api.smith.langchain.com/openapi.json" URL.
The fetched data is converted to a Python dictionary via the .json() method and stored in the json_data variable.
import requests
# Load the JSON data.
json_data = requests.get("https://api.smith.langchain.com/openapi.json").json()
The following is an example of splitting JSON data with RecursiveJsonSplitter.
Split the JSON data recursively using the splitter.split_json() method.
Use the splitter.create_documents() method to convert the JSON data into Document objects.
Use the splitter.split_text() method to split the JSON data into a list of strings.
If you print texts[2] and review one of the larger chunks, you can see that it contains a list object.
The reason this chunk exceeds the 300-character limit is that list:
RecursiveJsonSplitter does not split list objects.
You can parse the chunk at index 2 using the json module.
By setting the convert_lists parameter to True, lists inside the JSON are converted into dictionaries of index:item key:value pairs.
You can inspect the document at a specific index of the docs list.
from langchain_text_splitters import RecursiveJsonSplitter

# Create a RecursiveJsonSplitter object that splits JSON data into chunks of up to 300 characters.
splitter = RecursiveJsonSplitter(max_chunk_size=300)

# Split the JSON data recursively. Use this when you need to access or manipulate small pieces of JSON.
json_chunks = splitter.split_json(json_data=json_data)

# Generate Document objects from the JSON data.
docs = splitter.create_documents(texts=[json_data])

# Generate string chunks from the JSON data.
texts = splitter.split_text(json_data=json_data)

# Print the content of the first document.
print(docs[0].page_content)
print("===" * 20)
# Print the first string chunk.
print(texts[0])

# Check the sizes of the chunks.
print([len(text) for text in texts][:10])

# Examining one of the larger chunks, we can see that it contains a list object.
print(texts[1])

# Preprocess the JSON, converting lists into dictionaries with index:item key:value pairs.
texts = splitter.split_text(json_data=json_data, convert_lists=True)

# The list has been converted to a dictionary; check the result.
print(texts[2])