08. Recursive JSON splitting (RecursiveJsonSplitter)
RecursiveJsonSplitter
This JSON splitter creates smaller JSON chunks by traversing the JSON data depth-first.
The splitter tries to keep nested JSON objects intact as far as possible, but splits them when necessary to keep the chunk size between min_chunk_size and max_chunk_size. If a value is a very large string rather than nested JSON, that string is not split.
If you need a strict limit on chunk size, consider applying a Recursive Text Splitter to the chunks produced by this splitter.
Split criteria
Text splitting method: based on JSON value
Chunk size measurement method: based on number of characters
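The traversal described above can be sketched in plain Python. This is a simplified illustration, not the library's implementation: the helper names json_size, set_nested, and split_depth_first are hypothetical, min_chunk_size is ignored, and the sample data is made up. Chunk size is measured as the character count of the serialized JSON.

```python
import json


def json_size(data):
    """Chunk size: number of characters in the serialized JSON."""
    return len(json.dumps(data))


def set_nested(target, path, value):
    """Store value in target under the nested keys listed in path."""
    for key in path[:-1]:
        target = target.setdefault(key, {})
    target[path[-1]] = value


def split_depth_first(data, max_chunk_size, path=None, chunks=None):
    """Simplified depth-first split of a dict into size-limited chunks."""
    path = path or []
    chunks = chunks if chunks is not None else [{}]
    if isinstance(data, dict):
        for key, value in data.items():
            new_path = path + [key]
            remaining = max_chunk_size - json_size(chunks[-1])
            if json_size({key: value}) < remaining:
                # The whole subtree fits: keep the nested object together.
                set_nested(chunks[-1], new_path, value)
            else:
                # Too big: start a fresh chunk and descend into the value.
                if chunks[-1]:
                    chunks.append({})
                split_depth_first(value, max_chunk_size, new_path, chunks)
    else:
        # Leaves (strings, numbers, lists) are never split, even if oversized.
        set_nested(chunks[-1], path, data)
    return chunks


# Made-up sample data (an assumption for this sketch).
sample = {
    "info": {"title": "Demo API", "version": "1.0"},
    "paths": {
        "/a": {"get": {"summary": "List A items"}},
        "/b": {"get": {"summary": "List B items"}},
    },
    "tags": ["x" * 80],
}

chunks = split_depth_first(sample, max_chunk_size=60)
for chunk in chunks:
    print(json_size(chunk), chunk)
```

Each chunk keeps the full key path from the root, so it remains valid JSON on its own. The last chunk holds the list and exceeds the limit, because the sketch treats lists as indivisible leaves.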
%pip install -qU langchain-text-splitters
Use the requests.get() function to fetch JSON data from the "https://api.smith.langchain.com/openapi.json" URL. The json() method converts the fetched data into a Python dictionary, which is stored in the json_data variable.
import requests
# Load the JSON data.
json_data = requests.get("https://api.smith.langchain.com/openapi.json").json()
The following example splits the JSON data with RecursiveJsonSplitter.
Use the splitter.split_json() method to split the JSON data recursively.
Use the splitter.create_documents() method to convert the JSON data into documents, or the splitter.split_text() method to split it into a list of strings.
Printing texts[2], one of the larger chunks, shows that it contains a list object.
The chunk at index 2 exceeds the size limit of 300 because it holds a list object: RecursiveJsonSplitter does not split list objects.
The chunk at index 2 can be parsed with the json module as follows.
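A minimal sketch of that inspection, using only the standard library. The chunk_text string is a made-up stand-in for texts[2], and find_lists is a hypothetical helper that reports where list values sit in the parsed chunk and how many characters each occupies when serialized.

```python
import json

# Made-up stand-in for an oversized chunk such as texts[2] (an assumption).
chunk_text = json.dumps(
    {"paths": {"/items": {"get": {"parameters": [{"name": f"p{i}"} for i in range(5)]}}}}
)

# Parse the JSON string back into a Python dictionary.
parsed = json.loads(chunk_text)


def find_lists(node, path=()):
    """Yield (path, serialized_length) for each list reached through dict keys."""
    if isinstance(node, list):
        yield path, len(json.dumps(node))
    elif isinstance(node, dict):
        for key, value in node.items():
            yield from find_lists(value, path + (key,))


for path, size in find_lists(parsed):
    print(" -> ".join(path), size)
```

Lists reported this way are the values that the splitter keeps whole, so they are the first place to look when a chunk is larger than expected.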
Setting the convert_lists parameter to True converts the lists within the JSON into key:value pairs of the form index:item.
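The effect of this conversion can be illustrated in plain Python. The lists_to_dicts helper below is hypothetical, written only for this illustration and not taken from the library, and it assumes that indexes become string keys. The sample data is made up.

```python
def lists_to_dicts(data):
    """Recursively replace every list with a dict keyed by item index."""
    if isinstance(data, dict):
        return {key: lists_to_dicts(value) for key, value in data.items()}
    if isinstance(data, list):
        return {str(i): lists_to_dicts(item) for i, item in enumerate(data)}
    return data


# Made-up sample data (an assumption for this sketch).
sample = {"tags": ["runs", "datasets"], "servers": [{"url": "https://example.com"}]}

converted = lists_to_dicts(sample)
print(converted)
```

Once a list is expressed as a dictionary, its items can be distributed across chunks like any other key:value pairs instead of being kept as one indivisible value.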
You can inspect the document at a specific index of the docs list.