01. Structure of Document
Document & Document Loaders
Reference
Document used for practice
Software Policy & Research Institute (SPRi), December 2023 issue
Authors: Jaeheung Lee (AI Policy Institute Office Liability Institute), Lee Ji-soo (AI Policy Lab Yi Phyang Institute)
File name: SPRI_AI_Brief_December 2023 issue_F.pdf
Document
This is the basic document object of LangChain.
Properties
page_content: A string containing the content of the document.
metadata: A dictionary containing the document's metadata.
from langchain_core.documents import Document
document = Document("Hello, this is Langchain's document.")
Add properties to metadata
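Because metadata is an ordinary dictionary, properties can be added with normal dictionary assignment. The sketch below illustrates this; the keys and values (source, page, author) are made-up examples, not values from the practice document. The class in the except branch is only a stand-in so the snippet still runs where LangChain is not installed.

```python
try:
    from langchain_core.documents import Document
except ImportError:
    # Stand-in with the same two properties, used only if LangChain is missing.
    from dataclasses import dataclass, field

    @dataclass
    class Document:
        page_content: str
        metadata: dict = field(default_factory=dict)

document = Document(page_content="Hello, this is LangChain's document.")

# metadata starts out as an empty dictionary.
print(document.metadata)

# Add properties with ordinary dictionary assignment (illustrative values).
document.metadata["source"] = "example.txt"
document.metadata["page"] = 1
document.metadata["author"] = "example-author"

print(document.metadata)
```

Metadata added this way travels with the Document, so later steps can still tell where a piece of text came from.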
Document Loader
A document loader converts content from various file formats into Document objects.
Main loaders
PyPDFLoader: A loader that loads PDF files.
CSVLoader: A loader that loads CSV files.
UnstructuredHTMLLoader: A loader that loads HTML files.
JSONLoader: A loader that loads JSON files.
TextLoader: A loader that loads text files.
DirectoryLoader: A loader that loads the files in a directory.
load()
Loads the documents and returns them.
Returns: List[Document]
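To make the return type concrete, here is a minimal sketch built around a hypothetical LineLoader (not a LangChain class) that turns each non-empty line of a text file into a Document. It assumes langchain-core is installed; the except branch holds small stand-ins so the snippet still runs without it. In BaseLoader, load() collects everything that lazy_load() yields into a list.

```python
import os
import tempfile
from typing import Iterator

try:
    from langchain_core.document_loaders import BaseLoader
    from langchain_core.documents import Document
except ImportError:
    # Stand-ins (used only if LangChain is missing) that mimic what is needed here.
    class Document:
        def __init__(self, page_content, metadata=None):
            self.page_content = page_content
            self.metadata = metadata or {}

    class BaseLoader:
        def load(self):
            return list(self.lazy_load())


class LineLoader(BaseLoader):
    """Hypothetical loader: one Document per non-empty line of a text file."""

    def __init__(self, file_path: str) -> None:
        self.file_path = file_path

    def lazy_load(self) -> Iterator[Document]:
        with open(self.file_path, encoding="utf-8") as f:
            for number, line in enumerate(f, start=1):
                if line.strip():
                    yield Document(
                        page_content=line.strip(),
                        metadata={"source": self.file_path, "line": number},
                    )


# Write a small sample file so the example is self-contained.
with tempfile.NamedTemporaryFile(
    "w", suffix=".txt", delete=False, encoding="utf-8"
) as tmp:
    tmp.write("first line\nsecond line\nthird line\n")

loader = LineLoader(tmp.name)
docs = loader.load()  # a plain Python list of Document objects

print(type(docs).__name__, len(docs))
print(docs[0].page_content, docs[0].metadata["line"])

os.remove(tmp.name)  # clean up the sample file
```

The built-in loaders listed earlier are called the same way: construct the loader with a path, then call load().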
load_and_split()
Loads the documents, splits them with a text splitter, and returns the resulting chunks.
Returns: List[Document]
lazy_load()
Loads documents lazily: it returns a generator that yields one Document at a time instead of building the whole list up front.
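The difference from load() is easiest to see by counting how many documents have been built at each step. The sketch below uses a hypothetical InMemoryLoader (not a LangChain class) that wraps a list of strings and keeps a counter. It assumes langchain-core is installed, with stand-ins in the except branch so it also runs without it.

```python
from typing import Iterator, List

try:
    from langchain_core.document_loaders import BaseLoader
    from langchain_core.documents import Document
except ImportError:
    # Stand-ins (used only if LangChain is missing).
    class Document:
        def __init__(self, page_content, metadata=None):
            self.page_content = page_content
            self.metadata = metadata or {}

    class BaseLoader:
        pass


class InMemoryLoader(BaseLoader):
    """Hypothetical loader that wraps a list of strings and counts its work."""

    def __init__(self, texts: List[str]) -> None:
        self.texts = texts
        self.produced = 0  # how many Documents have been built so far

    def lazy_load(self) -> Iterator[Document]:
        for index, text in enumerate(self.texts):
            self.produced += 1
            yield Document(page_content=text, metadata={"index": index})


loader = InMemoryLoader(["alpha", "beta", "gamma"])

stream = loader.lazy_load()  # creates the generator; no Document built yet
built_before = loader.produced

first = next(stream)  # builds exactly one Document
built_after_first = loader.produced

remaining = [doc.page_content for doc in stream]  # consume the rest one by one

print(built_before, built_after_first, loader.produced)
print(first.page_content, remaining)
```

Since documents are produced only as they are consumed, this style suits large inputs where holding every Document in memory at once would be wasteful.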
aload()
Loads documents asynchronously (async).
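aload() is a coroutine, so it has to be awaited inside an async function or driven with asyncio.run(). The sketch below assumes a recent langchain-core in which BaseLoader provides aload(); InMemoryLoader is a hypothetical class used for illustration, and the except branch holds simplified stand-ins so the snippet runs without LangChain.

```python
import asyncio
from typing import Iterator, List

try:
    from langchain_core.document_loaders import BaseLoader
    from langchain_core.documents import Document
except ImportError:
    # Stand-ins (used only if LangChain is missing).
    class Document:
        def __init__(self, page_content, metadata=None):
            self.page_content = page_content
            self.metadata = metadata or {}

    class BaseLoader:
        async def aload(self):
            # Simplified stand-in: just collects lazy_load() into a list.
            return list(self.lazy_load())


class InMemoryLoader(BaseLoader):
    """Hypothetical loader that wraps a list of strings."""

    def __init__(self, texts: List[str]) -> None:
        self.texts = texts

    def lazy_load(self) -> Iterator[Document]:
        for index, text in enumerate(self.texts):
            yield Document(page_content=text, metadata={"index": index})


async def main() -> List[Document]:
    loader = InMemoryLoader(["alpha", "beta"])
    # await hands control back to the event loop while loading is in progress.
    return await loader.aload()


docs = asyncio.run(main())
print([doc.page_content for doc in docs])
```

In a notebook, where an event loop is already running, use await loader.aload() directly instead of asyncio.run().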