01. Structure of Document

Document & Document Loaders

Reference

Document used for the exercises

Software Policy & Research Institute (SPRi), December 2023 issue

  • Author: Jaeheung Lee (AI Policy Institute Office Liability Institute), Lee Ji-soo (AI Policy Lab Yi Phyang Institute)

  • File name: SPRI_AI_Brief_December 2023 issue_F.pdf

Document

This is the basic document object of LangChain.

Properties

  • page_content: A string representing the content of the document.

  • metadata: A dictionary representing the document's metadata.

from langchain_core.documents import Document

document = Document("Hello, this is Langchain's document.")

Add properties to metadata

Document Loader

A document loader converts content from various file formats into Document objects.

Main Loaders

  • PyPDFLoader: A loader that loads PDF files.

  • CSVLoader: A loader that loads CSV files.

  • UnstructuredHTMLLoader: A loader that loads HTML files.

  • JSONLoader: A loader that loads JSON files.

  • TextLoader: A loader that loads text files.

  • DirectoryLoader: A loader that loads the files in a directory.

load()

  • Loads the documents and returns them.

  • The result is returned as a List[Document].

load_and_split()

  • Splits the loaded documents with a text splitter and returns the resulting chunks.

  • The result is returned as a List[Document].

lazy_load()

  • Loads documents lazily, yielding them one at a time from a generator.

aload()

  • Loads documents asynchronously.
