03. Hangeul (HWP)

HWP (Hangeul)

Hangeul (HWP) is a word processor developed by Hancom (Hangeul and Computer) and is the representative document authoring program in Korea.

It uses .hwp as its file extension and is widely used by businesses, schools, and government agencies. As a result, developers in Korea have probably had to handle .hwp documents at some point (or will before long).

Unfortunately, LangChain does not yet provide an integration for it, so you need to use HWPLoader, a loader I implemented myself.


# installation
# !pip install -qU langchain-teddynote


from langchain_teddynote.document_loaders import HWPLoader

# Create the HWPLoader object
loader = HWPLoader("./data/Digital Government Innovation Promotion Plan.hwp")

# Load the document
docs = loader.load()


# Print the first 1,000 characters of the loaded text
print(docs[0].page_content[:1000])


The metadata of each loaded document contains the file name.
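As a rough sketch of how that metadata could be inspected, the helper below pulls the file name out of each document. It assumes the file path is stored under a "source" key, which is the usual LangChain convention but is not confirmed for HWPLoader here. The function name summarize_sources and the sample path are hypothetical, and a stand-in object is used so the snippet runs without an actual .hwp file.

```python
from pathlib import Path
from types import SimpleNamespace


def summarize_sources(docs):
    """Return the file name recorded in each document's metadata.

    Assumption: each document exposes a `metadata` dict with a "source"
    key holding the file path. HWPLoader's exact keys are not verified.
    """
    return [Path(doc.metadata.get("source", "")).name for doc in docs]


# Stand-in for the loader output (hypothetical path); with the real
# loader you would pass the result of `loader.load()` instead.
sample_docs = [
    SimpleNamespace(page_content="...", metadata={"source": "./data/example.hwp"})
]

print(summarize_sources(sample_docs))
```

With a real document, printing docs[0].metadata directly is the quickest way to see which keys the loader actually provides.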

