06. Word

Microsoft Word

Microsoft Word is a word processor developed by Microsoft.

This section covers how to load a Word document into the Document format used by downstream steps.

Docx2txtLoader

Docx2txtLoader uses the docx2txt package to load .docx files as Document objects.

# installation
# !pip install -qU docx2txt

from langchain_community.document_loaders import Docx2txtLoader

loader = Docx2txtLoader("./data/sample-word-document.docx")  # Initialize document loader

docs = loader.load()  # loading documents

print(len(docs))

1
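
To check what was loaded, it can help to look at the text and metadata of the returned Document. The following is a minimal sketch: `preview_document` is a hypothetical helper, not part of LangChain, and it only assumes that the object passed in has `page_content` and `metadata` attributes, as LangChain Document objects do.

```python
def preview_document(doc, max_chars: int = 200) -> str:
    """Return a short, printable summary of one loaded document.

    Hypothetical helper for illustration; it relies only on the
    `page_content` and `metadata` attributes of the object passed in.
    """
    text = doc.page_content[:max_chars]  # keep only the first max_chars characters
    return f"metadata: {doc.metadata}\ncontent: {text}"
```

With the `docs` list from the example above, `print(preview_document(docs[0]))` prints the metadata followed by the first 200 characters of the extracted text.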

UnstructuredWordDocumentLoader

By default, UnstructuredWordDocumentLoader loads the file as a single Document.
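
A minimal sketch of this default behaviour, assuming the `langchain_community` and `unstructured` packages (with .docx support) are installed. `load_word_single` is a hypothetical wrapper introduced here for illustration; the import sits inside the function so the snippet can be defined even when the optional dependency is missing.

```python
def load_word_single(path: str):
    """Load a .docx file with the loader's default settings.

    Hypothetical wrapper for illustration. Assumes langchain_community
    and unstructured are installed in the environment.
    """
    # Imported lazily because unstructured is an optional dependency.
    from langchain_community.document_loaders import UnstructuredWordDocumentLoader

    loader = UnstructuredWordDocumentLoader(path)  # no mode given: default is used
    return loader.load()  # a list of Document objects
```

Called as `load_word_single("./data/sample-word-document.docx")`, reusing the sample path from the Docx2txtLoader example, this should return a list containing one Document.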

Internally, unstructured creates a separate “element” for each chunk of text.

By default these elements are combined into one Document, but you can keep them separate by specifying mode="elements".
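
The difference between the two modes can be sketched as follows, again assuming `langchain_community` and `unstructured` are installed. Both function names are hypothetical helpers for illustration, and the `category` metadata key is an assumption about what the loader attaches to each element; `count_by_category` falls back to `"unknown"` when the key is absent.

```python
from collections import defaultdict


def load_word_elements(path: str):
    """Load a .docx file as one Document per element (hypothetical wrapper)."""
    # Imported lazily because unstructured is an optional dependency.
    from langchain_community.document_loaders import UnstructuredWordDocumentLoader

    loader = UnstructuredWordDocumentLoader(path, mode="elements")
    return loader.load()


def count_by_category(docs) -> dict:
    """Count documents per element category.

    Assumes each document may carry a "category" entry in its metadata;
    documents without it are counted under "unknown".
    """
    counts = defaultdict(int)
    for doc in docs:
        counts[doc.metadata.get("category", "unknown")] += 1
    return dict(counts)
```

For example, `count_by_category(load_word_elements("./data/sample-word-document.docx"))` gives a quick overview of which element types the file was split into.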
