06. Word
Microsoft Word
Microsoft Word is a word processor developed by Microsoft.
This section covers how to load a Word document into the Document format that can be used downstream.
Docx2txtLoader
You can use docx2txt to import .docx files into documents.
# installation
# !pip install -qU docx2txt
from langchain_community.document_loaders import Docx2txtLoader
loader = Docx2txtLoader("./data/sample-word-document.docx") # Initialize document loader
docs = loader.load() # loading documents
print(len(docs))
1
UnstructuredWordDocumentLoader
The result is loaded as a single Document.
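A minimal sketch of this default behavior, assuming the unstructured package with .docx support is installed (for example via pip install "unstructured[docx]") and reusing the sample path from the Docx2txtLoader example. The helper names load_single and describe are hypothetical, introduced only for illustration.

```python
from pathlib import Path


def load_single(path):
    """Load a .docx file with the loader's default mode (one combined Document)."""
    # Imported lazily so the optional dependency is only needed when this is called.
    from langchain_community.document_loaders import UnstructuredWordDocumentLoader

    loader = UnstructuredWordDocumentLoader(str(path))  # default mode is "single"
    return loader.load()


def describe(docs, preview_chars=100):
    """Return (number of documents, preview of the first document's text)."""
    if not docs:
        return 0, ""
    return len(docs), docs[0].page_content[:preview_chars]


if __name__ == "__main__":
    # Assumed sample path, taken from the Docx2txtLoader example.
    sample = Path("./data/sample-word-document.docx")
    if sample.exists():
        docs = load_single(sample)
        print(describe(docs))
        print(docs[0].metadata)
```

With the default mode, the count reported by describe should be 1, matching the statement that the result is loaded as a single Document.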
Internally, unstructured creates different “elements” for different chunks of text.
By default these are combined into a single Document, but you can keep them separate by specifying mode="elements".
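The difference can be sketched as follows, again assuming the unstructured package with .docx support is installed. The helpers load_elements and count_categories are hypothetical names, and reading the element type from a "category" metadata key is an assumption about what the loader attaches in this mode, so the code falls back to "unknown" when the key is absent.

```python
from collections import Counter
from pathlib import Path


def load_elements(path):
    """Load a .docx file so that each element becomes its own Document."""
    # Imported lazily so the optional dependency is only needed when this is called.
    from langchain_community.document_loaders import UnstructuredWordDocumentLoader

    loader = UnstructuredWordDocumentLoader(str(path), mode="elements")
    return loader.load()


def count_categories(docs):
    """Count documents per element category (assumed "category" metadata key)."""
    return Counter(doc.metadata.get("category", "unknown") for doc in docs)


if __name__ == "__main__":
    # Assumed sample path, taken from the Docx2txtLoader example.
    sample = Path("./data/sample-word-document.docx")
    if sample.exists():
        docs = load_elements(sample)
        print(len(docs))               # typically more than one Document in this mode
        print(count_categories(docs))  # how many elements of each type were found
        print(docs[0].page_content)    # text of the first element
```

Keeping elements separate is useful when downstream steps need to treat titles, body text, or list items differently, whereas the default combined Document is simpler when only the full text matters.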