13. LlamaParser
LlamaParse is a document parsing service developed by LlamaIndex and designed specifically for large language models (LLMs). Its main features are:
Support for many document formats, including PDF, Word, PowerPoint, and Excel
Custom output formats specified through natural-language instructions
Extraction of complex tables and images
JSON mode
Multilingual support
LlamaParse is available as a standalone API and as part of the LlamaCloud platform. The service aims to improve the performance of LLM-based applications such as retrieval-augmented generation (RAG) by parsing and refining documents before they reach the model.
Users can process 1,000 pages per day for free; additional capacity is available through a paid plan. LlamaParse is currently in public beta, and its feature set is still growing.
API key setup: after issuing an API key, set it as LLAMA_CLOUD_API_KEY in the .env file.
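The .env file only needs one line of the form LLAMA_CLOUD_API_KEY=<your-key>. As a small safeguard, a helper like the hypothetical require_env below (it is not part of LlamaParse) can fail early with a readable message when the key was not loaded. It is a sketch that uses only the standard library.

```python
import os


def require_env(name: str = "LLAMA_CLOUD_API_KEY") -> str:
    """Return an environment variable's value, raising a clear error if it is missing.

    Hypothetical helper: call it after load_dotenv() so keys from .env are visible.
    """
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(
            f"{name} is not set. Add a line such as {name}=<your-key> to the .env "
            "file and call load_dotenv() before creating the parser."
        )
    return value
```

The same helper works for other keys, for example require_env("OPENAI_API_KEY") before using a vendor multimodal model.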
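# Installation
# !pip install llama-index-core llama-parse llama-index-readers-file python-dotenv
import os
import nest_asyncio
from dotenv import load_dotenv
# Load LLAMA_CLOUD_API_KEY (and any other keys) from the .env file into os.environ
load_dotenv()
# LlamaParse runs asyncio code internally; this lets it run inside an
# already-running event loop such as a Jupyter notebook
nest_asyncio.apply()

Basic parser usage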
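A minimal sketch of basic parsing, assuming the llama-parse and llama-index packages installed above: LlamaParse is registered as the reader for .pdf files in LlamaIndex's SimpleDirectoryReader, the usage pattern LlamaParse documents for LlamaIndex. The helper names basic_parser_kwargs and parse_pdf are hypothetical, and the option values (8 workers, Markdown output) are illustrative choices. The llama_parse imports sit inside the function so the configuration can be inspected without the package installed.

```python
from typing import Any, Dict, List


def basic_parser_kwargs(language: str = "en") -> Dict[str, Any]:
    """Keyword arguments for a plain (non-multimodal) LlamaParse run."""
    return {
        "result_type": "markdown",  # "markdown" or "text"
        "num_workers": 8,           # parallel requests when several files are parsed
        "verbose": True,            # print job progress
        "language": language,       # document language code, e.g. "en" or "ko"
    }


def parse_pdf(path: str, language: str = "en") -> List[Any]:
    """Parse one PDF with LlamaParse and return LlamaIndex Document objects."""
    # Imported lazily so this module loads even without llama_parse installed.
    from llama_index.core import SimpleDirectoryReader
    from llama_parse import LlamaParse

    # With no api_key argument, LlamaParse reads LLAMA_CLOUD_API_KEY from the environment.
    parser = LlamaParse(**basic_parser_kwargs(language))
    reader = SimpleDirectoryReader(
        input_files=[path],
        file_extractor={".pdf": parser},  # route .pdf files through LlamaParse
    )
    return reader.load_data()
```

For example, documents = parse_pdf("data/sample.pdf") (a placeholder path) returns a list of LlamaIndex Document objects whose text holds the Markdown output. Calling load_data(path) directly on a LlamaParse instance is a shorter alternative when SimpleDirectoryReader is not needed.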
Converting LlamaIndex Documents to LangChain Documents
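LlamaIndex's Document class provides a to_langchain_format() method, so [doc.to_langchain_format() for doc in documents] is the shortest route. The sketch below spells out the same mapping (LlamaIndex text and metadata become LangChain page_content and metadata). The function name and the document_cls override are hypothetical; the override only exists so the logic can run without langchain_core installed.

```python
from typing import Any, Iterable, List, Optional


def to_langchain_documents(
    llama_docs: Iterable[Any], document_cls: Optional[type] = None
) -> List[Any]:
    """Convert LlamaIndex Documents (.text, .metadata) into LangChain Documents."""
    if document_cls is None:
        # Lazy import: langchain_core is only needed when producing real LangChain objects.
        from langchain_core.documents import Document

        document_cls = Document
    # Copy metadata so later edits on one side do not leak into the other.
    return [
        document_cls(page_content=doc.text, metadata=dict(doc.metadata or {}))
        for doc in llama_docs
    ]
```

The resulting list can be passed to LangChain text splitters, vector stores, or retrievers like any other LangChain documents.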
Parsing with a Multimodal Model
Main parameters
use_vendor_multimodal_model: Whether to use a multimodal model. When set to True, parsing is done by an external vendor's multimodal model.
vendor_multimodal_model_name: The name of the multimodal model to use. This example uses "openai-gpt4o".
vendor_multimodal_api_key: The API key for the multimodal model. Here the OpenAI API key is read from an environment variable.
result_type: The format of the parsing result. When set to "markdown", the result is returned as Markdown.
language: The language of the document being parsed. Setting it to "ko" processes the document as Korean.
skip_diagonal_text: Whether to skip diagonal (rotated) text.
page_separator: The delimiter inserted between pages.
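The parameters above can be collected into a single LlamaParse call. This is a sketch assuming a llama_parse release that still accepts these vendor-multimodal arguments and that OPENAI_API_KEY is set alongside LLAMA_CLOUD_API_KEY. The helper names and the separator string are hypothetical. split_pages is also hypothetical: it only helps if your installed version returns all pages of a file as one Document joined by page_separator.

```python
import os
from typing import Any, Dict, List

# Hypothetical page delimiter; any string unlikely to appear in the document works.
PAGE_SEPARATOR = "\n=================\n"


def multimodal_parser_kwargs(openai_api_key: str, language: str = "ko") -> Dict[str, Any]:
    """LlamaParse arguments that delegate page parsing to OpenAI's GPT-4o."""
    return {
        "use_vendor_multimodal_model": True,
        "vendor_multimodal_model_name": "openai-gpt4o",
        "vendor_multimodal_api_key": openai_api_key,
        "result_type": "markdown",
        "language": language,
        "skip_diagonal_text": True,
        "page_separator": PAGE_SEPARATOR,
    }


def build_multimodal_parser() -> Any:
    """Create the parser; needs llama_parse, LLAMA_CLOUD_API_KEY and OPENAI_API_KEY."""
    from llama_parse import LlamaParse

    return LlamaParse(**multimodal_parser_kwargs(os.environ["OPENAI_API_KEY"]))


def split_pages(markdown: str, separator: str = PAGE_SEPARATOR) -> List[str]:
    """Split combined Markdown back into per-page chunks, dropping empty pieces."""
    return [page for page in markdown.split(separator) if page.strip()]
```

Keeping the arguments in a plain dictionary makes it easy to check the configuration, or reuse it with an extra key, before any API call is made.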
You can also give the parser a custom parsing instruction written in natural language.
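A minimal sketch of a custom instruction, assuming a llama_parse version that accepts the parsing_instruction argument (newer releases may rename or deprecate it, so check the version you have installed). The instruction text and helper names are hypothetical examples, not taken from the LlamaParse documentation.

```python
from typing import Any, Dict, List

# Hypothetical instruction text; adapt it to the documents being parsed.
PARSING_INSTRUCTION = (
    "This document is a technical report. "
    "Reproduce every table as a markdown table and keep section headings."
)


def instructed_parser_kwargs(instruction: str = PARSING_INSTRUCTION) -> Dict[str, Any]:
    """LlamaParse arguments that add a natural-language parsing instruction."""
    return {
        "result_type": "markdown",
        "parsing_instruction": instruction,  # free-text guidance for the parser
    }


def parse_with_instruction(path: str, instruction: str = PARSING_INSTRUCTION) -> List[Any]:
    """Parse one file with a custom instruction; needs llama_parse and LLAMA_CLOUD_API_KEY."""
    from llama_parse import LlamaParse

    parser = LlamaParse(**instructed_parser_kwargs(instruction))
    return parser.load_data(path)
```

The instruction is plain text, so it can be tuned per document type (reports, invoices, slides) without changing any other parser settings.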