13. LlamaParser

LlamaParse is a document parsing service developed by LlamaIndex and designed specifically for use with large language models (LLMs). Its main features are:

  • Support for a variety of document formats, including PDF, Word, PowerPoint, and Excel

  • Custom output formats specified through natural-language instructions

  • Extraction of complex tables and images

  • JSON mode support

  • Support for multiple languages

LlamaParse is available as a standalone API and as part of the LlamaCloud platform. The service aims to improve the performance of LLM-based applications such as Retrieval-Augmented Generation (RAG) by parsing and cleaning up documents.

Users can process 1,000 pages per day for free, and additional capacity is available through a paid plan. LlamaParse is currently in public beta, and its functionality is continually expanding.

API key setup: after issuing an API key, set it as LLAMA_CLOUD_API_KEY in the .env file.

# INSTALLATION
# !pip install llama-index-core llama-parse llama-index-readers-file python-dotenv

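import os
import nest_asyncio
from dotenv import load_dotenv

# Load LLAMA_CLOUD_API_KEY (and any other keys) from the .env file
load_dotenv()
# Allow nested event loops so the async parser can run inside a notebook
nest_asyncio.apply()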
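
Before calling the service, it can help to confirm that the key was actually loaded. The following is a minimal sketch, assuming the key lives in the LLAMA_CLOUD_API_KEY environment variable as described above. The helper name require_api_key is hypothetical and not part of any library.

```python
import os


def require_api_key(name: str = "LLAMA_CLOUD_API_KEY") -> str:
    """Return the API key stored in the environment variable `name`.

    Raises RuntimeError with a readable message when the variable
    is missing or empty, so the problem surfaces before any request.
    """
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set; add it to your .env file")
    return value
```

Calling require_api_key() right after load_dotenv() fails fast with a clear message instead of a later authentication error.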

Basic parser usage
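
The original code for this step is not preserved here, so the following is only a sketch of what basic usage might look like. It assumes the llama-parse and llama-index-core packages from the installation step; the helper names build_file_extractor and parse_files and the path "example.pdf" are hypothetical.

```python
from typing import Any, Dict, Iterable, List


def build_file_extractor(parser: Any, extensions: Iterable[str] = (".pdf",)) -> Dict[str, Any]:
    """Map each file extension to the parser that should handle it."""
    return {ext.lower(): parser for ext in extensions}


def parse_files(paths: List[str]) -> list:
    """Parse the given files with LlamaParse and return LlamaIndex documents."""
    # Imported lazily so the helpers above can be used without the packages installed.
    from llama_index.core import SimpleDirectoryReader
    from llama_parse import LlamaParse

    parser = LlamaParse(
        result_type="markdown",  # return the parsed text as Markdown
        verbose=True,
    )
    reader = SimpleDirectoryReader(
        input_files=paths,
        file_extractor=build_file_extractor(parser),
    )
    return reader.load_data()


# Usage (needs LLAMA_CLOUD_API_KEY and network access):
# documents = parse_files(["example.pdf"])
```

Routing the parser through SimpleDirectoryReader keeps the result as ordinary LlamaIndex documents, which the rest of a LlamaIndex pipeline can consume directly.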

Convert LlamaIndex Documents to LangChain Documents
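
A minimal sketch of the conversion, assuming the parsed results are LlamaIndex Document objects, which expose a to_langchain_format() method. The wrapper name to_langchain_documents is hypothetical, and LangChain must be installed for real documents to convert.

```python
from typing import Any, Iterable, List


def to_langchain_documents(llama_docs: Iterable[Any]) -> List[Any]:
    """Convert LlamaIndex documents to LangChain documents.

    Each input object is expected to provide `to_langchain_format()`,
    as LlamaIndex Document objects do.
    """
    return [doc.to_langchain_format() for doc in llama_docs]
```

The converted objects can then be passed to LangChain text splitters, vector stores, or retrievers like any other LangChain Document.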

Parsing with a multimodal model

Main parameters

  • use_vendor_multimodal_model : Specifies whether to use a multimodal model. When set to True, parsing is done by an external vendor's multimodal model.

  • vendor_multimodal_model_name : Specifies the name of the multimodal model to use. This example uses "openai-gpt4o".

  • vendor_multimodal_api_key : Specifies the API key for the multimodal model. The OpenAI API key is read from an environment variable.

  • result_type : Specifies the format of the parsing result. When set to "markdown", the result is returned in Markdown format.

  • language : Specifies the language of the document to be parsed, as a language code such as "en" (English) or "ko" (Korean).

  • skip_diagonal_text : Determines whether diagonal text is skipped.

  • page_separator : Specifies the delimiter inserted between pages.
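
The parameters above can be collected as follows. This is a sketch, not the original code: the helper names are hypothetical, the values chosen for skip_diagonal_text and page_separator are illustrative assumptions, and the OpenAI key is assumed to be in the OPENAI_API_KEY environment variable.

```python
import os
from typing import Any, Dict


def multimodal_parser_kwargs(api_key: str, language: str = "en") -> Dict[str, Any]:
    """Collect the keyword arguments described above in one place."""
    return {
        "use_vendor_multimodal_model": True,  # delegate parsing to the vendor model
        "vendor_multimodal_model_name": "openai-gpt4o",
        "vendor_multimodal_api_key": api_key,
        "result_type": "markdown",
        "language": language,
        "skip_diagonal_text": True,  # illustrative value
        "page_separator": "\n=================\n",  # illustrative value
    }


def build_multimodal_parser():
    """Create a LlamaParse instance configured with the arguments above."""
    from llama_parse import LlamaParse  # lazy import; requires llama-parse

    return LlamaParse(**multimodal_parser_kwargs(os.environ["OPENAI_API_KEY"]))


# Usage (needs both API keys and network access; the path is a placeholder):
# parsed_docs = build_multimodal_parser().load_data(file_path="example.pdf")
```

Keeping the arguments in a plain dictionary makes it easy to inspect or adjust the configuration before any request is sent.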

It is also possible to specify a custom instruction for the parser.
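
A sketch of how such an instruction might be passed, assuming the installed llama-parse release accepts a parsing_instruction keyword (newer releases may expose this under a different name, so check the installed version). The helper names and the instruction text are hypothetical.

```python
from typing import Any, Dict


def with_instruction(base_kwargs: Dict[str, Any], instruction: str) -> Dict[str, Any]:
    """Return a copy of the parser arguments with a natural-language instruction added."""
    cleaned = instruction.strip()
    if not cleaned:
        raise ValueError("instruction must not be empty")
    merged = dict(base_kwargs)  # copy so the caller's dict is left unchanged
    merged["parsing_instruction"] = cleaned  # assumed keyword name
    return merged


def build_parser(kwargs: Dict[str, Any]):
    """Create a LlamaParse instance from the prepared arguments."""
    from llama_parse import LlamaParse  # lazy import; requires llama-parse

    return LlamaParse(**kwargs)


# Usage (needs LLAMA_CLOUD_API_KEY and network access; the path is a placeholder):
# kwargs = with_instruction({"result_type": "markdown"}, "Extract tables in Markdown format.")
# parsed_docs = build_parser(kwargs).load_data(file_path="example.pdf")
```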
