LlamaParse is a document parsing service developed by LlamaIndex and designed specifically for use with large language models (LLMs). Its main features are:
Support for a wide range of document formats, including PDF, Word, PowerPoint, and Excel
Custom output formats specified through natural-language instructions
Extraction of complex tables and images
JSON mode
Multilingual support
LlamaParse is available as a standalone API and as part of the LlamaCloud platform. The service aims to improve the performance of LLM-based applications such as retrieval-augmented generation (RAG) by parsing and cleaning up documents before they reach the model.
Users can process 1,000 pages per day for free, and additional capacity can be obtained through a paid plan. LlamaParse is currently available in public beta, and its functionality is constantly expanding.
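Because LlamaParse is a hosted API, the examples that follow need an API key before they can run. The sketch below is one way to fail early with a clear message, assuming the key is supplied through the `LLAMA_CLOUD_API_KEY` environment variable. The helper name `missing_keys` is hypothetical and not part of any library.

```python
import os

# Assumption: the LlamaParse client picks up its key from LLAMA_CLOUD_API_KEY.
REQUIRED_KEYS = ["LLAMA_CLOUD_API_KEY"]


def missing_keys(required, env=None):
    """Return the names in `required` that are unset or empty in `env`."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


if __name__ == "__main__":
    missing = missing_keys(REQUIRED_KEYS)
    if missing:
        print("Set these environment variables first:", ", ".join(missing))
    else:
        print("All required API keys are set.")
```

Any other key a later step depends on can be added to `REQUIRED_KEYS` so that it is checked in the same place.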
from llama_parse import LlamaParse
from llama_index.core import SimpleDirectoryReader
# Parser settings
parser = LlamaParse(
    result_type="markdown",  # "markdown" or "text"
    num_workers=8,  # number of workers (default: 4)
    verbose=True,
    language="ko",  # language of the document (Korean)
)
# Tell SimpleDirectoryReader to use LlamaParse for .pdf files
file_extractor = {".pdf": parser}
# Parse the file with LlamaParse
documents = SimpleDirectoryReader(
    input_files=["data/SPRI_AI_Brief_2023년12월호_F.pdf"],
    file_extractor=file_extractor,
).load_data()
Started parsing the file under job_id 6a2aa79c-0d8b-4d59-866d-afa368baa31d
# Check page count
len(documents)
23
# Convert to LangChain documents
docs = [doc.to_langchain_format() for doc in documents]
# Print the metadata of the first document
docs[0].metadata
{'file_path': 'data/SPRI_AI_Brief_2023년12월호_F.pdf', 'file_name': 'SPRI_AI_Brief_2023년12월호_F.pdf', 'file_type': 'application/pdf', 'file_size'
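The conversion step essentially moves each page's text into a `page_content` field and carries the metadata dictionary along with it. The sketch below illustrates only that shape, using two hypothetical stand-in classes (`SourceDoc`, `TargetDoc`) rather than the real LlamaIndex and LangChain types. It is not how either library implements the conversion.

```python
from dataclasses import dataclass, field


@dataclass
class SourceDoc:
    """Hypothetical stand-in for a LlamaIndex document (one parsed page)."""
    text: str
    metadata: dict = field(default_factory=dict)


@dataclass
class TargetDoc:
    """Hypothetical stand-in for a LangChain document."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def to_target(doc: SourceDoc) -> TargetDoc:
    # Copy the metadata so later edits on one side do not leak to the other.
    return TargetDoc(page_content=doc.text, metadata=dict(doc.metadata))


pages = [
    SourceDoc("# Page 1", {"file_name": "example.pdf"}),
    SourceDoc("# Page 2", {"file_name": "example.pdf"}),
]
converted = [to_target(p) for p in pages]
print(len(converted), converted[0].metadata["file_name"])
```

After the real conversion, each element of `docs` can be read the same way: the text through `page_content` and the file information through `metadata`.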
import os

# Specify the parsing instruction
parsing_instruction = (
    "You are parsing a brief of AI Report. Please extract tables in markdown format."
)
# LlamaParse settings: use a vendor multimodal model (GPT-4o) for parsing
parser = LlamaParse(
    use_vendor_multimodal_model=True,
    vendor_multimodal_model_name="openai-gpt4o",
    vendor_multimodal_api_key=os.environ["OPENAI_API_KEY"],
    result_type="markdown",
    language="ko",
    parsing_instruction=parsing_instruction,
)
# Parse the file
parsed_docs = parser.load_data(file_path="data/SPRI_AI_Brief_2023년12월호_F.pdf")
# Convert to LangChain documents
docs = [doc.to_langchain_format() for doc in parsed_docs]
Started parsing the file under job_id afdbf3ba-61f6-4c14-8d41-9e986950b612
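Since the free tier is limited to 1,000 pages per day, it can be worth writing the parsed Markdown to disk so that a file is sent to the service only once. This is a minimal sketch, assuming each parsed page is available as a plain string (for example `doc.page_content`). `save_pages` and `load_pages` are hypothetical helpers, not part of LlamaParse.

```python
from pathlib import Path


def save_pages(pages, out_dir):
    """Write each page's Markdown to out_dir/page_0001.md, page_0002.md, ..."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for i, text in enumerate(pages, start=1):
        path = out / f"page_{i:04d}.md"
        path.write_text(text, encoding="utf-8")
        paths.append(path)
    return paths


def load_pages(out_dir):
    """Read the cached pages back in page order."""
    files = sorted(Path(out_dir).glob("page_*.md"))
    return [p.read_text(encoding="utf-8") for p in files]
```

A call such as `save_pages([d.page_content for d in docs], "cache/spri_brief")` would store one file per page (the directory name is only an example), and `load_pages` restores the text later without another parsing job.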
# Check the table extracted in Markdown format
print(docs[-2].page_content)
# Ⅱ. Main event schedule

| Event name | Main overview of the event |
| --- | --- |
| CES 2024 | - The world's largest consumer electronics, IT, and consumer goods exhibition, hosted by the Consumer Technology Association (CTA), where companies exhibit their latest technology products across major categories including 5G, AR & VR, digital health, and transportation and mobility.<br>- CTA President Shapiro named AI as the most notable sector; under the theme 'All On', meaning that it spans all industries, this exhibition is expected to host more than 500 Korean companies.<br>![CES 2024](https://www.ces.tech/) |
| Period | 2024.1.9~12 |
| Place | USA, Las Vegas |
| Homepage | [https://www.ces.tech/](https://www.ces.tech/) |

| Event name | Main overview of the event |
| --- | --- |
| AIMLA 2024 | - The International Conference on AI, Machine Learning and Applications (AIMLA 2024) shares knowledge and the latest research results on the theory, methodology, and practical approaches of artificial intelligence and machine learning.<br>- It covers the main areas of artificial intelligence and machine learning in both theory and practice, and shares cutting-edge developments in the field with researchers and industry practitioners.<br>![AIMLA 2024](https://ccnet2024.org/aimla/index) |
| Period | 2024.1.27~28 |
| Place | Denmark, Copenhagen |
| Homepage | [https://ccnet2024.org/aimla/index](https://ccnet2024.org/aimla/index) |

| Event name | Main overview of the event |
| --- | --- |
| AAAI Conference on Artificial Intelligence | - The conference of the Association for the Advancement of Artificial Intelligence (AAAI) promotes AI research and provides opportunities for exchange among AI researchers, practitioners, scientists, academics, and engineers.<br>- The program includes AI-related technical presentations, special tracks, invited speakers, workshops, tutorials, poster sessions, topic presentations, competitions, and exhibitions.<br>![AAAI Conference on Artificial Intelligence](https://aaai.org/aaai-conference/) |
| Period | 2024.2.20~27 |
| Place | Canada, Vancouver |
| Homepage | [https://aaai.org/aaai-conference/](https://aaai.org/aaai-conference/) |
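Once the tables arrive as Markdown, they can be turned back into Python data for downstream use. The function below is a rough sketch using only the standard library. `parse_markdown_rows` is a hypothetical helper, and it assumes each table row sits on a single line with no `|` characters inside a cell. Rows whose cells spill over several lines are skipped.

```python
def parse_markdown_rows(markdown):
    """Return the cells of every table row, skipping `| --- |` separator rows."""
    rows = []
    for line in markdown.splitlines():
        line = line.strip()
        if not (line.startswith("|") and line.endswith("|")):
            continue  # not a complete single-line table row
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if all(cell and set(cell) <= set("-: ") for cell in cells):
            continue  # header separator such as | --- | --- |
        rows.append(cells)
    return rows


sample = """\
| Event name | Main overview of the event |
| --- | --- |
| Period | 2024.1.9~12 |
| Place | USA, Las Vegas |
"""
rows = parse_markdown_rows(sample)
# Treat the two-column rows after the header as key/value pairs.
fields = {key: value for key, value in rows[1:]}
print(fields["Period"])
```

The same call can be applied to `docs[-2].page_content`. For tables with more than two columns, the key/value step would need to be replaced by something that pairs each row with the header cells.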