11. Arxiv

arXiv is an open-access archive of roughly 2 million academic papers in physics, mathematics, computer science, quantitative biology, quantitative finance, statistics, electrical engineering and systems science, and economics.

To use the ArxivLoader document loader, install the arxiv, PyMuPDF, and langchain-community packages.

PyMuPDF converts the PDF files downloaded from arxiv.org into text.

# installation
# !pip install -qU langchain-community arxiv pymupdf

Object creation

Now we can instantiate the loader object and load the documents:

from langchain_community.document_loaders import ArxivLoader

# query: the topic of the paper you want to search for
loader = ArxivLoader(
    query="Chain of thought",
    load_max_docs=2,  # Maximum number of documents
    load_all_available_meta=True,  # Whether to load full metadata
)
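
Once the loader is configured, the documents can be fetched and inspected. The following is a minimal sketch, not the original tutorial code: `load_arxiv_docs` and `preview` are hypothetical helper names introduced here, and the `"Title"` metadata key is assumed from the loader's usual output. The import sits inside the function so the helpers can be defined even before the packages are installed.

```python
def load_arxiv_docs(query: str, max_docs: int = 2, full_meta: bool = True):
    """Fetch up to `max_docs` papers matching `query` (hypothetical helper)."""
    # Imported lazily; requires langchain-community, arxiv, and pymupdf.
    from langchain_community.document_loaders import ArxivLoader

    loader = ArxivLoader(
        query=query,
        load_max_docs=max_docs,
        load_all_available_meta=full_meta,
    )
    return loader.load()  # list of Document objects


def preview(text: str, limit: int = 200) -> str:
    """Collapse whitespace and return the first `limit` characters."""
    return " ".join(text.split())[:limit]


# Usage (requires network access):
# docs = load_arxiv_docs("Chain of thought")
# for doc in docs:
#     print(doc.metadata.get("Title"))
#     print(preview(doc.page_content))
```

Each returned document carries the paper text in `page_content` and the paper details in `metadata`.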

With load_all_available_meta=False, only a subset of the metadata is returned rather than all of it.
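
One way to see the effect of this flag is to load the same paper with both settings and compare the metadata keys. This is a sketch under assumptions: `extra_metadata_keys` and `load_first_metadata` are hypothetical helpers, and which keys actually appear depends on the installed library version.

```python
def extra_metadata_keys(full_meta: dict, basic_meta: dict) -> list:
    """Return the sorted keys that appear only in the full metadata."""
    return sorted(set(full_meta) - set(basic_meta))


def load_first_metadata(query: str, full_meta: bool) -> dict:
    """Load one paper and return its metadata dict (hypothetical helper)."""
    # Imported lazily; requires langchain-community, arxiv, and pymupdf.
    from langchain_community.document_loaders import ArxivLoader

    loader = ArxivLoader(
        query=query,
        load_max_docs=1,
        load_all_available_meta=full_meta,
    )
    docs = loader.load()
    return docs[0].metadata if docs else {}


# Usage (requires network access):
# basic = load_first_metadata("Chain of thought", full_meta=False)
# full = load_first_metadata("Chain of thought", full_meta=True)
# print(extra_metadata_keys(full, basic))
```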

Summary

  • If you want to print a summary of each paper rather than its full text, call the loader's get_summaries_as_docs() method.
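
A short sketch of how the summaries might be retrieved and printed follows. `load_arxiv_summaries` and `format_summary` are hypothetical helpers, and the `"Title"` metadata key is an assumption about the returned documents. How many summaries come back may depend on the loader's defaults, so the example does not rely on a specific count.

```python
import textwrap


def load_arxiv_summaries(query: str):
    """Return one Document per paper, with the abstract as page_content."""
    # Imported lazily; requires langchain-community and arxiv.
    from langchain_community.document_loaders import ArxivLoader

    loader = ArxivLoader(query=query)
    return loader.get_summaries_as_docs()


def format_summary(title: str, summary: str, width: int = 80) -> str:
    """Put the title on its own line, followed by the re-wrapped summary."""
    body = textwrap.fill(" ".join(summary.split()), width=width)
    return f"{title}\n{body}"


# Usage (requires network access):
# for doc in load_arxiv_summaries("Chain of thought"):
#     print(format_summary(doc.metadata.get("Title", ""), doc.page_content))
```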

lazy_load()

When loading documents in bulk, downstream operations can often be performed on a subset of the loaded documents. In that case, you can lazily load the documents one at a time to minimize memory usage.
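
The pattern can be sketched as follows, assuming the goal is to stop as soon as enough matching papers have been found. `first_matching` and `lazy_arxiv_docs` are hypothetical helpers. `first_matching` works on any iterable, so it only pulls as many documents from `lazy_load()` as it needs.

```python
from itertools import islice


def first_matching(docs, predicate, limit: int = 1) -> list:
    """Lazily scan `docs` and return up to `limit` items satisfying `predicate`."""
    return list(islice((d for d in docs if predicate(d)), limit))


def lazy_arxiv_docs(query: str, max_docs: int = 10):
    """Return an iterator that yields papers one at a time (hypothetical helper)."""
    # Imported lazily; requires langchain-community, arxiv, and pymupdf.
    from langchain_community.document_loaders import ArxivLoader

    return ArxivLoader(query=query, load_max_docs=max_docs).lazy_load()


# Usage (requires network access):
# hits = first_matching(
#     lazy_arxiv_docs("Chain of thought"),
#     lambda doc: "reasoning" in doc.page_content.lower(),
# )
# for doc in hits:
#     print(doc.metadata.get("Title"))
```

Because the scan stops at the first match, documents after it are never yielded or held in memory.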
