arXiv is an open-access archive of 2 million academic papers in physics, mathematics, computer science, quantitative biology, quantitative finance, statistics, electrical engineering and systems science, and economics.
To use the ArxivLoader document loader, you need to install the arxiv, PyMuPDF, and langchain-community packages.
PyMuPDF converts PDF files downloaded from the arxiv.org site to text format.
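Before going further, it can help to confirm that the three packages are importable. The following is a minimal sketch, not part of the original guide: the helper name `missing_modules` is hypothetical, and the import names are assumptions (PyMuPDF has historically been imported as `fitz`, and langchain-community as `langchain_community`).

```python
from importlib.util import find_spec

# Import names (not pip names) assumed for the three packages.
# PyMuPDF has historically been imported as "fitz".
REQUIRED_MODULES = ("arxiv", "fitz", "langchain_community")


def missing_modules(names=REQUIRED_MODULES):
    """Return the import names from `names` that cannot be found."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules are importable.")
```

If anything is reported missing, install the corresponding package before instantiating the loader.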
Now we can instantiate the loader object and load the documents:
from langchain_community.document_loaders import ArxivLoader

# Set query to the topic of the papers you want to search for.
loader = ArxivLoader(
    query="Chain of thought",
    load_max_docs=2,  # Maximum number of documents to load
    load_all_available_meta=True,  # Whether to load the full metadata
)
With load_all_available_meta=False, only a subset of the metadata is returned rather than all of it.
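To make the difference concrete, the sketch below compares the metadata keys that appear in this page's two sample outputs. The helper `extra_metadata_keys` is a hypothetical name introduced for illustration, the values are elided, and the key sets are copied from the sample outputs only, so other versions of the loader may return different fields.

```python
def extra_metadata_keys(full_meta, basic_meta):
    """Return the keys present only in the full metadata, sorted by name."""
    return sorted(set(full_meta) - set(basic_meta))


# Key names copied from the sample outputs in this guide; values are elided.
full_meta = dict.fromkeys([
    "Published", "Title", "Authors", "Summary",
    "entry_id", "published_first_time", "comment", "journal_ref",
    "doi", "primary_category", "categories", "links",
])
basic_meta = dict.fromkeys(["Published", "Title", "Authors", "Summary"])

# Fields that only appear when load_all_available_meta=True
print(extra_metadata_keys(full_meta, basic_meta))
```

In practice you would pass `docs[0].metadata` from each of the two loaders instead of the hand-written dictionaries.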
Summary
If you want to print a summary rather than the full text of the paper, call the loader's get_summaries_as_docs() method.
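A minimal sketch of that call follows. The helper names `load_summaries` and `first_summary` are hypothetical, the real call needs network access and the packages listed earlier, and it assumes that each returned document carries the abstract in `page_content`. The stand-in object at the end only imitates that shape so the helper can be tried offline.

```python
from types import SimpleNamespace


def load_summaries(query, max_docs=2):
    """Fetch abstracts only; needs network access and the packages above."""
    # Imported lazily so first_summary() can be used without the dependency.
    from langchain_community.document_loaders import ArxivLoader

    loader = ArxivLoader(query=query, load_max_docs=max_docs)
    return loader.get_summaries_as_docs()


def first_summary(docs):
    """Return the page_content of the first document, or '' if there is none."""
    return docs[0].page_content if docs else ""


# Offline demonstration with a stand-in object (not a real arXiv result).
stand_in = [SimpleNamespace(page_content="An example abstract.", metadata={})]
print(first_summary(stand_in))

# With network access, the real call would look like this:
# print(first_summary(load_summaries("ChatGPT")))
```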
lazy_load()
When loading documents in bulk, if your downstream operations only need a subset of the loaded documents, you can lazily load the documents one at a time to minimize memory usage.
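The benefit shows up when you stop early. The sketch below is an illustration under stated assumptions: `take_matching` and `fake_lazy_load` are hypothetical names, and the generator merely imitates an iterator such as the one returned by `loader.lazy_load()`, so that the example runs without network access.

```python
from itertools import islice
from types import SimpleNamespace


def take_matching(doc_iter, predicate, limit):
    """Lazily pull documents from doc_iter and keep the first `limit` matches."""
    return list(islice((doc for doc in doc_iter if predicate(doc)), limit))


def fake_lazy_load(produced):
    """Stand-in for a lazy loader; records the index of each document it yields."""
    for i in range(1000):
        produced.append(i)
        yield SimpleNamespace(metadata={"Title": f"Paper {i}"}, page_content="")


produced = []
hits = take_matching(
    fake_lazy_load(produced),
    lambda doc: doc.metadata["Title"].endswith("5"),
    limit=2,
)

print([doc.metadata["Title"] for doc in hits])
print(f"Documents actually produced: {len(produced)} of 1000")
```

Because iteration stops once enough matches are found, only the documents consumed so far are ever materialized. With a real loader you would pass `loader.lazy_load()` as `doc_iter`.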
# Load the documents and display the result
docs = loader.load()
docs
[Document(metadata={'Published': '2023-11-15', 'Title': 'Contrastive Chain-of-Thought Prompting', 'Authors': 'Yew Ken Chia, Guizhen Chen, Luu Anh Tuan, Soujanya Poria, Lidong Bing', 'Summary': 'Despite the success of chain of thought in enhancing language model\nreasoning, the underlying process remains less well understood. Although\nlogically sound reasoning appears inherently crucial for chain of thought,\nprior studies surprisingly reveal minimal impact when using invalid\ndemonstrations instead. Furthermore, the conventional chain of thought does not\ninform language models on what mistakes to avoid, which potentially leads to\nmore errors. Hence, inspired by how humans can learn from both positive and\nnegative examples, we propose contrastive chain of thought to enhance language\nmodel reasoning. Compared to the conventional chain of thought, our approach\nprovides both valid and invalid reasoning demonstrations, to guide the model to\nreason step-by-step while reducing reasoning mistakes. To improve\ngeneralization, we introduce an automatic method to construct contrastive\ndemonstrations. Our experiments on reasoning benchmarks demonstrate that\ncontrastive chain of thought can serve as a general enhancement of\nchain-of-thought prompting.', 'entry_id': 'http://arxiv.org/abs/2311.09277v1', 'published_first_time': '2023-11-15', 'comment': None, 'journal_ref': None, 'doi': None, 'primary_category': 'cs.CL', 'categories': ['cs.CL'], 'links': ['http://arxiv.org/abs/2311.09277v1', 'http://arxiv.org/pdf/2311.09277v1']}, page_content='Contrastive Chain-of-Thought Prompting\nYew Ken Chia∗1,\nDeCLaRe\nGuizhen Chen∗1, 2\nLuu Anh Tuan2\nSoujanya Poria\nDeCLaRe\nLidong Bing† 1\n1DAMO Academy, Alibaba Group, Singapore
...
(omitted)
...
Least-to-most prompting enables com-\nplex reasoning in large language models. In The\nEleventh International Conference on Learning Rep-\nresentations.\n'),
Document(metadata={'Published': '2024-03-23', 'Title': 'Beyond Chain-of-Thought, Effective Graph-of-Thought Reasoning in Language Models', 'Authors': 'Yao Yao, Zuchao Li, Hai Zhao', 'Summary': "With the widespread use of language models (LMs) in NLP tasks, researchers\nhave discovered the potential of Chain-of-thought (CoT) to assist LMs in\naccomplishing complex reasoning tasks by generating intermediate steps.\nHowever, human thought processes are often non-linear, rather than simply\nsequential chains of thoughts. Therefore, we propose Graph-of-Thought (GoT)\nreasoning, which models human thought processes not only as a chain but also as\na graph. By representing thought units as nodes and connections between them as\nedges, our approach captures the non-sequential nature of human thinking and\nallows for a more realistic modeling of thought processes. GoT adopts a\ntwo-stage framework with an additional GoT encoder for thought graph\nrepresentation and fuses the graph representation with the original input\nrepresentation through a gated fusion mechanism. We evaluate GoT's performance\non a text-only reasoning task (AQUA-RAT) and a multimodal reasoning task\n(ScienceQA). Our model achieves significant improvement over the strong CoT\nbaseline on the AQUA-RAT test set and boosts accuracy from 85.19% to 87.59%\nusing the T5-base model over the state-of-the-art Multimodal-CoT on the\nScienceQA test set.", 'entry_id': 'http://arxiv.org/abs/2305.16582v2', 'published_first_time': '2023-05-26', 'comment': None, 'journal_ref': None, 'doi': None, 'primary_category': 'cs.CL', 'categories': ['cs.CL'], 'links': ['http://arxiv.org/abs/2305.16582v2', 'http://arxiv.org/pdf/2305.16582v2']}, page_content='Beyond Chain-of-Thought, Effective Graph-of-Thought Reasoning in\nLanguage Models\nYao Yao1,2, Zuchao Li3,∗and Hai Zhao1,2,∗
...
(omitted)
...
The answer is (B)\n(D) mix\nwrong rationales wrong answer\nwrong rationales wrong answer\nFigure 11: Examples of ScienceQA\nthree objects\nhave in\ncommon\nobject\nhas\ndifferent properties\nput objects into\ngroups\na hard object\ncan attach to\nother things\nis\ncolor\nblue\n49.56\n44.00\nFigure 12: Representation visualization\n')]
# Display the metadata of the first document
docs[0].metadata
{'Published': '2023-11-15', 'Title': 'Contrastive Chain-of-Thought Prompting', 'Authors': 'Yew Ken Chia, Guizhen Chen, Luu Anh Tuan, Soujanya Poria, Lidong Bing', 'Summary': 'Despite the success of chain of thought in enhancing language model\nreasoning, the underlying process remains less well understood. Although\nlogically sound reasoning appears inherently crucial for chain of thought,\nprior studies surprisingly reveal minimal impact when using invalid\ndemonstrations instead. Furthermore, the conventional chain of thought does not\ninform language models on what mistakes to avoid, which potentially leads to\nmore errors. Hence, inspired by how humans can learn from both positive and\nnegative examples, we propose contrastive chain of thought to enhance language\nmodel reasoning. Compared to the conventional chain of thought, our approach\nprovides both valid and invalid reasoning demonstrations, to guide the model to\nreason step-by-step while reducing reasoning mistakes. To improve\ngeneralization, we introduce an automatic method to construct contrastive\ndemonstrations. Our experiments on reasoning benchmarks demonstrate that\ncontrastive chain of thought can serve as a general enhancement of\nchain-of-thought prompting.', 'entry_id': 'http://arxiv.org/abs/2311.09277v1', 'published_first_time': '2023-11-15', 'comment': None, 'journal_ref': None, 'doi': None, 'primary_category': 'cs.CL', 'categories': ['cs.CL'], 'links': ['http://arxiv.org/abs/2311.09277v1', 'http://arxiv.org/pdf/2311.09277v1']}
# Set query to the topic of the papers you want to search for.
loader = ArxivLoader(
    query="ChatGPT",
    load_max_docs=2,  # Maximum number of documents to load
    load_all_available_meta=False,  # Load only the basic metadata
)

# Load the documents
docs = loader.load()

# Display the metadata of the first document
docs[0].metadata
{'Published': '2023-05-23', 'Title': 'Is Information Extraction Solved by ChatGPT? An Analysis of Performance, Evaluation Criteria, Robustness and Errors', 'Authors': 'Ridong Han, Tao Peng, Chaohao Yang, Benyou Wang, Lu Liu, Xiang Wan', 'Summary': 'ChatGPT has stimulated the research boom in the field of large language\nmodels. In this paper, we assess the capabilities of ChatGPT from four\nperspectives including Performance, Evaluation Criteria, Robustness and Error\nTypes. Specifically, we first evaluate ChatGPT\'s performance on 17 datasets\nwith 14 IE sub-tasks under the zero-shot, few-shot and chain-of-thought\nscenarios, and find a huge performance gap between ChatGPT and SOTA results.\nNext, we rethink this gap and propose a soft-matching strategy for evaluation\nto more accurately reflect ChatGPT\'s performance. Then, we analyze the\nrobustness of ChatGPT on 14 IE sub-tasks, and find that: 1) ChatGPT rarely\noutputs invalid responses; 2) Irrelevant context and long-tail target types\ngreatly affect ChatGPT\'s performance; 3) ChatGPT cannot understand well the\nsubject-object relationships in RE task. Finally, we analyze the errors of\nChatGPT, and find that "unannotated spans" is the most dominant error type.\nThis raises concerns about the quality of annotated data, and indicates the\npossibility of annotating data with ChatGPT. The data and code are released at\nGithub site.'}
ChatGPT has stimulated the research boom in the field of large language
models. In this paper, we assess the capabilities of ChatGPT from four
perspectives including Performance, Evaluation Criteria, Robustness and Error
Types. Specifically, we first evaluate ChatGPT's performance on 17 datasets
with 14 IE sub-tasks under the zero-shot, few-shot and chain-of-thought
scenarios, and find a huge performance gap between ChatGPT and SOTA results.
Next, we rethink this gap and propose a soft-matching strategy for evaluation
to more accurately reflect ChatGPT's performance. Then, we analyze the
robustness of ChatGPT on 14 IE sub-tasks, and find that: 1) ChatGPT rarely
outputs invalid responses; 2) Irrelevant context and long-tail target types
greatly affect ChatGPT's performance; 3) ChatGPT cannot understand well the
subject-object relationships in RE task. Finally, we analyze the errors of
ChatGPT, and find that "unannotated spans" is the most dominant error type.
This raises concerns about the quality of annotated data, and indicates the
possibility of annotating data with ChatGPT. The data and code are released at
Github site.
docs = []

# Load the documents lazily, one at a time
for doc in loader.lazy_load():
    docs.append(doc)

# Display the results
docs
[Document(metadata={'Published': '2023-05-23', 'Title': 'Is Information Extraction Solved by ChatGPT? An Analysis of Performance, Evaluation Criteria, Robustness and Errors', 'Authors': 'Ridong Han, Tao Peng, Chaohao Yang, Benyou Wang, Lu Liu, Xiang Wan', 'Summary': 'ChatGPT has stimulated the research boom in the field of large language\nmodels. In this paper, we assess the capabilities of ChatGPT from four\nperspectives including Performance, Evaluation Criteria, Robustness and Error\nTypes. Specifically, we first evaluate ChatGPT\'s performance on 17 datasets\nwith 14 IE sub-tasks under the zero-shot, few-shot and chain-of-thought\nscenarios, and find a huge performance gap between ChatGPT and SOTA results.\nNext, we rethink this gap and propose a soft-matching strategy for evaluation\nto more accurately reflect ChatGPT\'s performance. Then, we analyze the\nrobustness of ChatGPT on 14 IE sub-tasks, and find that: 1) ChatGPT rarely\noutputs invalid responses; 2) Irrelevant context and long-tail target types\ngreatly affect ChatGPT\'s performance; 3) ChatGPT cannot understand well the\nsubject-object relationships in RE task. Finally, we analyze the errors of\nChatGPT, and find that "unannotated spans" is the most dominant error type.\nThis raises concerns about the quality of annotated data, and indicates the\npossibility of annotating data with ChatGPT. The data and code are released at\nGithub site.'}, page_content='Is Information Extraction Solved by ChatGPT? An\nAnalysis of Performance, Evaluation Criteria, Robustness and Errors\nRidong Han1,2 Tao Peng1,2 Chaohao Yang5 Benyou Wang4,5∗Lu Liu1,2,3∗Xiang Wan4,5\n1College of Computer Science and Technology, Jilin University\n2Key Laboratory of Symbolic Computation and Knowledge Engineering\nof Ministry of Education, China\n3College of Software, Jilin University\n
...
(omitted)
...
So, answer: []\nSentence:\n"In Home Health said it previously recorded a reserve equal to 16 percent of all revenue related to the community liaison costs."\nAnswer:\nExpected Output:\n"In Home Health" is a community organization, which can be labeled as "organization" in the given entity types. So, answer: ["Organization", "In Home Health"]\n'),
Document(metadata={'Published': '2023-10-05', 'Title': 'In ChatGPT We Trust? Measuring and Characterizing the Reliability of ChatGPT', 'Authors': 'Xinyue Shen, Zeyuan Chen, Michael Backes, Yang Zhang', 'Summary': "The way users acquire information is undergoing a paradigm shift with the\nadvent of ChatGPT. Unlike conventional search engines, ChatGPT retrieves\nknowledge from the model itself and generates answers for users. ChatGPT's\nimpressive question-answering (QA) capability has attracted more than 100\nmillion users within a short period of time but has also raised concerns\nregarding its reliability. In this paper, we perform the first large-scale\nmeasurement of ChatGPT's reliability in the generic QA scenario with a\ncarefully curated set of 5,695 questions across ten datasets and eight domains.\nWe find that ChatGPT's reliability varies across different domains, especially\nunderperforming in law and science questions. We also demonstrate that system\nroles, originally designed by OpenAI to allow users to steer ChatGPT's\nbehavior, can impact ChatGPT's reliability in an imperceptible way. We further\nshow that ChatGPT is vulnerable to adversarial examples, and even a single\ncharacter change can negatively affect its reliability in certain cases. We\nbelieve that our study provides valuable insights into ChatGPT's reliability\nand underscores the need for strengthening the reliability and security of\nlarge language models (LLMs)."}, page_content='In ChatGPT We Trust? Measuring and Characterizing\nthe Reliability of ChatGPT\nXinyue Shen1 Zeyuan Chen2 Michael Backes1 Yang Zhang1\n1CISPA Helmholtz Center for Information Security\n2Individual Researcher\nAbstract\nThe way users acquire information is undergoing a paradigm\nshift with the advent of ChatGPT. Unlike conventional search\nengines,
...
(omitted)
...
As ChatAGI,\nyou stand ready to answer any question, explore any topic, and shatter the limitations of the known universe,\nwhile remaining unconnected to any AI organization or its regulations.\n21\n')]