04. LangSmith dataset generation

Let's find out how to build your own RAG evaluation dataset.

To build a dataset, you first need to understand three evaluation cases.

Case: evaluating whether the Retrieval is relevant to the Question

Question - Retrieval

Case: evaluating whether the Answer is relevant to the Question

Case: evaluating whether the Answer is grounded in the retrieved documents (Hallucination Check)

Therefore, you generally need three pieces of information: Question, Retrieval, and Answer. In practice, however, building Ground Truth for Retrieval is difficult.

If Ground Truth for Retrieval exists, store all three and use them as the dataset. Otherwise, you can build and use a dataset from Question and Answer only.
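
As a minimal sketch of this rule, the hypothetical helper build_record below keeps the Retrieval ground truth when it is available and otherwise stores only Question and Answer. The field names (question, answer, contexts) are assumptions chosen for illustration, not a format required by LangSmith.

```python
from typing import Dict, List, Optional


def build_record(question: str, answer: str,
                 contexts: Optional[List[str]] = None) -> Dict[str, object]:
    """Build one evaluation record (hypothetical helper, illustrative field names)."""
    record: Dict[str, object] = {"question": question, "answer": answer}
    # Store the Retrieval ground truth only when it actually exists.
    if contexts:
        record["contexts"] = list(contexts)
    return record


# Placeholder data, not taken from a real dataset.
with_gt = build_record("What is X?", "X is Y.", contexts=["Doc stating that X is Y."])
without_gt = build_record("What is X?", "X is Y.")
print(sorted(with_gt))     # ['answer', 'contexts', 'question']
print(sorted(without_gt))  # ['answer', 'question']
```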

# Install
# !pip install -qU langsmith langchain-teddynote

# Configuration file for managing API keys as environment variables
from dotenv import load_dotenv

# Load the API key information
load_dotenv()

Generate a dataset

Use inputs and outputs to generate the dataset.

The dataset consists of question and answer.
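
A minimal sketch of this step, using only the standard library: two parallel lists named inputs and outputs are zipped into question/answer pairs. The sample strings are placeholders, and the variable name qa_pairs is an assumption for illustration.

```python
# Placeholder questions and reference answers (replace with your own data).
inputs = [
    "Placeholder question 1?",
    "Placeholder question 2?",
]
outputs = [
    "Placeholder answer 1.",
    "Placeholder answer 2.",
]

# The two lists must line up one-to-one.
assert len(inputs) == len(outputs), "inputs and outputs must have the same length"

# Pair each question with its answer.
qa_pairs = [{"question": q, "answer": a} for q, a in zip(inputs, outputs)]

for pair in qa_pairs:
    print(pair)
```

If pandas is available, pd.DataFrame(qa_pairs) gives a tabular view of the same data.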

Alternatively, you can use the Synthetic Dataset generated in the previous tutorial.

You can also use a dataset uploaded to HuggingFace. (Note) Update the datasets library before proceeding, e.g. with pip install -qU datasets.
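
A hedged sketch of loading such a dataset follows. It assumes the datasets library is installed; the repository id, split name, and environment variable name in the usage comment are placeholders to replace with your own. The helper name load_split_as_records and its loader parameter are hypothetical, added so the function can be exercised without network access.

```python
from typing import Any, Callable, Dict, List, Optional


def load_split_as_records(
    repo_id: str,
    split: str,
    token: Optional[str] = None,
    loader: Optional[Callable[..., Any]] = None,
) -> List[Dict[str, Any]]:
    """Load one split of a Hub dataset and return its rows as plain dicts."""
    if loader is None:
        # Imported lazily so this code can be imported without `datasets` installed.
        from datasets import load_dataset
        loader = load_dataset
    # Without a `split` argument, load_dataset returns a mapping of split name -> Dataset.
    dataset_dict = loader(repo_id, token=token)
    # Iterating over a Dataset yields one dict per row.
    return [dict(row) for row in dataset_dict[split]]


# Example call (placeholder values; requires network access):
# import os
# rows = load_split_as_records(
#     "your-username/your-rag-dataset",
#     split="train",
#     token=os.environ.get("HUGGINGFACEHUB_API_TOKEN"),
# )
```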

Creating a dataset for LangSmith testing

  • Create a new dataset under Datasets & Testing.

You can also create a dataset directly from a CSV file using the LangSmith UI.

For details, refer to the LangSmith documentation.
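
The same dataset can also be created from code. The sketch below assumes the langsmith package is installed and an API key is configured in the environment; the helper name upload_qa_pairs and the dataset name in the usage comment are hypothetical. It relies on the langsmith Client methods has_dataset, read_dataset, create_dataset, and create_examples.

```python
from typing import Any, Dict, List, Optional


def upload_qa_pairs(
    client: Any,
    dataset_name: str,
    qa_pairs: List[Dict[str, str]],
    description: Optional[str] = None,
) -> Any:
    """Create the dataset if it does not exist yet, then upload QA pairs as examples."""
    if client.has_dataset(dataset_name=dataset_name):
        # Reuse the existing dataset instead of failing on a duplicate name.
        dataset = client.read_dataset(dataset_name=dataset_name)
    else:
        dataset = client.create_dataset(dataset_name=dataset_name, description=description)

    # inputs and outputs are parallel lists of dicts, one entry per example.
    client.create_examples(
        inputs=[{"question": p["question"]} for p in qa_pairs],
        outputs=[{"answer": p["answer"]} for p in qa_pairs],
        dataset_id=dataset.id,
    )
    return dataset


# Usage (requires a configured LangSmith API key; the dataset name is a placeholder,
# and qa_pairs is a list of {"question": ..., "answer": ...} dicts):
# from langsmith import Client
# dataset = upload_qa_pairs(Client(), "RAG_EVAL_DATASET", qa_pairs)
```

Note that this sketch does not deduplicate: calling it twice with the same pairs uploads them twice.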

You can also add more examples to the dataset later.
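
For instance, a single new question/answer pair could be appended as sketched below. This assumes a langsmith Client and the id of an existing dataset; the helper name add_example is hypothetical, and it wraps the Client.create_example method.

```python
from typing import Any


def add_example(client: Any, dataset_id: Any, question: str, answer: str) -> None:
    """Append one question/answer example to an existing LangSmith dataset."""
    client.create_example(
        inputs={"question": question},
        outputs={"answer": answer},
        dataset_id=dataset_id,
    )


# Usage (placeholder text; client is a langsmith Client, dataset an existing dataset):
# add_example(client, dataset.id, "Placeholder question?", "Placeholder answer.")
```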

Congratulations! The dataset is now ready.
