Evaluating the performance of a RAG (Retrieval-Augmented Generation) pipeline is very important.
However, manually writing hundreds of QA (question-context-answer) samples from documents is time consuming and labor intensive. In addition, human-written questions often fail to reach the level of complexity required for a thorough evaluation, which ultimately hurts the quality of the evaluation.
Synthetic data generation can reduce the developer time spent on building such a dataset by up to 90%.
Uncomment the line below, run it to install the package, and then proceed.
# !pip install -qU ragas
# API keys are managed as environment variables in a .env configuration file
from dotenv import load_dotenv
# Load API KEY information
load_dotenv()
True
# Set up LangSmith tracking. https://smith.langchain.com
# !pip install -qU langchain-teddynote
from langchain_teddynote import logging
# Enter a project name.
logging.langsmith("CH16-Evaluations")
Documents utilized for practice
Software Policy & Research Institute (SPRi) AI Brief, December 2023
Authors: Jaeheung Lee (Senior Researcher, AI Policy Lab), Lee Ji-soo (Associate Researcher, AI Policy Lab)
Link: https://spri.kr/posts/view/23669
File name: SPRI_AI_Brief_2023년12월호_F.pdf
Please copy the downloaded file into the data folder before running the examples.
Document preprocessing
Load documents.
Each document object contains a metadata dictionary, accessible through the metadata attribute, which can be used to store additional information about the document.
Make sure the metadata dictionary includes a filename key.
This key is used during test dataset generation: the filename attribute identifies chunks that belong to the same document.
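For example, the filename key can simply be copied from each document's source metadata, as in the short sketch below (the full preprocessing code appears later in this section):

# Set a filename key on every loaded document, copied from the source path
for doc in docs:
    doc.metadata["filename"] = doc.metadata["source"]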
Generate the dataset
Initialize the DocumentStore, using a custom LLM and embeddings.
Generate the TestSet.
Distribution by question type
simple: simple questions
reasoning: questions that require reasoning
multi_context: questions that require multiple contexts
conditional: conditional questions
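The distribution is expressed as a dictionary mapping each evolution type to the proportion of questions to generate for it, for example (the same dictionary appears in the full code at the end of this section):

# 40% simple, 20% reasoning, 20% multi_context, 20% conditional questions
distributions = {simple: 0.4, reasoning: 0.2, multi_context: 0.2, conditional: 0.2}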
documents: document data
test_size: the number of questions to create
distributions: distribution by question type
with_debugging_logs: whether to print debugging logs
Save the dataset stored in the DataFrame as a CSV file.
Execute evaluation only for specific tags
Tag settings (enter the desired tag)
Edit Rule
Instead of evaluating every step, you can run the evaluation only for runs with a specific Tag by setting a Tag.
Tag creation
Save after checking the Grade.
Be sure to check Preview, then turn off Preview mode again to move on to the next step.
Caution
Use Preview to make sure the data is being entered in the correct place.
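A tag is attached to a run through a RunnableConfig, as in the short sketch below (the same configuration appears in the full code at the end of this section):

from langchain_core.runnables import RunnableConfig

# Runs invoked with this config carry the "hallucination_eval" tag,
# so an online evaluation rule filtered on that tag will pick them up
hallucination_config = RunnableConfig(tags=["hallucination_eval"])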
from langchain_community.document_loaders import PDFPlumberLoader
# Create a document loader
loader = PDFPlumberLoader("data/SPRI_AI_Brief_2023년12월호_F.pdf")
# Loading documents
docs = loader.load()
# Exclude the table of contents (first 3 pages) and the last page
docs = docs[3:-1]
# Number of pages in the document
len(docs)
19
# Set metadata (filename must exist)
for doc in docs:
doc.metadata["filename"] = doc.metadata["source"]
from ragas.testset.generator import TestsetGenerator
from ragas.testset.evolutions import simple, reasoning, multi_context, conditional
from ragas.llms import LangchainLLMWrapper
from ragas.embeddings import LangchainEmbeddingsWrapper
from ragas.testset.extractor import KeyphraseExtractor
from ragas.testset.docstore import InMemoryDocumentStore
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain.text_splitter import RecursiveCharacterTextSplitter
# Dataset Generator
generator_llm = ChatOpenAI(model="gpt-4o-mini")
# Dataset critic
critic_llm = ChatOpenAI(model="gpt-4o-mini")
# Document Embedding
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
# Sets the text splitter.
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
# Wrap LangChain's ChatOpenAI model with LangchainLLMWrapper to make it compatible with Ragas.
langchain_llm = LangchainLLMWrapper(ChatOpenAI(model="gpt-4o-mini"))
# Initialize the key phrase extractor. It uses the LLM defined above.
keyphrase_extractor = KeyphraseExtractor(llm=langchain_llm)
# Create ragas_embeddings
ragas_embeddings = LangchainEmbeddingsWrapper(embeddings)
# Initializes an InMemoryDocumentStore.
# This is a repository that stores and manages documents in memory.
docstore = InMemoryDocumentStore(
splitter=splitter,
embeddings=ragas_embeddings,
extractor=keyphrase_extractor,
)
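# Create the test set generator: a minimal sketch, assuming the ragas 0.1.x
# TestsetGenerator API (adjust to the version you have installed)
generator = TestsetGenerator(
    generator_llm=LangchainLLMWrapper(generator_llm),
    critic_llm=LangchainLLMWrapper(critic_llm),
    embeddings=ragas_embeddings,
    docstore=docstore,
)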
# Determine distribution by question type
# simple: simple questions, reasoning: questions that require reasoning, multi_context: questions spanning multiple contexts, conditional: conditional questions
distributions = {simple: 0.4, reasoning: 0.2, multi_context: 0.2, conditional: 0.2}
# Create a test set
# documents: document data, test_size: number of questions to generate, distributions: distribution by question type, with_debugging_logs: whether to print debugging logs
testset = generator.generate_with_langchain_docs(
documents=docs, test_size=10, distributions=distributions, with_debugging_logs=True
)
# Convert the generated test set to a pandas DataFrame
test_df = testset.to_pandas()
test_df
# Output the top 5 rows of a DataFrame
test_df.head()
# Save DataFrame as CSV file
test_df.to_csv("data/ragas_synthetic_dataset.csv", index=False)
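# (Optional) A quick usage sketch: the saved CSV can be reloaded later,
# e.g. for evaluation, using pandas (already available via the DataFrame above)
import pandas as pd

loaded_df = pd.read_csv("data/ragas_synthetic_dataset.csv")
loaded_df.head()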
from langchain_core.runnables import RunnableConfig

# Set the tags first; these configs are used in the evaluation requests below
hallucination_config = RunnableConfig(tags=["hallucination_eval"])
context_recall_config = RunnableConfig(tags=["context_recall_eval"])
all_eval_config = RunnableConfig(tags=["hallucination_eval", "context_recall_eval"])
# Request all evaluations (evaluation_runnable is the chain being evaluated, defined elsewhere in this tutorial)
_ = evaluation_runnable.invoke(
    "What is the name of the generative AI developed by Samsung Electronics?", config=all_eval_config
)
# Request only the Context Recall evaluation
_ = evaluation_runnable.invoke(
    "What is the name of the generative AI developed by Samsung Electronics?",
    config=context_recall_config,
)
# Request only the Hallucination evaluation
_ = evaluation_runnable.invoke(
    "What is the name of the generative AI developed by Samsung Electronics?", config=hallucination_config
)