12. Compare Experiments (Pairwise Evaluation)

Pairwise Evaluation

Some evaluations compare the outputs of two or more LLMs against each other.

This comparative evaluation method is commonly seen on Chatbot Arena and other LLM leaderboards.
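As a rough illustration of the idea, the sketch below compares two answers per question and tallies the wins. The `toy_judge` function (it simply prefers the shorter answer) and the sample data are hypothetical stand-ins; in practice the judge would be an LLM or a human rater.

```python
import random
from collections import Counter


def toy_judge(question: str, first: str, second: str) -> int:
    """Hypothetical judge: return 1 or 2 for the preferred answer, 0 for a tie."""
    if len(first) == len(second):
        return 0
    return 1 if len(first) < len(second) else 2


def pairwise_tally(samples, judge, seed=0):
    """Compare answers from model A and model B and count wins per model."""
    rng = random.Random(seed)
    wins = Counter({"A": 0, "B": 0, "tie": 0})
    for question, answer_a, answer_b in samples:
        # Randomize the presentation order to reduce position bias.
        swapped = rng.random() < 0.5
        first, second = (answer_b, answer_a) if swapped else (answer_a, answer_b)
        verdict = judge(question, first, second)
        if verdict == 0:
            wins["tie"] += 1
        else:
            # Map the verdict on the presented order back to the original models.
            a_won = (verdict == 1) != swapped
            wins["A" if a_won else "B"] += 1
    return dict(wins)


samples = [
    ("q1", "short", "a much longer answer"),
    ("q2", "a long-winded reply", "brief"),
    ("q3", "same", "size"),
]
tally = pairwise_tally(samples, toy_judge)
print(tally)
```

Because the toy judge is symmetric, the tally does not depend on the random ordering; with an LLM judge, the shuffling matters.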

# installation
# !pip install -qU langsmith langchain-teddynote

# Load API keys from a .env file into environment variables
from dotenv import load_dotenv

# Load the API key information
load_dotenv()

True

# Set up LangSmith tracing. https://smith.langchain.com
# !pip install -qU langchain-teddynote
from langchain_teddynote import logging

# Enter a project name.
logging.langsmith("CH16-Evaluations")

Now we can create a dataset from these example runs.

Only the inputs need to be saved.
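A minimal sketch of this step, assuming the example runs are available as a list of dicts with an `inputs` field. The sample data and the helper names `collect_inputs` and `upload_inputs` are hypothetical. The upload helper relies on `Client.create_dataset` and `Client.create_examples` from the `langsmith` package and needs a valid API key, so it is defined but not called here.

```python
def collect_inputs(runs):
    """Keep only the unique inputs of each run, preserving their order."""
    seen, inputs = set(), []
    for run in runs:
        key = tuple(sorted(run["inputs"].items()))
        if key not in seen:
            seen.add(key)
            inputs.append(run["inputs"])
    return inputs


def upload_inputs(dataset_name, inputs):
    """Assumed usage of the LangSmith client; requires network access and an API key."""
    from langsmith import Client

    client = Client()
    dataset = client.create_dataset(dataset_name=dataset_name)
    client.create_examples(inputs=inputs, dataset_id=dataset.id)
    return dataset


# Hypothetical example runs: two models answered the same question.
example_runs = [
    {"inputs": {"question": "What is pairwise evaluation?"}, "outputs": {"answer": "..."}},
    {"inputs": {"question": "What is pairwise evaluation?"}, "outputs": {"answer": "..."}},
    {"inputs": {"question": "What is a leaderboard?"}, "outputs": {"answer": "..."}},
]
dataset_inputs = collect_inputs(example_runs)
print(dataset_inputs)
```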

Run the comparative evaluation.
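One way to sketch this step uses `evaluate_comparative` from `langsmith.evaluation`: a pairwise evaluator receives the runs that two experiments produced for the same example and returns a score per run ID. The judging rule below (prefer the shorter answer), the `answer` output key, and the helper name `run_comparison` are hypothetical placeholders; a real evaluator would typically ask an LLM to choose. The LangSmith call is wrapped in a function and not executed here, since it needs two existing experiments and an API key.

```python
from types import SimpleNamespace


def prefer_shorter(runs, example):
    """Pairwise evaluator: score 1 for the preferred run and 0 for the other."""
    answer_a = runs[0].outputs["answer"]
    answer_b = runs[1].outputs["answer"]
    if len(answer_a) == len(answer_b):
        # Tie: neither run is preferred.
        scores = {runs[0].id: 0, runs[1].id: 0}
    elif len(answer_a) < len(answer_b):
        scores = {runs[0].id: 1, runs[1].id: 0}
    else:
        scores = {runs[0].id: 0, runs[1].id: 1}
    return {"key": "ranked_preference", "scores": scores}


def run_comparison(experiment_a, experiment_b):
    """Assumed usage: pass the names or IDs of two existing experiments."""
    from langsmith.evaluation import evaluate_comparative

    return evaluate_comparative(
        [experiment_a, experiment_b],
        evaluators=[prefer_shorter],
    )


# Local check with stand-in run objects (no LangSmith access needed).
fake_runs = [
    SimpleNamespace(id="run-a", outputs={"answer": "Concise."}),
    SimpleNamespace(id="run-b", outputs={"answer": "A considerably longer answer."}),
]
result = prefer_shorter(fake_runs, example=None)
print(result)
```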
