07. CSV/Excel Data Analysis Agent

You can create an Agent that performs analysis using a Pandas DataFrame.

You can load CSV/Excel data into a Pandas DataFrame and have the Agent write and run Pandas code (queries) against it to perform the analysis.

# Configuration file for managing API keys as environment variables
from dotenv import load_dotenv

# Load API key information
load_dotenv()

True
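`load_dotenv()` locates a `.env` file, copies its `KEY=VALUE` pairs into the process environment, and returns the `True` shown above. The file is assumed to hold at least `OPENAI_API_KEY` (read by `ChatOpenAI`) and, for tracing, `LANGCHAIN_API_KEY`. To make the expected file format concrete, here is a minimal sketch of the parsing step using only the standard library. `parse_env_text` is a hypothetical helper, not part of python-dotenv, and it ignores features the real library supports, such as `export` prefixes, multi-line values, and variable expansion.

```python
def parse_env_text(text: str) -> dict:
    """Hypothetical helper: parse .env-style KEY=VALUE lines into a dict."""
    parsed = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and lines without an '=' separator
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and simple quotes around the value
        parsed[key.strip()] = value.strip().strip('"').strip("'")
    return parsed


# Placeholder values; a real .env file would hold your actual keys
sample = """
# API keys
OPENAI_API_KEY="your-openai-key"
LANGCHAIN_API_KEY=your-langsmith-key
"""
print(sorted(parse_env_text(sample)))
```

By default `load_dotenv()` does not overwrite variables that are already set in the environment; pass `override=True` if the `.env` values should win.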

# Set up LangSmith tracking. https://smith.langchain.com
# !pip install -qU langchain-teddynote
from langchain_teddynote import logging

# Enter a project name.
logging.langsmith("CH15-Agent-Toolkits")

Starting LangSmith tracking.
[Project name]
CH15-Agent-Toolkits
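`logging.langsmith(...)` comes from the `langchain-teddynote` helper package. LangSmith tracing itself is controlled through environment variables: `LANGCHAIN_TRACING_V2=true` turns tracing on, `LANGCHAIN_PROJECT` names the project, and `LANGCHAIN_API_KEY` authenticates. We assume the helper sets roughly these variables (its internals are not shown here), so a sketch of doing the same by hand, without the helper, might look like this. `enable_langsmith_tracing` is a hypothetical name.

```python
import os


def enable_langsmith_tracing(project_name: str) -> None:
    """Hypothetical helper: turn on LangSmith tracing via environment variables."""
    # The API key is expected to be in the environment already (e.g. from .env)
    if not os.environ.get("LANGCHAIN_API_KEY"):
        print("Warning: LANGCHAIN_API_KEY is not set; LangSmith cannot authenticate.")
    os.environ["LANGCHAIN_TRACING_V2"] = "true"
    os.environ["LANGCHAIN_PROJECT"] = project_name


enable_langsmith_tracing("CH15-Agent-Toolkits")
```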

import pandas as pd

df = pd.read_csv("./data/titanic.csv")  # Read a CSV file.
# df = pd.read_excel("./data/titanic.xlsx") # It can also read Excel files.
df.head()
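`pd.read_csv` loads the file into a DataFrame, and `df.head()` shows its first five rows. The agent's later queries rely on a handful of Titanic columns: `Survived` (0 or 1), `Pclass` (1 to 3), `Sex`, and `Age`, where some ages are missing. As a dependency-free illustration of that shape, the sketch below reads a tiny sample with the standard-library `csv` module. The rows are invented for illustration and are not taken from `titanic.csv`.

```python
import csv
import io

# Invented rows using a subset of the Titanic column names (not real data)
SAMPLE_CSV = """PassengerId,Survived,Pclass,Sex,Age
1,0,3,male,22
2,1,1,female,38
3,1,3,female,26
4,1,1,female,35
5,0,3,male,
6,1,2,male,4
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
print(len(rows))        # number of data rows, analogous to len(df)
for row in rows[:5]:    # first five rows, analogous to df.head()
    print(row)
# A blank field comes back as an empty string, whereas pandas would store NaN
print(repr(rows[4]["Age"]))
```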

from langchain_experimental.agents.agent_toolkits import create_pandas_dataframe_agent
from langchain.agents.agent_types import AgentType
from langchain_openai import ChatOpenAI
from langchain_teddynote.messages import AgentStreamParser
import seaborn as sns

sns.set_style("white")

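# Build an agent that answers questions by writing and running pandas code against df
agent = create_pandas_dataframe_agent(
    ChatOpenAI(model="gpt-4o-mini", temperature=0),  # temperature=0 reduces output randomness
    df,  # the DataFrame the agent analyzes
    verbose=False,
    agent_type=AgentType.OPENAI_FUNCTIONS,  # call tools through OpenAI function calling
    allow_dangerous_code=True,  # explicit opt-in: the agent executes model-generated Python
)

# Formats streamed agent steps for display
stream_parser = AgentStreamParser()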

def ask(query):
    # Stream the agent's steps (tool calls, observations, final answer) and print each one
    response = agent.stream({"input": query})

    for step in response:
        stream_parser.process_agent_steps(step)
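`agent.stream(...)` yields the run as a series of chunks rather than a single result. For LangChain's `AgentExecutor`, as we understand it, a chunk carries an `actions` key when the model requests a tool call (holding `AgentAction` objects with `tool` and `tool_input`), a `steps` key after the tool runs (holding `AgentStep` objects with an `observation`), and an `output` key for the final answer. `AgentStreamParser` turns these into the `[Call tool]`, `[Observation]`, and `[Final answer]` blocks in the traces below. The sketch is a hypothetical stand-in, not `AgentStreamParser`'s code, and uses plain dictionaries in place of the real `AgentAction`/`AgentStep` objects so it runs without LangChain.

```python
from typing import Any, Iterable, List, Mapping


def format_agent_steps(chunks: Iterable[Mapping[str, Any]]) -> List[str]:
    """Hypothetical stand-in for AgentStreamParser: turn streamed chunks into display lines."""
    lines = []
    for chunk in chunks:
        if "actions" in chunk:
            # The model asked for a tool call
            for action in chunk["actions"]:
                lines.append(f"[Call tool] {action['tool']}: {action['tool_input']}")
        elif "steps" in chunk:
            # The tool ran; its return value is the observation
            for step in chunk["steps"]:
                lines.append(f"[Observation] {step['observation']}")
        elif "output" in chunk:
            # The agent's final answer
            lines.append(f"[Final answer] {chunk['output']}")
    return lines


# Simplified chunks shaped like the first query's run (dicts instead of AgentAction/AgentStep)
fake_stream = [
    {"actions": [{"tool": "python_repl_ast", "tool_input": {"query": "len(df)"}}]},
    {"steps": [{"observation": 891}]},
    {"output": "The DataFrame df has 891 rows."},
]
for line in format_agent_steps(fake_stream):
    print(line)
```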

ask("How many rows are there?")

[Call tool]
Tool: python_repl_ast
query: len(df)
Log: 
Invoking: `python_repl_ast` with `{'query': 'len(df)'}`



[Observation]
Observation: 891
[Final answer]
The DataFrame `df` has a total of 891 rows.
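The trace shows the mechanism: the model does not compute 891 itself; it writes `len(df)` and passes it to the `python_repl_ast` tool, which runs the code with `df` in scope and returns the result as the observation. The sketch below approximates that evaluation rule with the standard library: every statement except the last is executed, and if the last one is an expression its value becomes the observation; otherwise whatever was printed is returned. This is a simplified illustration under that assumption, not LangChain's implementation, and `run_like_repl_ast` is a hypothetical name.

```python
import ast
import io
from contextlib import redirect_stdout
from typing import Any, Dict


def run_like_repl_ast(code: str, namespace: Dict[str, Any]) -> Any:
    """Simplified approximation of a python_repl_ast-style evaluator (hypothetical)."""
    tree = ast.parse(code)
    if not tree.body:
        return ""
    *setup, last = tree.body
    # Run every statement except the last one for its side effects
    exec(compile(ast.Module(body=setup, type_ignores=[]), "<repl>", "exec"), namespace)
    buffer = io.StringIO()
    with redirect_stdout(buffer):
        if isinstance(last, ast.Expr):
            # A trailing expression is evaluated and its value is returned
            value = eval(compile(ast.Expression(body=last.value), "<repl>", "eval"), namespace)
        else:
            # A trailing statement (assignment, loop, ...) is just executed
            exec(compile(ast.Module(body=[last], type_ignores=[]), "<repl>", "exec"), namespace)
            value = None
    # No value (e.g. a call that returns None): fall back to whatever was printed
    return buffer.getvalue() if value is None else value


# A plain list stands in for the DataFrame so the example runs without pandas
ns = {"data": [3, 5, 8]}
print(run_like_repl_ast("total = sum(data)\ntotal / len(data)", ns))
print(repr(run_like_repl_ast("print('hello')", ns)))
```

Under this rule, code whose last statement is `plt.show()` (which returns `None` and prints nothing) yields an empty observation, which is consistent with the empty `Observation:` lines in the plotting examples further down.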

ask("What is the difference between the survival rates of men and women?")

[Call tool]
Tool: python_repl_ast
query: male_survival_rate = df[df['Sex'] == 'male']['Survived'].mean()
female_survival_rate = df[df['Sex'] == 'female']['Survived'].mean()
survival_rate_difference = female_survival_rate - male_survival_rate
survival_rate_difference
Log: 
Invoking: `python_repl_ast` with `{'query': "male_survival_rate = df[df['Sex'] == 'male']['Survived'].mean()\nfemale_survival_rate = df[df['Sex'] == 'female']['Survived'].mean()\nsurvival_rate_difference = female_survival_rate - male_survival_rate\nsurvival_rate_difference"}`



[Observation]
Observation: 0.5531300709799203
[Final answer]
The difference in survival rates between men and women is about 0.55, meaning that women's survival rate is about 55.3 percentage points higher than men's.
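The observed value 0.553 is a difference between two proportions, so it is an absolute gap of about 55.3 percentage points, not a 55.3% relative increase. The arithmetic below makes the distinction explicit. It assumes `titanic.csv` is the standard 891-row Kaggle Titanic training set, whose survivor counts (233 of 314 women, 109 of 577 men) reproduce the 0.5531 observation above.

```python
# Survivor counts by sex in the standard 891-row Kaggle Titanic training set
# (assumed to be the contents of ./data/titanic.csv)
female_survived, female_total = 233, 314
male_survived, male_total = 109, 577

female_rate = female_survived / female_total  # about 0.742
male_rate = male_survived / male_total        # about 0.189

# Absolute difference between the two rates, expressed in percentage points
gap_points = (female_rate - male_rate) * 100
# Relative difference: how much larger the female rate is than the male rate
relative_percent = (female_rate / male_rate - 1) * 100

print(f"female rate {female_rate:.3f}, male rate {male_rate:.3f}")
print(f"absolute gap: {gap_points:.1f} percentage points")
print(f"relative gap: {relative_percent:.0f}% higher")
```

In relative terms the female rate is roughly four times the male rate, which is why "percentage points" is the accurate unit for the agent's 0.55.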

ask("Calculate the survival rates of male and female passengers and visualize them in a barplot chart.")

[Call tool]
Tool: python_repl_ast
query: import pandas as pd
import matplotlib.pyplot as plt

# Calculating the survival rates of male and female passengers
survival_rate = df.groupby('Sex')['Survived'].mean()

# barplot visualization
survival_rate.plot(kind='bar', color=['blue', 'pink'])
plt.title('Survival Rate by Gender')
plt.xlabel('Gender')
plt.ylabel('Survival Rate')
plt.xticks(rotation=0)
plt.show()
Log: 
Invoking: `python_repl_ast` with `{'query': "import pandas as pd\nimport matplotlib.pyplot as plt\n\n# Calculating the survival rates of male and female passengers\nsurvival_rate = df.groupby('Sex')['Survived'].mean()\n\n# barplot visualization\nsurvival_rate.plot(kind='bar', color=['blue', 'pink'])\nplt.title('Survival Rate by Gender')\nplt.xlabel('Gender')\nplt.ylabel('Survival Rate')\nplt.xticks(rotation=0)\nplt.show()"}`



[Observation]
Observation: 
[Final answer]
After calculating the survival rates for male and female passengers, you can see the results visualized in a bar chart. The chart compares the survival rates for males and females.
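`df.groupby('Sex')['Survived'].mean()` splits passengers by sex and averages the 0/1 `Survived` column, so each mean is a survival rate. The sketch below spells out that split-apply-combine step in plain Python on invented rows. It also mirrors pandas' default `sort=True`, which orders the group keys, so `female` comes before `male` in the result and therefore in the bar order of the chart. `group_mean` is a hypothetical helper.

```python
from collections import defaultdict


def group_mean(rows, key, value):
    """Hypothetical helper: mean of `value` per `key`, keys sorted like groupby(sort=True)."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        sums[row[key]] += row[value]
        counts[row[key]] += 1
    return {k: sums[k] / counts[k] for k in sorted(counts)}


# Invented passengers, not taken from titanic.csv
passengers = [
    {"Sex": "male", "Survived": 0},
    {"Sex": "female", "Survived": 1},
    {"Sex": "male", "Survived": 1},
    {"Sex": "female", "Survived": 1},
    {"Sex": "male", "Survived": 0},
    {"Sex": "female", "Survived": 0},
]
print(group_mean(passengers, "Sex", "Survived"))
```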

ask("Calculate and visualize the survival rate by gender of children under 10 years old who boarded the 1st and 2nd class")

[Call tool]
Tool: python_repl_ast
query: import pandas as pd
import matplotlib.pyplot as plt

# Filter children aged 10 or under who boarded 1st or 2nd class
children = df[(df['Pclass'].isin([1, 2])) & (df['Age'] <= 10)]

# Calculating survival rates by gender
survival_rate = children.groupby('Sex')['Survived'].mean() * 100

# Visualization
plt.figure(figsize=(8, 5))
plt.bar(survival_rate.index, survival_rate.values, color=['blue', 'pink'])
plt.title('Survival Rate of Children (Age <= 10) in 1st and 2nd Class by Gender')
plt.xlabel('Gender')
plt.ylabel('Survival Rate (%)')
plt.ylim(0, 100)
plt.grid(axis='y')
plt.show()
Log: 
Invoking: `python_repl_ast` with `{'query': "import pandas as pd\nimport matplotlib.pyplot as plt\n\n# Filter children aged 10 or under who boarded 1st or 2nd class\nchildren = df[(df['Pclass'].isin([1, 2])) & (df['Age'] <= 10)]\n\n# Calculating survival rates by gender\nsurvival_rate = children.groupby('Sex')['Survived'].mean() * 100\n\n# Visualization\nplt.figure(figsize=(8, 5))\nplt.bar(survival_rate.index, survival_rate.values, color=['blue', 'pink'])\nplt.title('Survival Rate of Children (Age <= 10) in 1st and 2nd Class by Gender')\nplt.xlabel('Gender')\nplt.ylabel('Survival Rate (%)')\nplt.ylim(0, 100)\nplt.grid(axis='y')\nplt.show()"}`

[Observation]
Observation: 
[Final answer]
By running the code above, the agent calculated and visualized the survival rates by gender for children aged 10 or under who boarded 1st or 2nd class. The bar graph shows the survival rate for each gender.
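Two details of the generated filter are easy to miss. The question asks about children *under* 10, but `df['Age'] <= 10` also includes 10-year-olds; `< 10` would match the wording exactly. And rows with a missing `Age` (stored as NaN) are silently excluded, because any comparison with NaN evaluates to False. The sketch below reproduces both effects in plain Python on invented rows; it illustrates the filter logic and is not the agent's code.

```python
import math
from collections import defaultdict

# Invented rows; float("nan") marks a missing age, as pandas does when reading the CSV
passengers = [
    {"Pclass": 1, "Sex": "female", "Age": 2.0, "Survived": 0},
    {"Pclass": 2, "Sex": "male", "Age": 8.0, "Survived": 1},
    {"Pclass": 2, "Sex": "female", "Age": 10.0, "Survived": 1},
    {"Pclass": 3, "Sex": "male", "Age": 4.0, "Survived": 0},
    {"Pclass": 1, "Sex": "male", "Age": float("nan"), "Survived": 1},
]

# Same conditions as df['Pclass'].isin([1, 2]) & (df['Age'] <= 10);
# any comparison with NaN is False, so rows with a missing age drop out
children = [p for p in passengers if p["Pclass"] in (1, 2) and p["Age"] <= 10]
missing_age = sum(1 for p in passengers if math.isnan(p["Age"]))

# Survival rate in percent per sex, keys sorted like groupby's default
survived = defaultdict(int)
total = defaultdict(int)
for p in children:
    survived[p["Sex"]] += p["Survived"]
    total[p["Sex"]] += 1
rates = {sex: 100 * survived[sex] / total[sex] for sex in sorted(total)}

print(len(children), missing_age, rates)
```

In the chart itself, `plt.bar` applies the color list one entry per bar in index order, so (assuming both groups are present) the blue bar is `female` and the pink bar is `male`.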
