148. Using pandas for Data Analysis
The pandas library is one of the most powerful and popular tools for data analysis and manipulation in Python. It provides data structures like DataFrame and Series for handling structured data, such as tables in a database or spreadsheet.
Here are 10 Python snippets demonstrating common data analysis tasks using pandas:
1. Creating a DataFrame
Creating a DataFrame from a dictionary of lists.
import pandas as pd

# Each key becomes a column name; each list holds that column's values.
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [24, 27, 22, 32],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Miami']
}
df = pd.DataFrame(data)
print(df)
Explanation:
A DataFrame is created from a dictionary, where the keys are the column names and the values are lists of data.
2. Reading Data from a CSV File
Reading a CSV file into a DataFrame.
Explanation:
pd.read_csv() loads data from a CSV file into a DataFrame.
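A minimal sketch of reading CSV data. To keep the example self-contained, it reads from an in-memory string (the CSV text here is made up for illustration); in practice you would usually pass a file path such as 'data.csv' to pd.read_csv() instead.

```python
import io

import pandas as pd

# Hypothetical CSV content used only for illustration.
csv_text = """Name,Age,City
Alice,24,New York
Bob,27,Los Angeles
Charlie,22,Chicago
David,32,Miami
"""

# read_csv accepts a file path or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))

# Show the first rows to check that the data loaded as expected.
print(df.head())
```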
3. DataFrame Selection and Indexing
Selecting a single column or multiple columns from a DataFrame.
Explanation:
Use df['column_name'] to select a single column and df[['col1', 'col2']] to select multiple columns.
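The two selection forms can be sketched as follows, reusing the sample data from the first snippet. Note that single brackets return a Series, while double brackets return a DataFrame.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [24, 27, 22, 32],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Miami'],
})

# Single brackets with one label return a Series.
ages = df['Age']

# A list of labels inside the brackets returns a DataFrame.
name_city = df[['Name', 'City']]

print(ages)
print(name_city)
```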
4. Filtering Data
Filtering data based on conditions.
Explanation:
You can filter a DataFrame by applying a condition on its columns, as in df[df['Age'] > 23].
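A short sketch of boolean filtering on the same sample data. The second filter, which combines two conditions, is an added illustration; each condition needs its own parentheses when combined with & or |.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [24, 27, 22, 32],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Miami'],
})

# Keep only the rows where Age is greater than 23.
older = df[df['Age'] > 23]

# Combine conditions with & (and) or | (or), wrapping each in parentheses.
older_not_miami = df[(df['Age'] > 23) & (df['City'] != 'Miami')]

print(older)
print(older_not_miami)
```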
5. Handling Missing Data
Handling missing or NaN values in a DataFrame.
Explanation:
df.fillna(value) replaces NaN values with the specified value.
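One possible way to handle missing values, assuming a small DataFrame with one missing age. Filling with the column mean and dropping incomplete rows with dropna() are shown as two common options, not as the only approach.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [24, np.nan, 22, 32],  # Bob's age is missing
})

# Count the missing values in each column.
missing_per_column = df.isna().sum()

# Option 1: replace missing ages with the mean of the known ages.
filled = df.fillna({'Age': df['Age'].mean()})

# Option 2: drop every row that contains a missing value.
dropped = df.dropna()

print(missing_per_column)
print(filled)
print(dropped)
```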
6. Grouping Data
Grouping data by one or more columns and performing aggregation.
Explanation:
df.groupby('City') groups the data by the 'City' column so that aggregation functions like mean() can be applied to each group.
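A minimal grouping sketch. The sample data is adjusted so that each city appears more than once, which makes the per-group mean meaningful.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [24, 27, 22, 32],
    'City': ['New York', 'Chicago', 'New York', 'Chicago'],
})

# Group rows by City, then compute the mean Age within each group.
mean_age = df.groupby('City')['Age'].mean()

print(mean_age)
```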
7. Sorting Data
Sorting a DataFrame by one or more columns.
Explanation:
df.sort_values('column_name') sorts the DataFrame by the specified column. Use ascending=False for descending order.
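Sorting in both directions can be sketched like this, again on the sample data from the first snippet. sort_values() returns a new DataFrame and leaves the original unchanged.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [24, 27, 22, 32],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Miami'],
})

# Ascending order is the default.
youngest_first = df.sort_values('Age')

# Pass ascending=False to reverse the order.
oldest_first = df.sort_values('Age', ascending=False)

print(youngest_first)
print(oldest_first)
```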
8. Applying Functions to Columns
Applying a custom function to each element of a column.
Explanation:
df['Age'].apply(func) applies a custom function to each element in the 'Age' column.
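As an illustration, the sketch below applies a hypothetical helper, age_group, to label each age. The function name, the threshold of 25, and the new 'AgeGroup' column are assumptions made for this example.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [24, 27, 22, 32],
})


def age_group(age):
    """Hypothetical helper: label an age as under 25 or not."""
    return 'under 25' if age < 25 else '25 and over'


# apply() calls age_group once per element and collects the results.
df['AgeGroup'] = df['Age'].apply(age_group)

print(df)
```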
9. Merging DataFrames
Merging two DataFrames on a common column.
Explanation:
pd.merge(df1, df2, on='column_name') merges two DataFrames based on a common column. The how parameter defines the type of join: inner, outer, left, or right.
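A small sketch of how the join type changes the result. The two DataFrames are made up so that their 'Name' columns only partly overlap: Charlie appears only in df1 and David only in df2.

```python
import pandas as pd

df1 = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [24, 27, 22],
})
df2 = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'David'],
    'City': ['New York', 'Los Angeles', 'Miami'],
})

# Inner join: keep only names present in both DataFrames.
inner = pd.merge(df1, df2, on='Name', how='inner')

# Left join: keep every row of df1; unmatched City values become NaN.
left = pd.merge(df1, df2, on='Name', how='left')

# Outer join: keep every name from either DataFrame.
outer = pd.merge(df1, df2, on='Name', how='outer')

print(inner)
print(left)
print(outer)
```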
10. Pivot Table
Creating a pivot table to summarize data.
Explanation:
pd.pivot_table(df, values='column_name', index='group_column') creates a pivot table that summarizes the data using an aggregation function such as mean, sum, or count.
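A rough pivot table sketch. The 'Team' column and its values are hypothetical, added only so there is a second dimension to spread across the columns; the aggregation is set explicitly through the aggfunc parameter.

```python
import pandas as pd

df = pd.DataFrame({
    'City': ['New York', 'New York', 'Chicago', 'Chicago'],
    'Team': ['A', 'B', 'A', 'B'],  # hypothetical column for illustration
    'Age': [24, 22, 27, 32],
})

# One row per City, one column per Team, each cell holding the mean Age.
pivot = pd.pivot_table(
    df,
    values='Age',
    index='City',
    columns='Team',
    aggfunc='mean',
)

print(pivot)
```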
Conclusion:
pandas provides a comprehensive set of tools to handle and analyze structured data. Whether you're performing basic data manipulation, cleaning, aggregation, or advanced data analysis, pandas simplifies the task, allowing you to focus on the logic of your analysis rather than the implementation details.