Pandas is a Python library used to work with and analyze data. It offers tools like the Series and DataFrame to manage tabular and time-series data efficiently.

Python Pandas: Key Exercises

ON THIS PAGE

  1. Install and Creating Dataframe
  2. Data Acquisition
  3. Exploring Data with Pandas
  4. Data Cleaning and Preprocessing
  5. Data Manipulation with Pandas
  6. Data Visualization with Pandas
  7. Time Series Analysis with Pandas
  8. Advanced Pandas Techniques
  9. Pandas and Machine Learning Integration

Install and Creating Dataframe

To install Pandas, run pip install pandas from the command line. Here’s a basic example that builds a DataFrame from a dictionary whose keys become the column names:

import pandas as pd

data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 22],
    'City': ['New York', 'San Francisco', 'Los Angeles']
}

df = pd.DataFrame(data)
print(df)
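
The introduction also mentions the Series, which is the one-dimensional counterpart of the DataFrame. As a minimal sketch (the variable names here are illustrative), a Series can be built directly, and selecting a single DataFrame column returns one as well:

```python
import pandas as pd

# A Series is a one-dimensional labeled array; the labels form its index.
ages = pd.Series([25, 30, 22], index=['Alice', 'Bob', 'Charlie'], name='Age')

# Label-based lookup on the Series
bob_age = ages['Bob']

# Selecting one column of a DataFrame also yields a Series
df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 22]})
age_column = df['Age']

print(bob_age)
print(type(age_column).__name__)
```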

Data Acquisition

Pandas can import data from various sources, including CSV, Excel, SQL databases, JSON, and more. For example, to read a CSV file into a Pandas DataFrame:

import pandas as pd

# Read CSV file into DataFrame
df = pd.read_csv('data.csv')

# Display the DataFrame
print(df.head())
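
The same pattern applies to other formats. The sketch below reads CSV and JSON from in-memory strings so it runs without any files on disk; the sample text and variable names are made up for illustration, and in practice you would usually pass a file path instead.

```python
import io
import pandas as pd

# Illustrative CSV content held in a string instead of a file
csv_text = "Name,Age,City\nAlice,25,New York\nBob,30,San Francisco\n"
df_csv = pd.read_csv(io.StringIO(csv_text))

# Illustrative JSON content: a list of records, one per row
json_text = '[{"Name": "Alice", "Age": 25}, {"Name": "Bob", "Age": 30}]'
df_json = pd.read_json(io.StringIO(json_text))

print(df_csv)
print(df_json)
```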

Exploring Data with Pandas

Pandas provides various methods to explore and summarize data, including describe(), head(), tail(), and more. For example, to get summary statistics of a DataFrame:

import pandas as pd

# Assuming df is a DataFrame
summary_stats = df.describe()
print(summary_stats)
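
Alongside describe(), a few other calls give a quick first look at a dataset. Here is a small sketch using a made-up DataFrame (the rows are illustrative, not real data):

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'Dana', 'Eli'],
    'Age': [25, 30, 22, 31, 28],
})

first_two = df.head(2)    # first two rows
last_two = df.tail(2)     # last two rows
n_rows, n_cols = df.shape # (rows, columns)

print(first_two)
print(last_two)
print(n_rows, n_cols)
print(df.dtypes)                   # data type of each column
print(df.describe(include='all'))  # also summarizes non-numeric columns
```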

Data Cleaning and Preprocessing

Pandas allows you to handle missing data with functions like fillna() and dropna(), and to remove repeated records with drop_duplicates(). For example, to remove duplicate rows from a DataFrame:

import pandas as pd
# Assuming df is a DataFrame
df = df.drop_duplicates()
print(df)
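
For the missing-data functions mentioned above, here is a minimal sketch. The DataFrame and the fill choices (column mean for Age, the placeholder 'Unknown' for City) are assumptions for illustration; the right strategy depends on your data.

```python
import numpy as np
import pandas as pd

# Illustrative data with two missing values
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'Dana'],
    'Age': [25, np.nan, 22, 31],
    'City': ['New York', 'San Francisco', None, 'Chicago'],
})

# Count missing values per column
missing_per_column = df.isna().sum()

# Option 1: drop every row that contains a missing value
dropped = df.dropna()

# Option 2: fill missing values with a per-column replacement
filled = df.fillna({'Age': df['Age'].mean(), 'City': 'Unknown'})

print(missing_per_column)
print(dropped)
print(filled)
```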

Data Manipulation with Pandas

Pandas enables you to select, filter, and manipulate data using various methods like loc[], iloc[], and more. For example, to filter rows based on a condition:

import pandas as pd

# Assuming df is a DataFrame
filtered_df = df[df['Age'] > 25]
print(filtered_df)
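
The loc[] and iloc[] accessors differ in what they index by: loc[] uses labels (and boolean masks), while iloc[] uses integer positions. A short sketch, assuming the Name column is used as the index:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 22],
    'City': ['New York', 'San Francisco', 'Los Angeles'],
}).set_index('Name')

# loc[]: select by row label and column label
bob_city = df.loc['Bob', 'City']

# iloc[]: select by integer position (first row; first two rows of column 0)
first_row = df.iloc[0]
first_two_ages = df.iloc[:2, 0]

# loc[] also accepts a boolean mask together with a column selection
older_cities = df.loc[df['Age'] > 25, 'City']

# loc[] can assign to a single cell as well
df.loc['Charlie', 'Age'] = 23

print(bob_city)
print(first_row)
print(older_cities)
```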

Data Visualization with Pandas

Pandas integrates with Matplotlib to create basic visualizations like line plots, bar charts, and more.
For example, to create a simple line plot:

import pandas as pd
import matplotlib.pyplot as plt

# Assuming df is a DataFrame with 'Date' and 'Temperature' columns
df.plot(x='Date', y='Temperature', kind='line')
plt.show()
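
A bar chart follows the same pattern with kind='bar'. This sketch uses a small made-up DataFrame and selects the non-interactive "Agg" backend so it runs without a display; that backend choice is an assumption for scripted use, and in a notebook or desktop session you would normally skip it and call plt.show().

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit in an interactive session
import matplotlib.pyplot as plt
import pandas as pd

# Illustrative data
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 22],
})

# DataFrame.plot returns a Matplotlib Axes that can be customized further
ax = df.plot(x='Name', y='Age', kind='bar', legend=False)
ax.set_ylabel('Age')
ax.set_title('Age by person')
plt.tight_layout()
```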

Time Series Analysis with Pandas

Pandas provides powerful tools to work with time series data, including resampling, time-based indexing, and more. For example, to resample time series data to monthly frequency:

import pandas as pd

# Assuming df is a DataFrame with a DatetimeIndex
# 'ME' (month end) is the alias in pandas 2.2+; older versions use 'M'
monthly_df = df.resample('ME').mean(numeric_only=True)
print(monthly_df)
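
Time-based indexing can be sketched in the same way. The example below builds 60 days of made-up temperature readings, then selects rows by date strings and resamples to weekly means; the column name and values are illustrative.

```python
import pandas as pd

# 60 consecutive days starting 2024-01-01, with illustrative values 0..59
dates = pd.date_range('2024-01-01', periods=60, freq='D')
df = pd.DataFrame({'Temperature': range(60)}, index=dates)

# Partial-string indexing: all rows in January 2024
january = df.loc['2024-01']

# Date slicing with loc[] includes both endpoints
week = df.loc['2024-01-10':'2024-01-16']

# Resample to weekly frequency (weeks ending on Sunday) and average
weekly_mean = df.resample('W').mean()

print(len(january))
print(len(week))
print(weekly_mean.head())
```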

Advanced Pandas Techniques

Pandas offers advanced techniques like multi-indexing, method chaining, and memory optimization for large datasets. For example, to use method chaining to filter and group data:

import pandas as pd
# Assuming df is a DataFrame with 'Age', 'City', and 'Salary' columns
result = df[df['Age'] > 25].groupby('City')['Salary'].mean()
print(result)
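
Memory optimization often comes down to choosing smaller dtypes. As a rough sketch on made-up data, converting a repetitive string column to the category dtype and downcasting integers can reduce memory use; how much you save depends on the data, so it is worth measuring before and after.

```python
import pandas as pd

# Illustrative data: 3,000 rows with only three distinct cities
n = 1000
df = pd.DataFrame({
    'City': ['New York', 'San Francisco', 'Los Angeles'] * n,
    'Age': [25, 30, 22] * n,
})

before = df.memory_usage(deep=True).sum()

optimized = df.assign(
    # Store each distinct string once and keep small integer codes per row
    City=df['City'].astype('category'),
    # Pick the smallest integer dtype that can hold the values
    Age=pd.to_numeric(df['Age'], downcast='integer'),
)

after = optimized.memory_usage(deep=True).sum()

print(before, after)
print(optimized.dtypes)
```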

Pandas and Machine Learning Integration

Pandas can preprocess data for machine learning tasks and handle feature engineering, while model fitting and evaluation are done by a library such as scikit-learn. For example, to prepare data for a regression model:

import pandas as pd
from sklearn.linear_model import LinearRegression
# Assuming df is a DataFrame with numeric 'Age', 'Experience', and 'Salary' columns
X = df[['Age', 'Experience']]
y = df['Salary']
model = LinearRegression()
model.fit(X, y)
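
Feature engineering usually happens before a model sees the data. The sketch below uses only Pandas: it one-hot encodes a categorical column and standardizes the numeric ones. The data is made up, and this particular choice of preprocessing steps is an assumption for illustration, not a recipe that fits every model.

```python
import pandas as pd

# Illustrative data; the salary figures are invented
df = pd.DataFrame({
    'Age': [25, 30, 22, 35],
    'Experience': [2, 7, 1, 10],
    'City': ['New York', 'San Francisco', 'Los Angeles', 'New York'],
    'Salary': [50000, 80000, 45000, 95000],
})

# Separate the target from the features
y = df['Salary']

# One-hot encode the categorical column into 0/1 indicator columns
X = pd.get_dummies(df.drop(columns='Salary'), columns=['City'], dtype=int)

# Standardize numeric features to zero mean and unit standard deviation
numeric = ['Age', 'Experience']
X[numeric] = (X[numeric] - X[numeric].mean()) / X[numeric].std()

print(X)
```

The resulting X and y can then be passed to an estimator's fit() method, as in the regression example above.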
