Pandas is a Python library for working with and analyzing data. Its two core data structures, the one-dimensional Series and the two-dimensional DataFrame, make it efficient to manage tabular and time-series data.

Installing Pandas and Creating a DataFrame
To install Pandas, run the following command: pip install pandas. Here’s a basic example that builds a DataFrame from a dictionary of columns:
import pandas as pd
data = {
'Name': ['Alice', 'Bob', 'Charlie'],
'Age': [25, 30, 22],
'City': ['New York', 'San Francisco', 'Los Angeles']
}
df = pd.DataFrame(data)
print(df)
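The introduction also mentions the Series, which the example above does not show. The following is a minimal sketch of how a Series relates to a DataFrame; the variable names and index labels are illustrative and not part of the original example.

```python
import pandas as pd

# A Series is a one-dimensional labeled array.
# The index labels below are illustrative.
ages = pd.Series([25, 30, 22], index=['Alice', 'Bob', 'Charlie'], name='Age')

# Selecting a single column from a DataFrame returns a Series.
df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 22]})
age_column = df['Age']

print(ages['Bob'])                 # look up a value by its label
print(type(age_column).__name__)   # the column is a Series
```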
Data Acquisition
Pandas can import data from various sources, including CSV, Excel, SQL databases, JSON, and more. For example, to read a CSV file into a Pandas DataFrame:
import pandas as pd
# Read CSV file into DataFrame
df = pd.read_csv('data.csv')
# Display the DataFrame
print(df.head())
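Other formats follow the same pattern as read_csv(). The sketch below reads CSV and JSON from in-memory strings so that it runs without any files on disk; the sample text is made up for illustration, and in practice you would usually pass a file path or URL instead.

```python
import io
import pandas as pd

# Illustrative in-memory data standing in for real files.
csv_text = "Name,Age,City\nAlice,25,New York\nBob,30,San Francisco\n"
json_text = '[{"Name": "Alice", "Age": 25}, {"Name": "Bob", "Age": 30}]'

# read_csv and read_json accept file-like objects as well as paths.
csv_df = pd.read_csv(io.StringIO(csv_text))
json_df = pd.read_json(io.StringIO(json_text), orient='records')

print(csv_df)
print(json_df)
```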
Exploring Data with Pandas
Pandas provides various methods to explore and summarize data, including describe(), head(), tail(), and more. For example, to get summary statistics of a DataFrame:
import pandas as pd
# Assuming df is a DataFrame
summary_stats = df.describe()
print(summary_stats)
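The other exploration methods mentioned above work in a similar way. Here is a small sketch using made-up data; the column names and values are assumptions chosen for illustration.

```python
import pandas as pd

# Illustrative data, not from a real dataset.
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'Dana'],
    'Age': [25, 30, 22, 35],
})

first_two = df.head(2)   # first two rows
last_one = df.tail(1)    # last row

print(df.shape)               # (rows, columns)
print(df.dtypes)              # data type of each column
print(df['Age'].describe())   # summary statistics for one column
```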
Data Cleaning and Preprocessing
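Pandas handles missing data with methods such as fillna(), which replaces missing values, and dropna(), which removes rows or columns that contain them.
Duplicates are handled separately. To remove duplicate rows from a DataFrame: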
import pandas as pd
# Assuming df is a DataFrame
df = df.drop_duplicates()
print(df)
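The missing-data methods mentioned at the start of this section can be sketched as follows. The data is made up, and filling with the column mean or the placeholder 'Unknown' is only one possible strategy, chosen here for illustration.

```python
import pandas as pd

# Illustrative data with missing values in the second row.
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, None, 22],
    'City': ['New York', None, 'Los Angeles'],
})

# Count missing values per column.
missing_counts = df.isna().sum()

# Option 1: drop every row that contains a missing value.
dropped = df.dropna()

# Option 2: fill missing values, using a different fill value per column.
filled = df.fillna({'Age': df['Age'].mean(), 'City': 'Unknown'})

print(missing_counts)
print(dropped)
print(filled)
```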
Data Manipulation with Pandas
Pandas lets you select, filter, and manipulate data with indexers such as loc[] (label-based) and iloc[] (position-based), as well as boolean masks. For example, to filter rows based on a condition:
import pandas as pd
# Assuming df is a DataFrame
filtered_df = df[df['Age'] > 25]
print(filtered_df)
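The loc[] and iloc[] indexers mentioned above are not used in that example. A minimal sketch, reusing the data from the first section:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 22],
    'City': ['New York', 'San Francisco', 'Los Angeles'],
})

# iloc selects by integer position: here, the first row.
first_row = df.iloc[0]

# loc selects by label or boolean mask, and can pick columns at the same time.
older = df.loc[df['Age'] > 22, ['Name', 'City']]

print(first_row)
print(older)
```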
Data Visualization with Pandas
Pandas integrates with Matplotlib to create basic visualizations like line plots, bar charts, and more.
For example, to create a simple line plot:
import pandas as pd
import matplotlib.pyplot as plt
# Assuming df is a DataFrame with 'Date' and 'Temperature' columns
df.plot(x='Date', y='Temperature', kind='line')
plt.show()
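A bar chart works the same way by changing the kind argument. The sketch below uses made-up data and selects Matplotlib's non-interactive 'Agg' backend so that it also runs on a machine without a display; that backend choice is an assumption for illustration, and in an interactive session you would normally skip it and call plt.show().

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend; omit this in an interactive session

import pandas as pd

# Illustrative data.
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 22],
})

# DataFrame.plot returns a Matplotlib Axes object that can be customized further.
ax = df.plot(x='Name', y='Age', kind='bar', legend=False)
ax.set_ylabel('Age')
```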
Time Series Analysis with Pandas
Pandas provides powerful tools to work with time series data, including resampling, time-based indexing, and more. For example, to resample time series data to monthly frequency:
import pandas as pd
# Assuming df is a DataFrame with a DatetimeIndex and numeric columns
# 'M' is the month-end alias; pandas 2.2 and later prefer 'ME'
monthly_df = df.resample('M').mean()
print(monthly_df)
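Time-based indexing, also mentioned above, can be sketched with a small synthetic series. The dates and temperature values here are generated for illustration only.

```python
import pandas as pd

# Sixty days of synthetic daily data starting on 2024-01-01.
dates = pd.date_range('2024-01-01', periods=60, freq='D')
df = pd.DataFrame({'Temperature': range(60)}, index=dates)

# Partial string indexing: select every row in January 2024.
january = df.loc['2024-01']

# Resample to weekly means (weeks end on Sunday by default).
weekly = df['Temperature'].resample('W').mean()

# Seven-day rolling mean; the first six values are NaN.
rolling = df['Temperature'].rolling(window=7).mean()

print(january.tail())
print(weekly.head())
```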
Advanced Pandas Techniques
Pandas offers advanced techniques like multi-indexing, method chaining, and memory optimization for large datasets. For example, to use method chaining to filter and group data:
import pandas as pd
# Assuming df is a DataFrame with 'Age', 'City', and 'Salary' columns
result = df[df['Age'] > 25].groupby('City')['Salary'].mean()
print(result)
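Multi-indexing and memory optimization are mentioned above but not shown. The following is a rough sketch with made-up data; the 'Department' column and the salary figures are assumptions. Converting a string column to the category dtype often reduces memory when the column has few distinct values, although the saving depends on the data.

```python
import pandas as pd

# Illustrative data.
df = pd.DataFrame({
    'City': ['New York', 'New York', 'Los Angeles', 'Los Angeles'],
    'Department': ['Sales', 'Engineering', 'Sales', 'Engineering'],
    'Salary': [50000, 80000, 55000, 90000],
})

# Grouping by two keys produces a result with a two-level MultiIndex.
by_city_dept = df.groupby(['City', 'Department'])['Salary'].mean()

# Select one entry from the MultiIndex with a tuple of labels.
ny_sales = by_city_dept.loc[('New York', 'Sales')]

# Store a repetitive string column as a categorical.
df['City'] = df['City'].astype('category')

print(by_city_dept)
print(df['City'].dtype)
```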
Pandas and Machine Learning Integration
Pandas is commonly used to preprocess data and engineer features for machine learning, while model fitting and evaluation are handled by libraries such as scikit-learn. For example, to prepare data and fit a regression model:
import pandas as pd
from sklearn.linear_model import LinearRegression
# Assuming df is a DataFrame with numeric 'Age', 'Experience', and 'Salary' columns
X = df[['Age', 'Experience']]
y = df['Salary']
model = LinearRegression()
model.fit(X, y)
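Feature engineering often means turning text columns into numbers before fitting a model. One common approach, sketched here with made-up data, is one-hot encoding with pd.get_dummies(); the salary values are illustrative assumptions.

```python
import pandas as pd

# Illustrative data.
df = pd.DataFrame({
    'Age': [25, 30, 22],
    'City': ['New York', 'San Francisco', 'Los Angeles'],
    'Salary': [50000, 80000, 45000],
})

# One-hot encode the 'City' column: one indicator column per distinct city.
features = pd.get_dummies(df[['Age', 'City']], columns=['City'])
target = df['Salary']

print(features)
```

The resulting features and target could then be passed to a model's fit() method in the same way as X and y above.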