10 Pandas SQL Queries to Use Frequently for Transformation

Pandas makes it easy to search and filter data with DataFrame and Series. It has tools for changing data. Here are the Pandas SQL queries for Dataframes.

Table of contents

Reading CSV file and display all the columns

#Pandas: Use the DataFrame directly.
import pandas as pd
df = pd.read_csv('your_data.csv') # Load your data into a DataFrame
display(df)

#Equivalent SQL query:
SELECT column1, column2 FROM table_name

Get specific columns

#Pandas: Specify the column names as a list when indexing the DataFrame.
selected_columns = df[['column1', 'column2']]

#Equivalent SQL query:
SELECT * FROM table_name WHERE condition

Filter rows on a condition

#Pandas: Use boolean indexing to filter rows based on a condition.
filtered_df = df[df['column_name'] > 10] # Example condition

#Equivalent SQL query:
SELECT * FROM table_name ORDER BY column_name ASC/DESC

Sort the result set

#Pandas: Use the sort_values method.
sorted_df = df.sort_values(by='column_name', ascending=False)

#Equivalent SQL query:
SELECT DISTINCT column_name FROM table_name

Get unique values

#Pandas: Use the unique method on a Series.
unique_values = df['column_name'].unique()

#Equivalent SQL query:
GROUP BY column_name

Group by rows by a specific column

#Pandas: Use the groupby method.
grouped = df.groupby('column_name')

#Equivalent SQL query:
SELECT AVG(column_name) FROM table_name GROUP BY column_name

Calculate average for each group

#Pandas: Use the groupby and agg functions.
avg_by_group = df.groupby('column_name')['other_column'].agg('mean')

#Equivalent SQL query:
SELECT uid, result, AVG(value) AS v_1 
FROM data
GROUP BY uid, result;

Count number of rows in each group

# Pandas: Use the groupby and size functions.
count_by_group = df.groupby('column_name').size()

#Equivalent SQL query:
SELECT COUNT(*) FROM table_name GROUP BY column_name

Pandas to_sql

Here’s an example of using pandas to write a DataFrame to a SQL database table.

import pandas as pd
from sqlalchemy import create_engine

# Create a sample DataFrame
data = {'Name': ['John', 'Alice', 'Bob'],
        'Age': [28, 32, 45],
        'City': ['New York', 'Paris', 'London']}
df = pd.DataFrame(data)

# Create a SQLite database engine
engine = create_engine('sqlite:///sample.db')

# Write the DataFrame to a SQL table
df.to_sql('persons', con=engine, if_exists='replace', index=False)

# Query the newly created table
result = pd.read_sql('SELECT * FROM persons', con=engine)
print(result)

Output

     Name  Age      City
0   John   28  New York
1  Alice   32     Paris
2    Bob   45    London

In this example, we first create a DataFrame df with sample data. Then, we create a SQLite database engine using the create_engine function and specify the database connection URL. Next, we use the to_sql method of the DataFrame to write the data to a table named ‘persons’ in the database specified by the engine. The if_exists parameter is set to ‘replace’ to replace the table if it already exists. Finally, we use pd.read_sql to query the ‘persons’ table and print the result.

Conclusion

Pandas is a powerful and flexible tool in Python for data manipulation and analysis. It provides numerous functionalities beyond the covered SQL operations.

Srinimf

Tech. Jobs. Biz. Success

10 Pandas SQL Queries to Use Frequently for Transformation

Reading CSV file and display all the columns

Get specific columns

Filter rows on a condition

Sort the result set

Get unique values

Group by rows by a specific column

Calculate average for each group

Count number of rows in each group

Pandas to_sql

Conclusion

Author: Srini

Reading CSV file and display all the columns

Get specific columns

Filter rows on a condition

Sort the result set

Get unique values

Group by rows by a specific column

Calculate average for each group

Count number of rows in each group

Pandas to_sql

Conclusion

Share this:

Related

Author: Srini