Pandas makes it easy to search and filter data with DataFrame and Series. It has tools for changing data. Here are the Pandas SQL queries for Dataframes.
Table of contents
- Reading CSV file and display all the columns
- Get specific columns
- Filter rows on a condition
- Sort the result set
- Get unique values
- Group by rows by a specific column
- Calculate average for each group
- Count number of rows in each group
- Pandas to_sql
- Conclusion
Reading CSV file and display all the columns
#Pandas: Use the DataFrame directly.
import pandas as pd
df = pd.read_csv('your_data.csv') # Load your data into a DataFrame
display(df)
#Equivalent SQL query:
SELECT column1, column2 FROM table_name
Get specific columns
#Pandas: Specify the column names as a list when indexing the DataFrame.
selected_columns = df[['column1', 'column2']]
#Equivalent SQL query:
SELECT * FROM table_name WHERE condition
Filter rows on a condition
#Pandas: Use boolean indexing to filter rows based on a condition.
filtered_df = df[df['column_name'] > 10] # Example condition
#Equivalent SQL query:
SELECT * FROM table_name ORDER BY column_name ASC/DESC
Sort the result set
#Pandas: Use the sort_values method.
sorted_df = df.sort_values(by='column_name', ascending=False)
#Equivalent SQL query:
SELECT DISTINCT column_name FROM table_name
Get unique values
#Pandas: Use the unique method on a Series.
unique_values = df['column_name'].unique()
#Equivalent SQL query:
GROUP BY column_name
Group by rows by a specific column
#Pandas: Use the groupby method.
grouped = df.groupby('column_name')
#Equivalent SQL query:
SELECT AVG(column_name) FROM table_name GROUP BY column_name
Calculate average for each group
#Pandas: Use the groupby and agg functions.
avg_by_group = df.groupby('column_name')['other_column'].agg('mean')
#Equivalent SQL query:
SELECT uid, result, AVG(value) AS v_1
FROM data
GROUP BY uid, result;
Count number of rows in each group
# Pandas: Use the groupby and size functions.
count_by_group = df.groupby('column_name').size()
#Equivalent SQL query:
SELECT COUNT(*) FROM table_name GROUP BY column_name
Pandas to_sql
Here’s an example of using pandas
to write a DataFrame to a SQL database table.
import pandas as pd
from sqlalchemy import create_engine
# Create a sample DataFrame
data = {'Name': ['John', 'Alice', 'Bob'],
'Age': [28, 32, 45],
'City': ['New York', 'Paris', 'London']}
df = pd.DataFrame(data)
# Create a SQLite database engine
engine = create_engine('sqlite:///sample.db')
# Write the DataFrame to a SQL table
df.to_sql('persons', con=engine, if_exists='replace', index=False)
# Query the newly created table
result = pd.read_sql('SELECT * FROM persons', con=engine)
print(result)
Output
Name Age City
0 John 28 New York
1 Alice 32 Paris
2 Bob 45 London
In this example, we first create a DataFrame df
with sample data. Then, we create a SQLite database engine using the create_engine
function and specify the database connection URL. Next, we use the to_sql
method of the DataFrame to write the data to a table named ‘persons’ in the database specified by the engine. The if_exists
parameter is set to ‘replace’ to replace the table if it already exists. Finally, we use pd.read_sql
to query the ‘persons’ table and print the result.
Conclusion
Pandas is a powerful and flexible tool in Python for data manipulation and analysis. It provides numerous functionalities beyond the covered SQL operations.
You must be logged in to post a comment.