Here is an SQL query that extracts employees whose salary is greater than the average salary.
Table of contents
SQL Query
Here’s a way to write in SQL
SELECT
employee_id,
last_name,
first_name,
salary
FROM
employee
WHERE
salary > (SELECT AVG(salary) FROM employee);
Pandas Code
Here’s a way to write in Pandas
import pandas as pd
# Assuming you have a DataFrame named 'employee_df' containing employee data
# Calculate the average salary
average_salary = employee_df['salary'].mean()
# Extract employees with salary greater than the average salary
high_salary_employees = employee_df[employee_df['salary'] > average_salary]
# Display the result
print(high_salary_employees)
PySpark Code
Here’s a way to write in PySpark.
from pyspark.sql import SparkSession
from pyspark.sql.functions import avg
# Initialize SparkSession
spark = SparkSession.builder \
.appName("HighSalaryEmployees") \
.getOrCreate()
# Assuming you have loaded your data into a DataFrame named 'employee_df'
# If not, replace 'employee_df' with your actual DataFrame
# Calculate the average salary
average_salary = employee_df.select(avg("salary")).collect()[0][0]
# Extract employees with salary greater than the average salary
high_salary_employees = employee_df.filter(employee_df["salary"] > average_salary)
# Display the result
high_salary_employees.show()
# Stop SparkSession
spark.stop()
References
- Original post: SQL Query and Code in Pandas & PySpark






