In both PySpark and Pandas, you can use PIVOT and UNPIVOT to reshape your data for analysis. Here’s how you can implement them.

PIVOT in PySpark
In PySpark, pivoting is done with the pivot() function, which is available on a grouped DataFrame (i.e., after calling groupBy()).
Example (PIVOT in PySpark):
Here we pivot on the Subject column, spreading its values into separate columns.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
# Initialize Spark session
spark = SparkSession.builder.appName("example").getOrCreate()
# Sample DataFrame
data = [
    ("John", "Math", 90),
    ("John", "Science", 85),
    ("Doe", "Math", 70),
    ("Doe", "Science", 80)
]
columns = ["Name", "Subject", "Score"]
df = spark.createDataFrame(data, columns)
# Pivot the DataFrame
pivot_df = df.groupBy("Name").pivot("Subject").agg(F.first("Score"))
# Show the pivoted DataFrame
pivot_df.show()
Output:
+----+----+-------+
|Name|Math|Science|
+----+----+-------+
| Doe|  70|     80|
|John|  90|     85|
+----+----+-------+
UNPIVOT in PySpark
In PySpark there is no direct UNPIVOT function in older versions, but you can achieve the same result with selectExpr() and the SQL stack() function. (Spark 3.4+ also provides a built-in DataFrame.unpivot() method, aliased as melt().)
Example (UNPIVOT in PySpark using selectExpr):
Here we unpivot the subject columns, merging them back into a single Subject column.
unpivot_df = pivot_df.selectExpr("Name", "stack(2, 'Math', Math, 'Science', Science) as (Subject, Score)")
unpivot_df.show()
Output:
+----+-------+-----+
|Name|Subject|Score|
+----+-------+-----+
| Doe|   Math|   70|
| Doe|Science|   80|
|John|   Math|   90|
|John|Science|   85|
+----+-------+-----+
PIVOT in Pandas
In Pandas, pivot() and pivot_table() are used to reshape the DataFrame.
Example (PIVOT in Pandas):
import pandas as pd
# Sample DataFrame
data = {'Name': ['John', 'John', 'Doe', 'Doe'],
        'Subject': ['Math', 'Science', 'Math', 'Science'],
        'Score': [90, 85, 70, 80]}
df = pd.DataFrame(data)
# Pivot the DataFrame
pivot_df = df.pivot(index='Name', columns='Subject', values='Score')
print(pivot_df)
Output:
Subject  Math  Science
Name
Doe        70       80
John       90       85
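Note that pivot() requires each index/column pair to be unique. A minimal sketch with hypothetical data showing what happens when that assumption is violated (John has two Math scores):

```python
import pandas as pd

# Hypothetical data with a duplicate (Name, Subject) pair
data = {'Name': ['John', 'John', 'Doe'],
        'Subject': ['Math', 'Math', 'Science'],
        'Score': [90, 95, 80]}
df = pd.DataFrame(data)

# pivot() cannot decide which duplicate value to keep, so it raises a ValueError
try:
    df.pivot(index='Name', columns='Subject', values='Score')
except ValueError as e:
    print("pivot() failed:", e)
```

This is exactly the situation where pivot_table() (shown next) is the right tool, because it aggregates the duplicates instead of failing.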
Example (Using pivot_table()):
# Sample DataFrame with duplicate combinations
data = {'Name': ['Alice', 'Alice', 'Bob', 'Bob', 'Alice'],
        'Subject': ['Math', 'Math', 'Math', 'Science', 'Science'],
        'Score': [85, 90, 95, 80, 100]}
df = pd.DataFrame(data)
# Pivot table with aggregation (mean is the default aggfunc)
pivot_table_df = df.pivot_table(index='Name', columns='Subject', values='Score')
print(pivot_table_df)
Output:
Subject  Math  Science
Name
Alice    87.5    100.0
Bob      95.0     80.0
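The aggregation is not limited to the mean: the aggfunc parameter controls how duplicate entries are combined. A minimal sketch, reusing the same data, that keeps each student's best score instead of averaging:

```python
import pandas as pd

data = {'Name': ['Alice', 'Alice', 'Bob', 'Bob', 'Alice'],
        'Subject': ['Math', 'Math', 'Math', 'Science', 'Science'],
        'Score': [85, 90, 95, 80, 100]}
df = pd.DataFrame(data)

# aggfunc='max' keeps the highest score for each (Name, Subject) pair,
# so Alice's Math entry becomes 90 (the max of 85 and 90)
best = df.pivot_table(index='Name', columns='Subject', values='Score',
                      aggfunc='max')
print(best)
```

Other common choices are 'sum', 'count', and 'min', or any callable that reduces a group to a single value.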
UNPIVOT in Pandas (Melt)
In Pandas, melt() is used to unpivot data.
Example (UNPIVOT in Pandas):
# Unpivot the DataFrame; reset_index() converts the 'Name' index back into a
# regular column so it can be passed as id_vars
unpivot_df = pivot_df.reset_index().melt(id_vars='Name', var_name='Subject', value_name='Score')
print(unpivot_df)
Output:
   Name  Subject  Score
0   Doe     Math     70
1  John     Math     90
2   Doe  Science     80
3  John  Science     85
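You can also unpivot only a subset of columns with the value_vars parameter. A minimal sketch with a hypothetical extra Year column that should stay out of the result:

```python
import pandas as pd

# Hypothetical wide DataFrame with an extra column we don't want to unpivot
wide = pd.DataFrame({'Name': ['Doe', 'John'],
                     'Math': [70, 90],
                     'Science': [80, 85],
                     'Year': [2023, 2023]})

# value_vars limits which columns are melted; 'Year' is dropped entirely
long_df = wide.melt(id_vars='Name', value_vars=['Math', 'Science'],
                    var_name='Subject', value_name='Score')
print(long_df)
```

If value_vars is omitted, melt() unpivots every column that is not listed in id_vars, which here would have pulled Year into the Subject/Score pairs as well.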
Summary:
- PySpark:
  - Use pivot() for pivoting.
  - Use selectExpr() with stack() for unpivoting.
- Pandas:
  - Use pivot() or pivot_table() for pivoting.
  - Use melt() for unpivoting.






