PySpark lets us wrap our own Python functions as user-defined functions (UDFs) and apply them to DataFrame columns. Creating a UDF in PySpark takes three steps:
- Create a Python function
- Register the function as a UDF
- Apply the UDF to one or more columns

Creating a UDF in PySpark
The example below capitalizes the first letter of each word in the Name column.
```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import udf
from pyspark.sql.types import StringType

# Create a Spark session
spark = SparkSession.builder.appName("Capitalize Names UDF").getOrCreate()

# Sample data
data = [(1, "john jones"), (2, "tracey smith"), (3, "amy sanders")]
columns = ["Seqno", "Name"]

# Create a DataFrame
df = spark.createDataFrame(data, columns)

# Define a Python function to capitalize the first letter of each word
def capitalize_name(name):
    return " ".join(word.capitalize() for word in name.split())

# Register the function as a UDF that returns a string
capitalize_name_udf = udf(capitalize_name, StringType())

# Apply the UDF to the Name column
df_transformed = df.withColumn("Name", capitalize_name_udf("Name"))

# Show the result
df_transformed.show()
```
Output

Before converting:

```
+-----+------------+
|Seqno|        Name|
+-----+------------+
|    1|  john jones|
|    2|tracey smith|
|    3| amy sanders|
+-----+------------+
```

After converting:

```
+-----+------------+
|Seqno|        Name|
+-----+------------+
|    1|  John Jones|
|    2|Tracey Smith|
|    3| Amy Sanders|
+-----+------------+
```