Checking whether a column value is numeric is a common task in PySpark. Here’s one way to do it with a when condition.

PySpark When: Check Whether a Column Is Numeric [Tested]

PySpark Example: Check Numeric

from pyspark.sql import SparkSession
from pyspark.sql.functions import col, when, lit

# Initialize SparkSession
spark = SparkSession.builder.appName("Example").getOrCreate()

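# Example DataFrame (every ID is a string: PySpark cannot infer one type
# for a column that mixes int and str values)
data = [("Srini", "12345"), ("Usha", "8956"), ("Sailu", "234"), ("Madhu", "567")]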
columns = ["name", "valid_id"]
df = spark.createDataFrame(data, columns)

# Add a numeric_check column using chained when conditions
filtered_df = df.withColumn("numeric_check", 
                            when(col("valid_id").cast("int").isNull(), lit("Not numeric"))
                            .when(col("valid_id").cast("string").substr(1,1) == "0", lit("First char is zero"))
                            .otherwise(lit("Numeric")))

# Show the DataFrame with the new numeric_check column
filtered_df.show()


Explained

  • Cast the column to an integer with the cast method and test the result with isNull(). If the cast returns null, the value is not numeric; otherwise, it is numeric.
  • The second condition checks whether the first character is “0” and, if so, sets a message. To do this, the column is cast to a string and the substr function extracts the first character for comparison with “0”.
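
The order of the two checks matters, and it can be tried out without a Spark session. Below is a minimal plain-Python sketch of the same branch order. The helper name classify_id is hypothetical, and the digits-only rule is an assumption that is stricter than Spark's cast, so treat it as an illustration of the logic rather than a replacement for the PySpark code.

```python
def classify_id(value):
    """Hypothetical helper that follows the same branch order as the when chain."""
    # Treat a missing value like an empty string (assumption for this sketch).
    text = "" if value is None else str(value)

    # Branch 1: anything that is not made up only of ASCII digits.
    # Note: this digits-only rule is an assumption, not Spark's cast behavior.
    if not (text.isascii() and text.isdigit()):
        return "Not numeric"

    # Branch 2: digits only, but the first character is "0".
    if text[0] == "0":
        return "First char is zero"

    # Fallback, playing the role of otherwise().
    return "Numeric"


# Sample inputs that exercise all three branches.
samples = ["12345", "0123", "abc", "", None]
for sample in samples:
    print(repr(sample), "->", classify_id(sample))
```

With these samples, "12345" falls through to the last branch, "0123" is caught by the second check, and "abc", the empty string, and None are all caught by the first check.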

Output

df:pyspark.sql.dataframe.DataFrame = [name: string, valid_id: string]
filtered_df:pyspark.sql.dataframe.DataFrame = [name: string, valid_id: string ... 1 more field]
+-----+--------+-------------+
| name|valid_id|numeric_check|
+-----+--------+-------------+
|Srini|   12345|      Numeric|
| Usha|    8956|      Numeric|
|Sailu|     234|      Numeric|
|Madhu|     567|      Numeric|
+-----+--------+-------------+