Need to check whether an input value is numeric in PySpark? Here's one way to do it with the when condition in PySpark code.
![PySpark When Check Numeric of Column [Tested]](https://srinimf.com/wp-content/uploads/2024/07/cur.07.447.444_erg456.jpg?w=195)
PySpark Example Check Numeric
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, when, lit
# Initialize SparkSession
spark = SparkSession.builder.appName("Example").getOrCreate()
# Example DataFrame (every valid_id is a string so the column has a single type)
data = [("Srini", "12345"), ("Usha", "8956"), ("Sailu", "234"), ("Madhu", "567")]
columns = ["name", "valid_id"]
df = spark.createDataFrame(data, columns)
# Add a numeric_check column: flag values that fail the int cast, then values that start with "0"
filtered_df = df.withColumn("numeric_check",
    when(col("valid_id").cast("int").isNull(), lit("Not numeric"))
    .when(col("valid_id").cast("string").substr(1, 1) == "0", lit("First char is zero"))
    .otherwise(lit("Numeric")))
# Show the DataFrame with the new column
filtered_df.show()
Explained
- The first condition casts the column to an integer with the cast method and tests the result with isNull(). If the cast returns null, the value is not numeric; otherwise it is numeric.
- The second condition checks whether the first character is "0" and, if so, sets a message. To do this, I cast the value to a string and used the substr function to extract the first character and compare it to "0".
Output
df:pyspark.sql.dataframe.DataFrame = [name: string, valid_id: string]
filtered_df:pyspark.sql.dataframe.DataFrame = [name: string, valid_id: string ... 1 more field]
+-----+--------+-------------+
| name|valid_id|numeric_check|
+-----+--------+-------------+
|Srini|   12345|      Numeric|
| Usha|    8956|      Numeric|
|Sailu|     234|      Numeric|
|Madhu|     567|      Numeric|
+-----+--------+-------------+
![PySpark When Check Numeric of Column [Tested]](https://srinimf.com/wp-content/uploads/2024/07/image-1.png?w=1024)





