To count the NULL values in each column of a PySpark DataFrame, use the isNull() column method together with an aggregate. isNull() returns a boolean for every row; casting that boolean to an integer and summing it, inside select or agg, gives the number of NULL values per column.

Counting NULL Values in Each Column
Let’s assume you have a DataFrame called df with some NULL values, and you want to count the number of NULL values in each column:
from pyspark.sql import SparkSession
# Note: importing sum here shadows Python's built-in sum for the rest of the script
from pyspark.sql.functions import col, sum
# Initialize Spark session
spark = SparkSession.builder.appName("CountNullsInColumns").getOrCreate()
# Example DataFrame with some NULL values
data = [("A", 1, None), ("B", None, 2), (None, 3, 3), ("C", 4, None), ("D", None, None)]
columns = ["col1", "col2", "col3"]
df = spark.createDataFrame(data, columns)
# Count NULLs in each column
null_counts = df.select([sum(col(c).isNull().cast("int")).alias(c) for c in df.columns])
# Show the result
null_counts.show()
Explanation
col(c).isNull(): For each column c, checks whether the value is NULL. True represents NULL. False represents not NULL.
.cast("int"): Converts the boolean result to an integer: 1 for True and 0 for False.
sum(): Aggregates the 1s to count the number of NULL values in each column.
alias(c): Renames the resulting column to match the original column name.
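To make the cast-and-sum idea concrete, here is a minimal plain-Python sketch of the same logic, with no Spark involved. It only illustrates what the expression computes on the example rows; the variable names are chosen for this illustration and are not part of the PySpark API.

```python
# Same example rows as the DataFrame, as plain Python tuples
rows = [("A", 1, None), ("B", None, 2), (None, 3, 3), ("C", 4, None), ("D", None, None)]
columns = ["col1", "col2", "col3"]

# For each column: turn "is this value missing?" into 1 or 0, then add them up.
# int(True) == 1 and int(False) == 0, mirroring isNull().cast("int") followed by sum().
null_counts = {
    name: sum(int(row[i] is None) for row in rows)
    for i, name in enumerate(columns)
}

print(null_counts)  # {'col1': 1, 'col2': 2, 'col3': 3}
```

Spark evaluates the equivalent expression across the cluster, but the arithmetic per column is the same.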
Output
Running the above code will produce an output similar to this:
+----+----+----+
|col1|col2|col3|
+----+----+----+
|   1|   2|   3|
+----+----+----+
This output shows the count of NULL values for each column in the DataFrame.
Explanation in Detail
select([ ... ]): Uses a list comprehension to generate the expressions for counting NULL values in each column.
null_counts: The resulting DataFrame contains the counts of NULL values for each column.
Summary
This method counts the NULL values in every column in a single pass over the data, because all the per-column sums are computed in one select. It works for any DataFrame, including ones with a large number of columns, and helps you quickly determine where data is missing.






