If-Else Condition in PySpark Using the `when` Function
In SQL, we often use CASE WHEN statements to handle conditional logic. PySpark offers similar functionality through the `when` function, which can be chained to handle multiple conditions.
In this article, we will build a `full_name` column using conditional logic on the following sample data:
| ID | First Name | Age | Last Name | Gender |
|---|---|---|---|---|
| 101 | Ali | 29 | Khan | Male |
| 102 | Priya | 35 | Kumari | Female |
| 103 | Chandan | 23 | Kumar | Male |
```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, concat_ws, lit, when

spark = SparkSession.builder \
    .appName("Case When in PySpark with Example") \
    .master("local") \
    .getOrCreate()

# Sample data matching the table above
data = [
    (101, "Ali", 29, "Khan", "Male"),
    (102, "Priya", 35, "Kumari", "Female"),
    (103, "Chandan", 23, "Kumar", "Male")
]
columns = ["ID", "First Name", "Age", "Last Name", "Gender"]
test_df = spark.createDataFrame(data, columns)

# Chain when() calls for each condition; otherwise() supplies the default
transformed_df = test_df.withColumn(
    "full_name",
    when(
        col("Gender") == "Male",
        concat_ws(" ", lit("Mr."), col("First Name"), col("Last Name"))
    ).when(
        col("Gender") == "Female",
        concat_ws(" ", lit("Ms."), col("First Name"), col("Last Name"))
    ).otherwise(
        concat_ws(" ", lit("Unknown"), col("First Name"), col("Last Name"))
    )
)

transformed_df.show()
spark.stop()
```
As the output shows, a new `full_name` column has been added based on the conditional logic: rows with `Gender` equal to "Male" are prefixed with "Mr.", "Female" rows with "Ms.", and any other value falls through to the `otherwise` branch. Note that if you omit `.otherwise()`, unmatched rows get `null` instead of a default value.