If-Else Condition in PySpark Using the `when` Function
In SQL, we often use CASE WHEN statements to handle conditional logic. PySpark offers similar functionality through the `when` function, which can be chained to handle multiple conditions.
In this article, we will build a `full_name` column using conditional logic on the following sample data:
| ID | First Name | Age | Last Name | Gender |
|---|---|---|---|---|
| 101 | Ali | 29 | Khan | Male |
| 102 | Priya | 35 | Kumari | Female |
| 103 | Chandan | 23 | Kumar | Male |
```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, concat_ws, lit, when

spark = SparkSession.builder \
    .appName("Case When in PySpark with Example") \
    .master("local") \
    .getOrCreate()

# Sample data matching the table above
data = [
    (101, "Ali", 29, "Khan", "Male"),
    (102, "Priya", 35, "Kumari", "Female"),
    (103, "Chandan", 23, "Kumar", "Male")
]
columns = ["ID", "First Name", "Age", "Last Name", "Gender"]
test_df = spark.createDataFrame(data, columns)

# Chain when() calls for each condition; otherwise() supplies the default
transformed_df = test_df.withColumn(
    "full_name",
    when(
        col("Gender") == "Male",
        concat_ws(" ", lit("Mr."), col("First Name"), col("Last Name"))
    ).when(
        col("Gender") == "Female",
        concat_ws(" ", lit("Ms."), col("First Name"), col("Last Name"))
    ).otherwise(
        concat_ws(" ", lit("Unknown"), col("First Name"), col("Last Name"))
    )
)

transformed_df.show()
spark.stop()
```
As the output shows, a new `full_name` column has been added based on the conditional logic: rows with `Gender` equal to "Male" are prefixed with "Mr.", "Female" rows with "Ms.", and any other value falls through to the `otherwise` branch. Note that if you omit `.otherwise()`, unmatched rows get `null` instead of a default value.