Deleting a Column in PySpark
In PySpark, removing a column from a DataFrame takes a single call to the drop method. This tutorial shows how to drop one column and then several columns at once, using a small table of names and ages like this one:
| Roll | First Name | Age | Last Name |
|---|---|---|---|
| 1 | Ali | 30 | Khan |
| 2 | Sanjay | 20 | Kumar |
| 3 | Rahul | 67 | Kumar |
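You can delete a column from a PySpark DataFrame using the drop method. It returns a new DataFrame rather than modifying the original, so assign the result back to a variable. Here's an example: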
from pyspark.sql import SparkSession
# Create a Spark session
spark = SparkSession.builder.appName("Delete Column Example").getOrCreate()
# Sample DataFrame
data = [("Ali", "Khan", 30),
        ("Sanjay", "Kumar", 20),
        ("Rahul", "Kumar", 67)]
columns = ["FirstName", "LastName", "Age"]
df = spark.createDataFrame(data, schema=columns)
# Delete the 'Age' column
df = df.drop("Age")
df.show()
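# Delete several columns in one call. 'Age' was already removed above;
# drop() silently ignores column names that are not in the DataFrame.
df = df.drop("Age", "LastName")
df.show()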