How to rename a column in a Spark DataFrame using Scala
Renaming columns is a common operation in data processing. In Apache Spark, you can rename a DataFrame column in Scala with the withColumnRenamed method, which takes the existing column name and the new name and returns a new DataFrame. This tutorial walks through a short example, first step by step and then as a complete program.
The examples use the following sample data:
| Roll | First Name | Age | Last Name |
|---|---|---|---|
| 1 | Rahul | 30 | Yadav |
| 2 | Sanjay | 20 | Gupta |
| 3 | Ranjan | 67 | Kumar |
First, import the classes needed to build a DataFrame with an explicit schema:
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}
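Next, create a sample DataFrame. The snippet assumes a SparkSession named sparkSession already exists; the complete program further down shows how to build one:
// Schema: column name, type, and whether null values are allowed
val schema = StructType(Array(
  StructField("roll", IntegerType, true),
  StructField("first_name", StringType, true),
  StructField("age", IntegerType, true),
  StructField("last_name", StringType, true)
))
// Sample rows matching the schema above
val data = Seq(
  Row(1, "rahul", 30, "yadav"),
  Row(2, "sanjay", 20, "gupta"),
  Row(3, "ranjan", 67, "kumar")
)
// Turn the rows into an RDD, then apply the schema to get a DataFrame
val rdd = sparkSession.sparkContext.parallelize(data)
val testDF = sparkSession.createDataFrame(rdd, schema)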
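Now rename the roll column to roll_number with withColumnRenamed. The first argument is the existing column name and the second is the new one; the call returns a new DataFrame and leaves testDF unchanged:
val transformedDF = testDF.withColumnRenamed("roll", "roll_number")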
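Two properties of withColumnRenamed are worth checking for yourself: it returns a new DataFrame rather than modifying the one it is called on, and it does nothing when the named column does not exist. The following is a minimal sketch, assuming a local SparkSession and a script-style environment such as spark-shell. The names people and missing_column are made up for illustration and are not part of the tutorial's program.

```scala
import org.apache.spark.sql.SparkSession

// Assumption: a local session is acceptable here. In spark-shell,
// getOrCreate() simply reuses the session that already exists.
val sparkSession = SparkSession.builder()
  .appName("withColumnRenamed behaviour sketch")
  .master("local[1]")
  .getOrCreate()
import sparkSession.implicits._

// Hypothetical sample data with the same column names as the tutorial.
val people = Seq(
  (1, "rahul", 30, "yadav"),
  (2, "sanjay", 20, "gupta")
).toDF("roll", "first_name", "age", "last_name")

// Returns a new DataFrame; `people` keeps its original column names.
val renamed = people.withColumnRenamed("roll", "roll_number")

// Renaming a column that is not in the schema is a no-op, not an error.
val unchanged = people.withColumnRenamed("missing_column", "anything")

println(renamed.columns.mkString(", "))   // roll_number, first_name, age, last_name
println(people.columns.mkString(", "))    // roll, first_name, age, last_name
println(unchanged.columns.mkString(", ")) // roll, first_name, age, last_name
```

Because a misspelled column name fails silently, it can be worth checking the resulting columns (for example with printSchema) after a rename.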
Putting it all together, the complete program looks like this:
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}
object WithColumnRenamedSpark {
  def main(args: Array[String]): Unit = {
    // Create a local SparkSession for the example
    val sparkSession = SparkSession
      .builder()
      .appName("rename a column of spark dataframe scala")
      .master("local")
      .getOrCreate()

    // Schema: column name, type, and whether null values are allowed
    val schema = StructType(Array(
      StructField("roll", IntegerType, true),
      StructField("first_name", StringType, true),
      StructField("age", IntegerType, true),
      StructField("last_name", StringType, true)
    ))

    // Sample rows matching the schema above
    val data = Seq(
      Row(1, "rahul", 30, "yadav"),
      Row(2, "sanjay", 20, "gupta"),
      Row(3, "ranjan", 67, "kumar")
    )

    val rdd = sparkSession.sparkContext.parallelize(data)
    val testDF = sparkSession.createDataFrame(rdd, schema)

    // Rename "roll" to "roll_number"; testDF itself is not modified
    val transformedDF = testDF.withColumnRenamed("roll", "roll_number")
    transformedDF.show()

    sparkSession.stop()
  }
}
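If more than one column needs a new name, one common approach is to apply withColumnRenamed once per column. The sketch below does this with foldLeft over a map of old to new names. It assumes a local SparkSession and a script-style environment such as spark-shell; the renames map and the target names given_name and family_name are hypothetical and chosen only for illustration.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Assumption: a local session is acceptable here.
val sparkSession = SparkSession.builder()
  .appName("rename several columns sketch")
  .master("local[1]")
  .getOrCreate()
import sparkSession.implicits._

// Same sample data as the tutorial, built with toDF for brevity.
val testDF = Seq(
  (1, "rahul", 30, "yadav"),
  (2, "sanjay", 20, "gupta"),
  (3, "ranjan", 67, "kumar")
).toDF("roll", "first_name", "age", "last_name")

// Hypothetical mapping of existing column name -> new column name.
val renames = Map(
  "roll" -> "roll_number",
  "first_name" -> "given_name",
  "last_name" -> "family_name"
)

// Start from testDF and apply one withColumnRenamed call per map entry,
// passing the intermediate DataFrame along at each step.
val renamedDF: DataFrame = renames.foldLeft(testDF) {
  case (df, (oldName, newName)) => df.withColumnRenamed(oldName, newName)
}

renamedDF.printSchema()
```

Column order is preserved, so age stays in its original position while the other three columns take their new names.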
That's it! You've successfully applied withColumnRenamed to a DataFrame in Spark using Scala.