In PySpark, the withColumnRenamed() function is used to rename a column in a DataFrame. It lets you change a column's name while leaving the rest of the DataFrame intact.
The syntax of the withColumnRenamed() function is:
df.withColumnRenamed(existing, new)
where existing is the current column name and new is the name you want to give it.
Usage of withColumnRenamed() in PySpark:
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Create a DataFrame
data = [("Alice", 25), ("Bob", 30), ("Charlie", 35)]
df = spark.createDataFrame(data, ["Name", "Age"])

# Rename the "Age" column to "AgeGroup"
df_new = df.withColumnRenamed("Age", "AgeGroup")
df_new.show()
Output:
+-------+--------+
|   Name|AgeGroup|
+-------+--------+
|  Alice|      25|
|    Bob|      30|
|Charlie|      35|
+-------+--------+
In the example above, we have a DataFrame with columns “Name” and “Age”. We use the withColumnRenamed() function to rename the “Age” column to “AgeGroup”.
The resulting DataFrame, df_new, retains the original data but with the renamed column.
It's important to note that withColumnRenamed() does not modify the original DataFrame; it returns a new DataFrame with the renamed column. You can assign the result to a new variable, as shown in the example, or reassign it to the original variable if desired.
If you don't want to rename an existing column, you can always add a new column instead by using the withColumn() function.
The withColumnRenamed() function is useful when you want to give a column in your DataFrame a more descriptive or meaningful name. It lets you update column names without altering the underlying data or structure of the DataFrame.