In PySpark, the withColumnRenamed() method renames a column in a DataFrame. It changes the name of one column while leaving the rest of the DataFrame intact.

The syntax of the withColumnRenamed() function, where existing is the current column name and new is the name to replace it with:

df.withColumnRenamed(existing, new)

Usage of withColumnRenamed() in PySpark:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Create a DataFrame
data = [("Alice", 25), ("Bob", 30), ("Charlie", 35)]
df = spark.createDataFrame(data, ["Name", "Age"])

# Rename the "Age" column to "AgeGroup"
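df_new = df.withColumnRenamed("Age", "AgeGroup")

# Display the renamed DataFrame
df_new.show()

Output:

+-------+--------+
|   Name|AgeGroup|
+-------+--------+
|  Alice|      25|
|    Bob|      30|
|Charlie|      35|
+-------+--------+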

In the example above, we have a DataFrame with columns “Name” and “Age”. We use the withColumnRenamed() function to rename the “Age” column to “AgeGroup”.

The resulting DataFrame, df_new, retains the original data but with the renamed column.

Note that withColumnRenamed() does not modify the original DataFrame; it returns a new DataFrame with the renamed column. You can assign the result to a new variable, as shown in the example, or assign it back to the original variable to overwrite it.

If you want to keep the existing column as it is, you can add a new column with the withColumn() function instead.

The withColumnRenamed() function is useful when you want to give a column in your DataFrame a more descriptive or meaningful name. It updates the column name without altering the underlying data or the rest of the DataFrame's structure.