In PySpark, the withColumnRenamed() function is used to rename a column in a DataFrame. It lets you change a column's name while leaving the rest of the DataFrame intact.
The syntax of the withColumnRenamed() function is:
df.withColumnRenamed(existing, new)
where existing is the current column name and new is the name you want to give it.
Usage of withColumnRenamed() in PySpark:
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Create a DataFrame
data = [("Alice", 25), ("Bob", 30), ("Charlie", 35)]
df = spark.createDataFrame(data, ["Name", "Age"])

# Rename the "Age" column to "AgeGroup"
df_new = df.withColumnRenamed("Age", "AgeGroup")
df_new.show()
Output:
+-------+--------+
|   Name|AgeGroup|
+-------+--------+
|  Alice|      25|
|    Bob|      30|
|Charlie|      35|
+-------+--------+
In the example above, we have a DataFrame with columns “Name” and “Age”. We use the withColumnRenamed() function to rename the “Age” column to “AgeGroup”.
The resulting DataFrame, df_new, retains the original data but with the renamed column.
It's important to note that withColumnRenamed() does not modify the original DataFrame; it returns a new DataFrame with the renamed column. You can assign the result to a new variable, as shown in the example, or reassign it to the original variable if desired.
If you don't want to rename an existing column, you can always add a new column instead by using the withColumn() function.
The withColumnRenamed() function is useful when you want to give a column in your DataFrame a more descriptive or meaningful name. It lets you update column names without altering the underlying data or structure of the DataFrame.