In PySpark, you join two DataFrames with the DataFrame.join method, selecting the join type through its third argument. Here are the commonly used join types:

Inner Join:

The inner join returns only the matching rows from both DataFrames based on a common column.

joined_df = df1.join(df2, df1.common_column == df2.common_column, "inner")

In this example, df1 and df2 are joined using "common_column" and an inner join is performed. The resulting DataFrame joined_df will contain only the rows where the values in "common_column" match in both DataFrames.

Left Join:

The left join returns all the rows from the left DataFrame and the matching rows from the right DataFrame. Where there is no match, the columns coming from the right DataFrame are filled with null.

joined_df = df1.join(df2, df1.common_column == df2.common_column, "left")

Here, a left join is performed between df1 and df2 based on "common_column". The resulting DataFrame joined_df will include all the rows from df1 and the matching rows from df2. If there is no match, the corresponding values in df2 will be null.

Right Join:

The right join returns all the rows from the right DataFrame and the matching rows from the left DataFrame. Where there is no match, the columns coming from the left DataFrame are filled with null.

joined_df = df1.join(df2, df1.common_column == df2.common_column, "right")

In this example, a right join is performed between df1 and df2 based on "common_column". The resulting DataFrame joined_df will include all the rows from df2 and the matching rows from df1. If there is no match, the corresponding values in df1 will be null.

Full Outer Join:

The full outer join returns all the rows from both DataFrames. Where a row has no match on the other side, the columns from that side are filled with null.

joined_df = df1.join(df2, df1.common_column == df2.common_column, "outer")

Here, a full outer join is performed between df1 and df2 based on "common_column". The resulting DataFrame joined_df will include all the rows from both DataFrames. If there is no match, the corresponding values will be null. Besides "outer", PySpark also accepts "full", "fullouter", and "full_outer" as equivalent names for this join type.

These are the commonly used join methods in PySpark. You can choose the appropriate join type based on your data requirements and the desired outcome of the join operation.