collect() Action:

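The collect() action returns all the elements of the RDD to the driver program as a Python list. Because every element is loaded into the driver's memory, it is best reserved for RDDs that are known to be small.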

# Creating an RDD
rdd = spark.sparkContext.parallelize([1, 2, 3, 4, 5])

# Applying collect action to retrieve all elements
collected_list = rdd.collect()

Here, collect() gathers every element of rdd on the driver, so collected_list is [1, 2, 3, 4, 5].

count() Action:

The count() action returns the total number of elements in the RDD.

# Creating an RDD
rdd = spark.sparkContext.parallelize([1, 2, 3, 4, 5])

# Applying count action to get the number of elements
element_count = rdd.count()

Here, count() counts the elements of rdd across all partitions, so element_count is 5.

take() Action:

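The take() action returns the first n elements of the RDD as a list, where n is the number passed as its argument. If the RDD holds fewer than n elements, all of them are returned.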

# Creating an RDD
rdd = spark.sparkContext.parallelize([1, 2, 3, 4, 5])

# Applying take action to get the first two elements
first_two_elements = rdd.take(2)

Here, take(2) retrieves the first two elements of rdd, so first_two_elements is [1, 2].
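The difference between collect(), count() and take() can be sketched without Spark. The snippet below is only a conceptual model, not Spark's implementation: it assumes an RDD can be pictured as a list of partitions, and the names partitions, model_collect, model_count and model_take are hypothetical helpers introduced for illustration.

```python
from itertools import chain, islice

# Hypothetical stand-in for an RDD: two partitions holding the elements 1..5.
partitions = [[1, 2], [3, 4, 5]]

def model_collect(parts):
    """Concatenate every partition into one list, as collect() does on the driver."""
    return list(chain.from_iterable(parts))

def model_count(parts):
    """Add up the sizes of all partitions."""
    return sum(len(part) for part in parts)

def model_take(parts, n):
    """Read elements in partition order and stop as soon as n have been seen."""
    return list(islice(chain.from_iterable(parts), n))

all_elements = model_collect(partitions)
total = model_count(partitions)
first_two = model_take(partitions, 2)
more_than_available = model_take(partitions, 10)
```

In this model, model_take never needs to read the second partition to answer take(2), which is why take() is usually the cheaper choice when only a sample of the data is needed.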

reduce() Action:

The reduce() action aggregates the elements of an RDD using a binary function that you supply. The function takes two values and returns one, and it is applied repeatedly until a single result remains.

Here’s an example of using reduce() in PySpark:

# Creating an RDD
rdd = spark.sparkContext.parallelize([1, 2, 3, 4, 5])

# Applying reduce action to calculate the sum of all elements
sum_of_elements = rdd.reduce(lambda x, y: x + y)

In the above example, reduce() is applied to rdd with the lambda function lambda x, y: x + y, which takes two values and returns their sum. Spark applies this function first within each partition and then to the partial results, until a single value remains.

The final value of sum_of_elements is 15, that is, 1 + 2 + 3 + 4 + 5.

It’s important to note that the reduce() action requires the binary function to be both associative and commutative. Associativity means that grouping does not matter: f(f(a, b), c) equals f(a, f(b, c)), so each partition can be reduced independently. Commutativity means that operand order does not matter: f(a, b) equals f(b, a), so the partial results can be combined in whatever order the partitions finish. If either property is missing, the result can change with the number or layout of partitions.
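To see why both properties matter, here is a minimal pure-Python sketch. It is not Spark itself: it assumes an RDD can be modeled as a list of partitions that are each reduced locally before the partial results are merged, and simulate_reduce is a hypothetical helper written for this illustration.

```python
from functools import reduce

def simulate_reduce(partitions, func):
    """Reduce each partition locally, then merge the partial results (assumed model)."""
    partials = [reduce(func, part) for part in partitions if part]
    return reduce(func, partials)

def add(x, y):
    return x + y   # associative and commutative

def sub(x, y):
    return x - y   # neither associative nor commutative

# The same five elements split across partitions in two different ways.
layout_a = [[1, 2], [3, 4, 5]]
layout_b = [[1, 2, 3], [4, 5]]

add_a = simulate_reduce(layout_a, add)
add_b = simulate_reduce(layout_b, add)
sub_a = simulate_reduce(layout_a, sub)
sub_b = simulate_reduce(layout_b, sub)

print(add_a, add_b)  # addition gives the same answer for both layouts
print(sub_a, sub_b)  # subtraction depends on how the data was split
```

In this model, addition yields 15 for both layouts, while subtraction yields 5 for the first layout and -3 for the second. The same kind of layout-dependent result is what a non-associative or non-commutative function risks when passed to reduce().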

The reduce() action is useful for performing aggregations, such as calculating sums, products, maximum or minimum values, or any other computation that can be expressed as an associative and commutative operation on the elements of the RDD.
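As a rough illustration of such aggregations, the sketch below applies a few associative and commutative functions with Python's functools.reduce on a plain list, so that it runs without a Spark session. The assumption is that the same two-argument functions could be passed to rdd.reduce() in the same way, for example rdd.reduce(operator.mul); the variable names are illustrative only.

```python
from functools import reduce
import operator

values = [1, 2, 3, 4, 5]

# Each function takes two values and returns one, and each is
# associative and commutative, so it is a safe candidate for reduce().
product = reduce(operator.mul, values)   # multiply all elements
largest = reduce(max, values)            # keep the larger of each pair
smallest = reduce(min, values)           # keep the smaller of each pair

print(product, largest, smallest)
```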