Here’s a guide to verifying a PySpark installation by running a short script that counts the lines in a text file:

Prepare a Text File:

Create a text file named example.txt and add a few lines of text to it.
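
If you would rather create the file from Python than from a text editor, the following is a minimal sketch. The sample sentences and the write_sample helper are illustrative assumptions, not part of PySpark; any few lines of text will work.

```python
from pathlib import Path

# Hypothetical sample content; any few lines of text will do.
SAMPLE_LINES = [
    "Apache Spark is a distributed computing engine.",
    "PySpark is its Python API.",
    "This file has three lines.",
]


def write_sample(path="example.txt"):
    """Write the sample lines to `path`, one per line, and return the path."""
    target = Path(path)
    target.write_text("\n".join(SAMPLE_LINES) + "\n", encoding="utf-8")
    return target


# Create example.txt in the current working directory.
sample_path = write_sample()
print("Wrote", sample_path.resolve())
```

The printed absolute path is the one to use later when the script asks for the location of the file.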

Write the PySpark Script:

Open a text editor and create a new file. Save it with a .py extension, such as line_count.py.
In the file, write the following PySpark script:

from pyspark.sql import SparkSession

# Create a SparkSession
spark = SparkSession.builder \
    .appName("Line Count") \
    .getOrCreate()

# Read the text file into an RDD
lines_rdd = spark.sparkContext.textFile("path/to/example.txt")

# Count the number of lines in the RDD
line_count = lines_rdd.count()

# Print the result
print("Number of lines:", line_count)

# Stop the SparkSession to release its resources
spark.stop()

Replace “path/to/example.txt” with the actual path to the text file you created in the first step. Use an absolute path if you plan to run the script from a different directory.
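
A wrong path is a common reason for this script to fail, so it can help to check the path with plain Python before handing it to Spark. The sketch below assumes the file lives on the local filesystem; resolve_input is a hypothetical helper name, not a PySpark function.

```python
from pathlib import Path


def resolve_input(path_str):
    """Return the absolute form of `path_str`, or raise if it is not a file."""
    path = Path(path_str).expanduser().resolve()
    if not path.is_file():
        raise FileNotFoundError(f"No such file: {path}")
    return str(path)


# Print the absolute path if the file exists, or a clear message if it does not.
try:
    print("Path to use in textFile():", resolve_input("example.txt"))
except FileNotFoundError as err:
    print(err)
```

The string it prints can be pasted into the textFile() call in place of the placeholder path.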

Run the PySpark Script:

Open a terminal or command prompt.
Navigate to the directory where you saved the line_count.py file.
Run the following command to execute the PySpark script:

spark-submit line_count.py
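
If the shell reports that spark-submit is not recognized, the command is probably not on your PATH. A quick way to check, sketched here with Python's standard library only (find_spark_submit is a hypothetical helper name):

```python
import shutil


def find_spark_submit():
    """Return the full path of spark-submit if it is on PATH, else None."""
    return shutil.which("spark-submit")


location = find_spark_submit()
if location is None:
    print("spark-submit was not found on PATH; check your Spark installation.")
else:
    print("spark-submit found at:", location)
```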

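If everything is set up correctly, the output includes a line beginning with “Number of lines:” followed by the line count of your text file. Spark also prints its own log messages, so the result may appear among them.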
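
To confirm that the number Spark reports is the one you expect, you can count the lines independently with plain Python. This is a minimal sketch that assumes a small local UTF-8 text file; count_lines is a hypothetical helper name.

```python
def count_lines(path):
    """Count lines in a local text file using plain Python (no Spark)."""
    with open(path, encoding="utf-8") as handle:
        return sum(1 for _ in handle)


# Print the expected count, or a hint if the file is not where we looked.
try:
    print("Expected number of lines:", count_lines("example.txt"))
except FileNotFoundError:
    print("example.txt not found; adjust the path to match your file.")
```

For an ordinary text file, this count should match the value printed by the PySpark script.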