Download Apache Spark: Go to the official Apache Spark website (https://spark.apache.org/downloads.html) and download the latest pre-built release of Spark (a .tgz archive).
Extract Spark: Once downloaded, extract the Spark archive to a desired location on your Windows machine. Windows cannot open .tgz files natively, so use a tool such as 7-Zip or the tar command bundled with Windows 10 and later, as shown below.
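
For example, with the built-in tar command, run from the folder containing the download (the archive name and target folder below are placeholders; adjust them to the release you actually downloaded):

REM Create a target folder and unpack the Spark archive into it.
mkdir C:\spark
tar -xzf spark-3.5.1-bin-hadoop3.tgz -C C:\spark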

Configure Environment Variables:

Open the Start menu and search for “Environment Variables.”
Click on “Edit the system environment variables” to open the System Properties window.
Click on “Environment Variables” at the bottom of the window.
Under “System variables,” click “New” and add the following variables (they can also be set from the command line, as sketched after this list):
Variable Name: SPARK_HOME, Variable Value: <path to your Spark installation directory>
Variable Name: HADOOP_HOME, Variable Value: <path to your Hadoop installation directory> (if using Hadoop; note that Spark on Windows typically also needs a winutils.exe binary under %HADOOP_HOME%\bin)
Variable Name: PYSPARK_PYTHON, Variable Value: <path to your Python executable>
Add Spark’s bin directory to PATH: Append the following to the “Path” variable under “System variables”:
%SPARK_HOME%\bin
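
Alternatively, the three variables above can be set from a command prompt with setx; this is a minimal sketch assuming illustrative install paths, so substitute your own:

REM setx persists variables for the current user; the new values apply
REM only to command prompts opened afterwards. Add /M (from an elevated
REM prompt) to set them system-wide instead.
setx SPARK_HOME "C:\spark\spark-3.5.1-bin-hadoop3"
setx HADOOP_HOME "C:\hadoop"
setx PYSPARK_PYTHON "C:\Python311\python.exe"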

Verify the Setup: Open a new command prompt (so the updated environment variables take effect) and run pyspark to launch the PySpark shell. If it starts without errors, the setup is successful.

Testing the PySpark Installation:

  • Open a new terminal or command prompt window.
  • Run the following command to start the PySpark shell:
pyspark
  • If everything is set up correctly, you should see the PySpark shell start and a Python prompt (>>>) appear, with a SparkSession already available as spark.
  • You can test PySpark by running simple commands, such as creating RDDs or DataFrames and performing basic operations on them; a short example follows this list.
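
For instance, the following minimal sketch, run inside the pyspark shell (where the spark session object already exists), builds a small DataFrame and an RDD and performs a basic operation on each; the column names and sample values are illustrative only:

# Build a small DataFrame from in-memory rows (illustrative data).
df = spark.createDataFrame(
    [("alice", 34), ("bob", 29)],
    ["name", "age"],
)
df.show()          # prints the rows as a table
print(df.count())  # 2

# Build an RDD via the underlying SparkContext and transform it.
rdd = spark.sparkContext.parallelize([1, 2, 3, 4])
print(rdd.map(lambda x: x * x).collect())  # [1, 4, 9, 16]

If both commands print the expected results, the installation is working end to end.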