Install Homebrew: Open Terminal and run the following command to install Homebrew (if not already installed):
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
Install Apache Spark: Run the following command in Terminal to install Apache Spark using Homebrew:
brew install apache-spark
Configure Environment Variables:
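Open Terminal and run the following command to open your shell profile in the nano editor (if your shell is zsh, the default on macOS since Catalina, edit ~/.zshrc instead of ~/.bash_profile here and in the refresh step):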
nano ~/.bash_profile
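Add the following lines to the file, one per line. Homebrew installs each version under its own directory, so put your installed Spark version between apache-spark/ and /libexec in SPARK_HOME (ls /usr/local/Cellar/apache-spark lists it):
export SPARK_HOME=/usr/local/Cellar/apache-spark//libexec
export PYSPARK_PYTHON=/usr/bin/python3
export PATH=$SPARK_HOME/bin:$PATH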
Save the file (press Ctrl + X, then Y, and Enter).
Refresh the Environment: Run the following command in Terminal to apply the changes to your current session:
source ~/.bash_profile
Verify the Setup: Open a new Terminal window and run pyspark to launch the PySpark shell. If it starts without errors, the setup is successful.
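If the shell does not start, it can help to check the environment variables first. The following is a minimal sketch, not part of the original setup: the function name check_spark_env is hypothetical, and it only checks that SPARK_HOME points to a directory and that a pyspark executable can be found on PATH.

```python
import os
import shutil


def check_spark_env(environ=None):
    """Return a list of problems found; an empty list means the basics look right."""
    # Hypothetical helper: defaults to the real environment, but accepts a dict for testing.
    env = os.environ if environ is None else environ
    problems = []

    spark_home = env.get("SPARK_HOME")
    if not spark_home:
        problems.append("SPARK_HOME is not set")
    elif not os.path.isdir(spark_home):
        problems.append(f"SPARK_HOME does not point to a directory: {spark_home}")

    # Search for the pyspark launcher on the PATH of the environment being checked.
    if shutil.which("pyspark", path=env.get("PATH", "")) is None:
        problems.append("pyspark was not found on PATH")

    return problems


# Report anything that looks wrong in the current session.
for problem in check_spark_env():
    print("WARNING:", problem)
```

Run it with python3 from the same Terminal session in which you sourced the profile. No output means both checks passed.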
Testing the PySpark Installation:
- Open a new terminal or command prompt window.
- Run the following command to start the PySpark shell:
pyspark
- If everything is set up correctly, you should see the PySpark shell starting and a Python prompt (>>>) appearing.
- You can test PySpark by running simple PySpark commands, such as creating RDDs or DataFrames and performing basic operations on them.
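As one possible way to try the RDD and DataFrame operations mentioned above, here is a short smoke test. It is only a sketch: the sample data, the names SAMPLE_ROWS, expected_average_age and run_smoke_test, and the app name are all made up for illustration. It cross-checks Spark's results against values computed in plain Python.

```python
# Hypothetical sample data: (name, age) pairs.
SAMPLE_ROWS = [("alice", 34), ("bob", 45), ("carol", 29)]


def expected_average_age(rows):
    """Average age computed in plain Python, used to cross-check Spark's answer."""
    return sum(age for _, age in rows) / len(rows)


def run_smoke_test():
    """Build a small RDD and DataFrame and compare the results with plain Python."""
    # Imported here so the helper above can be used without Spark installed.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    # getOrCreate() reuses the session the PySpark shell already started, if any.
    spark = SparkSession.builder.appName("smoke-test").getOrCreate()

    # RDD: distribute the ages across the cluster and sum them.
    ages = spark.sparkContext.parallelize([age for _, age in SAMPLE_ROWS])
    assert ages.sum() == sum(age for _, age in SAMPLE_ROWS)

    # DataFrame: the same data with named columns, then an aggregate.
    df = spark.createDataFrame(SAMPLE_ROWS, ["name", "age"])
    df.show()
    spark_avg = df.agg(F.avg("age")).first()[0]
    assert abs(spark_avg - expected_average_age(SAMPLE_ROWS)) < 1e-9
    return spark_avg
```

Paste the block at the >>> prompt and then call run_smoke_test(). If it prints the three-row table and returns without an assertion error, both the RDD and DataFrame APIs are working.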