How to install PySpark locally

Here I'll walk through, step by step, how to install PySpark locally on your laptop.

Steps:
1. Install Python
2. Download Spark
3. Install pyspark
4. Change the execution path for pyspark

Step 1. Install Python

If you don't have Python installed yet, I highly recommend installing it through Anaconda. Their site has detailed installation instructions.

Anaconda installation instructions:

https://conda.io/docs/user-guide/install/index.html

Step 2. Download Spark

Spark is an open-source project under the Apache Software Foundation. Spark can be downloaded here:

https://spark.apache.org/downloads.html

First, choose a Spark release. If you don't have a preference, the latest version is recommended. Second, choose a package type that is pre-built for Apache Hadoop. Third, click the download link to download the archive. After the download finishes, extract the archive. I recommend moving the extracted folder to your home directory and perhaps renaming it to something shorter, such as spark. The Spark folder should then be located here:

/your/home/directory/spark
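If you would rather script the extract-and-rename step than do it by hand, here is a minimal sketch using only the Python standard library. The function name unpack_spark and its parameters are my own and purely illustrative, and it assumes the downloaded .tgz contains a single top-level folder.

```python
import shutil
import tarfile
from pathlib import Path


def unpack_spark(archive, dest_parent, short_name="spark"):
    """Extract a Spark .tgz into dest_parent and rename its top folder to short_name.

    Hypothetical helper: assumes the archive holds exactly one top-level folder.
    """
    archive = Path(archive)
    dest_parent = Path(dest_parent)
    target = dest_parent / short_name
    if target.exists():
        raise FileExistsError(f"{target} already exists; remove or rename it first")

    with tarfile.open(archive, "r:gz") as tar:
        # Collect the first path component of every member in the archive.
        top_level = {
            Path(name).parts[0] for name in tar.getnames() if Path(name).parts
        }
        if len(top_level) != 1:
            raise ValueError("expected exactly one top-level folder in the archive")
        tar.extractall(dest_parent)

    # Rename the long release folder to the short name.
    extracted = dest_parent / top_level.pop()
    shutil.move(str(extracted), str(target))
    return target
```

You would call it with the path of the archive you downloaded and your home directory, for example unpack_spark(path_to_archive, Path.home()).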

Step 3. Install pyspark

Now we are going to install pip. Pip is a package management system used to install and manage Python packages. Once Python is installed, go to the link below and install pip.

https://pip.pypa.io/en/stable/installing/

After installing pip, run the command below to install pyspark.

$ pip install pyspark
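To double-check that pip really put pyspark into the Python environment you are using, you can ask Python itself. This is a small sketch using the standard library's importlib.metadata (Python 3.8 or newer); the helper name installed_version is my own.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version of a package, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


version = installed_version("pyspark")
if version is None:
    print("pyspark is not installed in this Python environment")
else:
    print(f"pyspark {version} is installed")
```

If it reports that pyspark is missing, the usual cause is that pip and python point at different environments.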

Step 4. Change the execution path for pyspark

Under your home directory, find a file named .bash_profile, .bashrc, or .zshrc. The name varies with your operating system, shell, and version, so search online for the startup file your shell uses. Since this is a hidden file, you may also need to make hidden files visible. Once you have found it, open the shell startup file and paste in the lines below.

export SPARK_HOME="/your/home/directory/spark"
export PATH="$SPARK_HOME/bin:$PATH"
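If you prefer not to edit the file by hand, the same step can be sketched in Python. Both helper names below (find_startup_file and append_missing_lines) are hypothetical, and the candidate list simply mirrors the three file names mentioned above. The second helper only appends lines that are not already in the file, so running it twice does no harm.

```python
from pathlib import Path

# Startup file names mentioned in this step, checked in this order.
CANDIDATES = (".bash_profile", ".bashrc", ".zshrc")


def find_startup_file(home):
    """Return the first existing startup file under home, or None if none exists."""
    for name in CANDIDATES:
        path = Path(home) / name
        if path.is_file():
            return path
    return None


def append_missing_lines(startup_file, lines):
    """Append each line that is not already present; return the lines that were added."""
    startup_file = Path(startup_file)
    existing = startup_file.read_text() if startup_file.exists() else ""
    missing = [line for line in lines if line not in existing]
    if missing:
        with startup_file.open("a") as fh:
            # Make sure the new lines start on a fresh line.
            if existing and not existing.endswith("\n"):
                fh.write("\n")
            fh.write("\n".join(missing) + "\n")
    return missing
```

To use it, pass Path.home() to find_startup_file, then pass the result together with the two export lines from this step to append_missing_lines.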

Save the file and open a new terminal. Run the command below to test.

pyspark

You should see something like this:

Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 2.2.1
      /_/

Congrats! You have successfully installed PySpark on your computer.
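Seeing the banner tells you the shell starts. If you also want to confirm that Spark can run a job, a tiny smoke test like the sketch below can help. The function name run_smoke_test, the app name, and the sample rows are made up for illustration, and it assumes pyspark is importable from the Python you run it with.

```python
def run_smoke_test():
    """Start a local Spark session, count a three-row DataFrame, and stop the session."""
    # Imported here so that defining the function does not require pyspark.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .master("local[1]")          # run locally with a single worker thread
        .appName("install-check")    # illustrative app name
        .getOrCreate()
    )
    try:
        df = spark.createDataFrame(
            [(1, "a"), (2, "b"), (3, "c")],
            ["id", "label"],
        )
        return df.count()
    finally:
        spark.stop()
```

Calling run_smoke_test() from a Python prompt should return 3 if everything is wired up correctly.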
