How to Install PySpark and Apache Spark on MacOS
Posted on 2018-11-12 by Majid Bahrepour
Here is an easy Step by Step guide to installing PySpark and Apache Spark on MacOS.
Step 1: Get Homebrew
Homebrew makes installing applications and languages on macOS a lot easier. You can get Homebrew by following the instructions on its website.
In short, you can install Homebrew from the terminal using this command:
/usr/bin/ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"
Step 2: Installing xcode-select
Xcode is a large suite of software development tools and libraries from Apple. In order to install Java and Spark through the command line, we will probably need the command line tools that xcode-select installs.
Use the command below in your terminal to install them:
xcode-select --install
You will usually get a prompt asking you to confirm the installation.
Click “Install” to continue.
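To confirm that the command line tools are in place, one option is to ask xcode-select for the active developer directory. The sketch below wraps `xcode-select -p` from Python. The helper name `developer_dir` is made up for this illustration, and it simply returns `None` on machines where the tool is missing or no directory is configured.

```python
import subprocess


def developer_dir():
    """Return the active developer directory, or None if it is unavailable.

    Hypothetical helper for this guide; it only wraps `xcode-select -p`.
    """
    try:
        result = subprocess.run(
            ["xcode-select", "-p"],
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            universal_newlines=True,
        )
    except OSError:
        # xcode-select is not present on this machine (for example, not a Mac)
        return None
    if result.returncode != 0:
        # The tool exists but no developer directory is configured
        return None
    return result.stdout.strip() or None


if __name__ == "__main__":
    path = developer_dir()
    if path:
        print("Developer directory:", path)
    else:
        print("Command line tools not found; run: xcode-select --install")
```

If this prints a directory, the tools are most likely installed and you can move on to the next step.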
Step 3: DO NOT use Homebrew to install Java!
The latest version of Java (at the time of writing this article) is Java 10, and Apache Spark does not officially support Java 10 yet! Homebrew will install the latest version of Java, and that causes many issues!
To install Java 8, please go to the official website: https://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html
Then, under “Java SE Development Kit 8u191”, choose the Mac download:
Mac OS X x64 245.92 MB jdk-8u191-macosx-x64.dmg
Once Java is downloaded, please go ahead and install it locally.
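Since the Java version matters here, it can help to check which one the terminal actually picks up. The sketch below parses the output of `java -version`, assuming the usual banner format (for example `java version "1.8.0_191"` for Java 8, or `"10.0.2"` for Java 10). The function names are hypothetical and only for illustration.

```python
import re
import subprocess


def java_major_version(version_output):
    """Extract the major Java version from `java -version` output.

    Java 8 and earlier report themselves as "1.8.0_191", while Java 9 and
    later use "10.0.2". Returns None when no version string is found.
    """
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        return None
    first = int(match.group(1))
    second = match.group(2)
    if first == 1 and second is not None:
        # Old scheme: "1.8.0_191" means Java 8
        return int(second)
    return first


def installed_java_major():
    """Run `java -version` and return the major version, or None."""
    try:
        result = subprocess.run(
            ["java", "-version"],
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            universal_newlines=True,
        )
    except OSError:
        # No `java` executable on the PATH
        return None
    # `java -version` writes its banner to stderr
    return java_major_version(result.stderr + result.stdout)


if __name__ == "__main__":
    major = installed_java_major()
    if major == 8:
        print("Java 8 found")
    else:
        print("Expected Java 8, found:", major)
```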
Step 3b: Use Homebrew to install Apache Spark
To do so, please go to your terminal and type:
brew install apache-spark
Homebrew will now download and install Apache Spark; it may take some time depending on your internet connection. You can check the version of Spark using the command below in your terminal:
pyspark --version
You should then see the Spark version information printed in the terminal.
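If the version command is not found, the Spark launchers are probably not on your PATH. As a rough check, the sketch below looks up a few launcher names with Python's `shutil.which`. The helper name `find_tools` and the default list of commands are assumptions for this illustration.

```python
import shutil


def find_tools(names=("pyspark", "spark-submit", "spark-shell")):
    """Map each command name to its full path, or None if it is not on the PATH.

    Hypothetical helper; the default names are the launchers we expect
    the Spark installation to provide.
    """
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, path in find_tools().items():
        print(name, "->", path if path else "not found on PATH")
```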
Step 4: Install PySpark and FindSpark in Python
To be able to use PySpark locally on your machine, you need to install findspark and pyspark.
If you use Anaconda, use the commands below:
#Find Spark Option 1:
conda install -c conda-forge findspark
#Find Spark Option 2:
conda install -c conda-forge/label/gcc7 findspark
#PySpark:
conda install -c conda-forge pyspark
If you use regular Python, use pip install instead:
pip install findspark
pip install pyspark
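Before writing any Spark code, you may want to confirm that both packages are visible to the Python interpreter you are using. This is a minimal sketch based on the standard library's `importlib.util.find_spec`; the helper name `missing_packages` is made up for this example.

```python
import importlib.util


def missing_packages(names=("findspark", "pyspark")):
    """Return the package names that cannot be found by this interpreter.

    Hypothetical helper; it only checks that the top-level package can be
    located, it does not import it or start Spark.
    """
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Not installed:", ", ".join(missing))
    else:
        print("findspark and pyspark are both installed")
```

If a package is reported as missing even though you installed it, the install may have gone into a different Python environment than the one you are running.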
Step 5: Your first code in Python
After the installation is completed you can write your first helloworld script:
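import findspark
findspark.init()  # locate the Spark installation before importing pyspark

from pyspark import SparkContext
from pyspark.sql import SparkSession

sc = SparkContext(appName="MyFirstApp")  # start a local Spark context
spark = SparkSession(sc)  # wrap the context in a SparkSession
print("Hello World!")
sc.stop()  # stop the Spark context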