Saturday 4 June 2016

Spark-1.6.0 installation

Apache Spark is a fast, general-purpose cluster computing system that runs on both Windows and UNIX-like systems. A Java installation is mandatory before installing Spark. Spark provides high-level APIs in Java, Scala, Python and R. Spark 1.6.0 runs on Java 7+, Python 2.6+, and R 3.1+. For the Scala API, Spark 1.6.0 uses Scala 2.10, so you will need a compatible Scala version (2.10.x).
Install Java and, as needed, Python, R, or Scala before the Spark installation, and verify the versions too. For Scala 2.11.8 installation follow the link. I have installed Java, Python and Scala on my system.

After installing,
Download spark-1.6.0.tgz from here. Extract the Spark tar file with the following command:
$ tar xvf spark-1.6.0.tgz
To build Spark and its example programs, run:
$ cd spark-1.6.0
$ build/mvn -DskipTests clean package

To confirm the Spark installation, run one of the sample Scala programs in the `examples` directory. Here we run a program that computes the value of Pi:
$ ./bin/run-example SparkPi
It gives output like:
Pi is roughly 3.142324

If Python is already installed on your system, you can run the sample Python programs in the `examples` directory to confirm the Spark installation.
$ ./bin/spark-submit examples/src/main/python/pi.py
It gives output like:
Pi is roughly 3.130720
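Both sample programs estimate Pi with the same Monte Carlo method: throw random points into the unit square and count what fraction lands inside the quarter circle. A minimal plain-Python sketch of that idea (no Spark needed; the function name and sample count are my own):

```python
import random

def estimate_pi(num_samples=100000, seed=42):
    """Estimate Pi by sampling random points in the unit square
    and counting how many fall inside the quarter circle."""
    random.seed(seed)
    inside = 0
    for _ in range(num_samples):
        x, y = random.random(), random.random()
        if x * x + y * y <= 1.0:
            inside += 1
    # Quarter-circle area / square area = pi/4
    return 4.0 * inside / num_samples

print("Pi is roughly %f" % estimate_pi())
```

The Spark versions distribute exactly this sampling loop across the cluster, which is why each run prints a slightly different approximation.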

Run Sample Python Program In Spark
Install Spark-1.6.0 by following my previous post.
Here I am going to show how to run a sample Python program from the `spark-1.6.0/examples/src/main/python/ml` directory.
The Python program `tokenizer_example.py` splits sentences into word tokens. It can be run with:
$ cd spark-1.6.0
$ ./bin/spark-submit examples/src/main/python/ml/tokenizer_example.py
It gives output:
Row(words=[u'hi', u'i', u'heard', u'about', u'spark'], label=0)
Row(words=[u'i', u'wish', u'java', u'could', u'use', u'case', u'classes'], label=1)
Row(words=[u'logistic,regression,models,are,neat'], label=2)
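Spark ML's `Tokenizer` lowercases each sentence and splits it on whitespace only, which is why the third row comes out as a single comma-joined token (a `RegexTokenizer` would be needed to split on other delimiters). A minimal plain-Python sketch of that whitespace tokenization, using the same example sentences (the `tokenize` helper is my own name, not a Spark API):

```python
def tokenize(sentence):
    """Mimic Spark ML's Tokenizer: lowercase the sentence,
    then split it on whitespace."""
    return sentence.lower().split()

sentences = [
    (0, "Hi I heard about Spark"),
    (1, "I wish Java could use case classes"),
    (2, "Logistic,regression,models,are,neat"),
]
for label, sentence in sentences:
    print("Row(words=%r, label=%d)" % (tokenize(sentence), label))
```

The `u''` prefixes in the program's output simply mark Python 2 unicode strings.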
