Saturday 4 June 2016

Spark-1.6.0 installation

Apache Spark is a fast, general-purpose cluster computing system that runs on both Windows and UNIX-like systems. A Java installation is mandatory for Spark. Spark provides high-level APIs in Java, Scala, Python and R, and runs on Java 7+, Python 2.6+ and R 3.1+. For the Scala API, Spark 1.6.0 uses Scala 2.10, so you will need a compatible Scala version (2.10.x).
Install Java, Python, R or Scala before installing Spark, and verify their versions too. For Scala 2.11.8 installation follow the link. I have installed Java, Python and Scala on my system.

After installing them, download spark-1.6.0.tgz from here. Extract the Spark tar file with the following command:
$ tar xvf spark-1.6.0.tgz
To build Spark and its example programs, run:
$ cd spark-1.6.0
$ build/mvn -DskipTests clean package

To confirm the Spark installation, run one of the sample Scala programs in the `examples` directory. For example, run the program that computes an approximate value of Pi:
$ ./bin/run-example SparkPi
It gives output similar to this (the estimate changes slightly from run to run):
Pi is roughly 3.142324

If Python is already installed on your system, you can also run the sample Python programs in the 'examples' directory to confirm the Spark installation.
$ ./bin/spark-submit examples/src/main/python/pi.py
It gives output similar to:
Pi is roughly 3.130720
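
The two runs above print different values because the Pi examples estimate Pi by random sampling rather than computing it exactly. Here is a minimal plain-Python sketch of that Monte Carlo idea. It is not the bundled Spark program and it does not use Spark at all; the function name `estimate_pi` and its `seed` parameter are my own, added only for illustration.

```python
import random


def estimate_pi(num_samples, seed=None):
    """Estimate Pi by sampling random points in the square [-1, 1] x [-1, 1]."""
    rng = random.Random(seed)
    inside = 0
    for _ in range(num_samples):
        x = rng.uniform(-1.0, 1.0)
        y = rng.uniform(-1.0, 1.0)
        # Count the point if it falls inside the unit circle.
        if x * x + y * y <= 1.0:
            inside += 1
    # The circle covers pi/4 of the square, so scale the hit ratio by 4.
    return 4.0 * inside / num_samples


if __name__ == "__main__":
    print("Pi is roughly %f" % estimate_pi(100000))
```

More samples give a closer estimate, and a fixed seed makes a run repeatable.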

Run Sample Python Program In Spark
Install Spark-1.6.0 by following my previous post.
Here I am going to show you how to run one of the sample Python programs in the 'spark-1.6.0/examples/src/main/python/ml' directory.
The Python program 'tokenizer_example.py' splits sentences into word tokens. It can be run by:
$ cd spark-1.6.0
$ ./bin/spark-submit examples/src/main/python/ml/tokenizer_example.py
Output:
Row(words=[u'hi', u'i', u'heard', u'about', u'spark'], label=0)
Row(words=[u'i', u'wish', u'java', u'could', u'use', u'case', u'classes'], label=1)
Row(words=[u'logistic,regression,models,are,neat'], label=2)
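
The third row stays as a single token because the tokenizer lowercases the text and splits on whitespace only, so commas do not separate words. A rough plain-Python sketch of that behaviour follows. It is an approximation, not Spark's implementation; the helper `tokenize` is my own name, and the input sentences are assumed, reconstructed from the output above.

```python
def tokenize(sentence):
    """Lowercase a sentence and split it on whitespace."""
    return sentence.lower().split()


# Assumed inputs, reconstructed from the output shown above.
sentences = [
    (0, "Hi I heard about Spark"),
    (1, "I wish Java could use case classes"),
    (2, "Logistic,regression,models,are,neat"),
]

for label, sentence in sentences:
    # The comma-separated sentence has no spaces, so it yields one token.
    print(label, tokenize(sentence))
```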

Thursday 2 June 2016

Scala Installation

Scala is an "object-functional" programming language that runs on the Java platform (Java Virtual Machine). It interoperates with both Java and JavaScript. Scala is the implementation language of important frameworks, including Apache Spark, Kafka and Akka.

Install Scala on Ubuntu 15.04
Scala needs Java Runtime 1.6 or later; you can skip installing Java if you already meet this requirement.

Check the Java version by:
$ java -version
Output:
openjdk version "1.8.0_45-internal"
OpenJDK Runtime Environment (build 1.8.0_45-internal-b14)
OpenJDK Server VM (build 25.45-b02, mixed mode)
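
If you want to do this check from a script instead of reading the output by eye, the version string can be parsed. Below is a small sketch assuming the output has the `version "1.8.0_45-internal"` form shown above; the function names `parse_java_version` and `meets_scala_requirement` are hypothetical helpers of mine, not part of Java or Scala.

```python
import re


def parse_java_version(version_output):
    """Return (major, minor) parsed from `java -version` style output."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        raise ValueError("no version string found")
    major = int(match.group(1))
    # A missing minor part is treated as 0.
    minor = int(match.group(2) or 0)
    return (major, minor)


def meets_scala_requirement(version_output, minimum=(1, 6)):
    """Check the parsed version against the Java 1.6 minimum from this post."""
    return parse_java_version(version_output) >= minimum


sample = 'openjdk version "1.8.0_45-internal"'
print(parse_java_version(sample), meets_scala_requirement(sample))
```

Note that `java -version` writes to standard error, so a script capturing it would need to read stderr rather than stdout.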

Download Scala from here, then create a directory for it and extract the archive there:
$ sudo mkdir /usr/local/src/scala
$ sudo tar xvf scala-2.11.8.tgz -C /usr/local/src/scala/

For quick access, add scala and scalac to your PATH by editing .bashrc in your home directory:
$ vi .bashrc
Add the following lines at the end:
export SCALA_HOME=/usr/local/src/scala/scala-2.11.8
export PATH=$SCALA_HOME/bin:$PATH
Reload .bashrc so the change takes effect in the current shell:
$ . .bashrc

Check the Scala version:
$ scala -version
Output will be like this:
Scala code runner version 2.11.8 -- Copyright 2002-2016, LAMP/EPFL