Friday, April 29, 2016

Apache Spark installation on Windows

Download the Scala IDE to run Scala applications: http://scala-ide.org/

Now we’ll see the installation steps:
  • Install Java 7 or later. Set JAVA_HOME and PATH as environment variables.
  • Download Scala 2.10 and install. Set SCALA_HOME and add %SCALA_HOME%\bin to the PATH variable in environment variables. To test whether Scala is installed, run the version command shown after this list.
  • The next thing is Spark. Spark can be installed in two ways:
    •  Building Spark using SBT
    •  Use Prebuilt Spark package
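
To test the Scala install (a quick sketch; the exact version string depends on the 2.10.x release you installed), open a new command prompt and run:

        C:\> scala -version
        Scala code runner version 2.10.4 -- Copyright 2002-2013, LAMP/EPFL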

Building Spark with SBT:
  • Download SBT and install. Set SBT_HOME and the PATH variable in environment variables.
  • Download the source code from the Spark website for the Hadoop version you want to target.
  • Run the sbt assembly command from the extracted source directory to build the Spark package.
  • You also need to set the Hadoop version while building, as follows:
         sbt -Pyarn -Phadoop-2.3 assembly

Using Spark Prebuilt Package:
  • Choose a Spark prebuilt package for Hadoop, e.g. Prebuilt for Hadoop 2.3/2.4 or later. Download and extract it to any drive, e.g. D:\spark-1.2.1-bin-hadoop2.3
  • Set SPARK_HOME and add %SPARK_HOME%\bin to PATH in environment variables
  • Run the spark-shell command on the command line.
  • You’ll get an error for winutils.exe:
      Though we aren’t using Hadoop with Spark, it still checks for the HADOOP_HOME variable somewhere in the configuration. So to overcome this error, download winutils.exe and place it in any location (e.g. D:\winutils\bin\winutils.exe).
  • Set HADOOP_HOME = D:\winutils in environment variables
  • Now, re-run the spark-shell command and you’ll see the Scala shell
  • Sometimes you may get an error that the \tmp\hive folder does not have read and write permissions; because of this, Spark is unable to create the SQLContext. To fix this issue:
          From the command line, go to the winutils folder and run: bin\winutils.exe chmod 777 /tmp/hive
               On an office workstation you may not have permission to change this, but if your installation points to the D: drive then you can grant access to /tmp/hive.
  • For the Spark UI, open http://localhost:4040/ in a browser
  • For testing the successful setup you can run the example sketched below; it will execute the program and return the result.
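
A minimal job in the Scala shell that sums the numbers 1 to 100 (a sketch only; the RDD id, console line number, and res numbering may differ on your setup):

        scala> val data = sc.parallelize(1 to 100)
        data: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[0] at parallelize at <console>:12
        scala> data.reduce(_ + _)
        res0: Int = 5050

Here sc is the SparkContext that spark-shell creates for you automatically, and reducing 1 to 100 with _ + _ returns 5050.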
Enjoy the installation process :-)


