
Friday, April 29, 2016

Apache Spark installation on windows

Download Scala IDE to run Scala applications: http://scala-ide.org/

Now let's walk through the installation steps:
  • Install Java 7 or later. Set the JAVA_HOME environment variable and add Java to the PATH.
  • Download and install Scala 2.10. Set SCALA_HOME and add %SCALA_HOME%\bin to the PATH environment variable. To verify that Scala is installed, run scala -version on the command line.
  • Next comes Spark itself. Spark can be installed in two ways:
    •  Building Spark using SBT
    •  Using a prebuilt Spark package
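As a sketch, the environment variables above can be set from a Command Prompt (the install paths below are examples; adjust them to your machine):

```shell
:: Example paths -- adjust to your actual Java and Scala install locations
setx JAVA_HOME "C:\Program Files\Java\jdk1.7.0_79"
setx SCALA_HOME "C:\Program Files (x86)\scala"
setx PATH "%PATH%;%JAVA_HOME%\bin;%SCALA_HOME%\bin"

:: Open a NEW Command Prompt (setx does not affect the current one),
:: then verify both installs
java -version
scala -version
```

setx writes the variables permanently to the registry, which is why a fresh Command Prompt is needed before they take effect.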

 Building Spark with SBT:
  • Download and install SBT. Set SBT_HOME and add SBT to the PATH environment variable.
  • Download the Spark source code from the Spark website.
  • Run the sbt assembly command to build the Spark package.
  • You also need to select the Hadoop version while building, via build profiles, as follows:
         sbt -Pyarn -Phadoop-2.3 assembly
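The build steps above amount to something like the following (the source directory path is illustrative; the profiles are the ones used by Spark 1.x builds):

```shell
:: From the extracted Spark source directory
cd D:\spark-1.2.1

:: Build an assembly jar with YARN support against Hadoop 2.3
:: using the corresponding build profiles
sbt -Pyarn -Phadoop-2.3 assembly
```

The build can take a long time the first run, since SBT downloads all of Spark's dependencies.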

Using a Spark Prebuilt Package:
  • Choose a Spark prebuilt package for Hadoop, i.e. "Pre-built for Hadoop 2.3/2.4" or later. Download and extract it to any drive, e.g. D:\spark-1.2.1-bin-hadoop2.3.
  • Set SPARK_HOME and add %SPARK_HOME%\bin to the PATH environment variable.
  • Run the spark-shell command on the command line.
  • You'll get an error for winutils.exe:
      Though we aren't using Hadoop with Spark, Spark still checks for the HADOOP_HOME variable in its configuration. To overcome this error, download winutils.exe and place it in any location (e.g. D:\winutils\bin\winutils.exe).
  • Set HADOOP_HOME = D:\winutils in the environment variables.
  • Now re-run the spark-shell command; you'll see the Scala shell.
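Put together, the winutils workaround looks like this (D:\winutils is just an example location):

```shell
:: Place the downloaded winutils.exe under D:\winutils\bin, then
:: point HADOOP_HOME at the folder that CONTAINS bin (not at bin itself)
setx HADOOP_HOME "D:\winutils"

:: Open a new Command Prompt so the variable is picked up,
:: then start the shell again
spark-shell
```

Note that HADOOP_HOME must point to the parent of the bin folder; Spark looks for %HADOOP_HOME%\bin\winutils.exe.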
  • Sometimes the \tmp\hive folder lacks read and write permissions, and because of this you will be unable to create an SQLContext. To fix this issue, grant the permissions with winutils:
          From the command line, go to the winutils folder and run: bin\winutils.exe chmod 777 /tmp/hive
               On an office workstation you may not have permission to change this, but if your installation points to the D: drive then you can grant access to /tmp/hive.
  • For the Spark UI, open http://localhost:4040/ in a browser.
  • To test that the setup succeeded, you can run one of the bundled examples; it will execute the program and return the result.
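For instance, the prebuilt package ships with a run-example script that works as a quick smoke test (SparkPi is one of the bundled examples; the argument is the number of partitions to use):

```shell
:: Runs the bundled SparkPi example locally; when the setup is
:: correct it prints an approximation of Pi among the log output
%SPARK_HOME%\bin\run-example SparkPi 10
```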
Enjoy the installation process :-)