Sparklyr - Installation of a dev environment

About

This page explains how to install:

  • the sparklyr package
  • a local Spark instance with Livy, for a development environment

Installation

Package

# Optional: use a specific repository, for instance a Microsoft (MRAN) snapshot
# options(repos = "https://mran.microsoft.com/snapshot/2017-05-01")

install.packages("sparklyr")

# Or install the latest development version of sparklyr from GitHub
# devtools::install_github("rstudio/sparklyr")
  • Load the library if needed
library(sparklyr)
library(dplyr)

Local Standalone Spark

See Spark - Standalone installation (spark scheme)

  • Check whether a version is already installed:
sparklyr::spark_installed_versions()
  spark hadoop                       dir
1 1.6.2    2.6 spark-1.6.2-bin-hadoop2.6
2 2.1.1    2.7 spark-2.1.1-bin-hadoop2.7

  • List the available versions (it seems that spark_install must have been called at least once first):
sparklyr::spark_available_versions()

The output is read from the file “User_Home\Documents\R\win-library\3.3\sparklyr\extdata\install_spark.csv”.

You can edit this file to add other versions that you find at https://d3kbcqa49mib13.cloudfront.net/

Figure: sparklyr available versions

  • Install the version that you want locally:
sparklyr::spark_install(version = "1.6.2")
Installing Spark 1.6.2 for Hadoop 2.6 or later.
Downloading from:
- 'https://d3kbcqa49mib13.cloudfront.net/spark-1.6.2-bin-hadoop2.6.tgz'
Installing to:
- 'C:\Users\gerardn\AppData\Local\rstudio\spark\Cache/spark-1.6.2-bin-hadoop2.6'
trying URL 'https://d3kbcqa49mib13.cloudfront.net/spark-1.6.2-bin-hadoop2.6.tgz'
Content type 'application/x-tar' length 278057117 bytes (265.2 MB)
downloaded 265.2 MB

Installation complete.
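
The check and install steps above can be combined so that Spark is downloaded only when the requested version is missing. This is a minimal sketch: the helper names is_spark_installed and ensure_spark are hypothetical (they are not part of sparklyr), and it assumes that spark_installed_versions() returns a data frame with a spark column, as in the output shown earlier.

```r
# Hypothetical helper (not part of sparklyr): TRUE when `version` is listed
# in the `spark` column of a data frame shaped like the output of
# sparklyr::spark_installed_versions().
is_spark_installed <- function(version, installed) {
  version %in% as.character(installed$spark)
}

# Hypothetical wrapper: installs `version` only when it is not present yet.
# Returns TRUE (invisibly) if an install was triggered, FALSE otherwise.
ensure_spark <- function(version) {
  if (is_spark_installed(version, sparklyr::spark_installed_versions())) {
    return(invisible(FALSE))
  }
  sparklyr::spark_install(version = version)
  invisible(TRUE)
}

# Usage (downloads Spark only if it is not installed yet):
# ensure_spark("1.6.2")
```

Keeping the comparison in a separate function makes it possible to try the logic on a hand-built data frame without touching the network.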

  • Restart RStudio and verify that the HADOOP_HOME environment variable is set:
Sys.getenv("HADOOP_HOME")
[1] "C:\\Users\\gerardn\\AppData\\Local\\rstudio\\spark\\Cache\\spark-1.6.2-bin-hadoop2.6\\tmp\\hadoop"
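
The same verification can be scripted with base R only. The sketch below is one possible way to do it: check_hadoop_home is a hypothetical helper name, and the three status strings it returns are illustrative choices, not sparklyr output.

```r
# Hypothetical helper (not part of sparklyr): reports whether HADOOP_HOME
# is set and whether it points to an existing directory.
check_hadoop_home <- function(value = Sys.getenv("HADOOP_HOME")) {
  if (!nzchar(value)) {
    return("unset")               # variable is empty or not defined
  }
  if (!dir.exists(value)) {
    return("missing directory")   # variable is set but the path does not exist
  }
  "ok"
}

print(check_hadoop_home())
```

A result other than "ok" after the restart suggests that the installation step did not complete or that the session did not pick up the variable.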

Livy

See Spark - Livy (REST API)

  • Install
sparklyr::livy_install(version = "0.3.0", spark_home = NULL, spark_version = NULL)
  • Start / Stop
sparklyr::livy_service_start()
sparklyr::livy_service_stop()
  • See also
sparklyr::livy_available_versions()
sparklyr::livy_install_dir()
sparklyr::livy_installed_versions()
sparklyr::livy_home_dir(version = NULL)
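
To check that the Livy installation works end to end, the start and stop calls above can be wrapped around a short-lived connection. This is only a sketch under assumptions: with_local_livy is a hypothetical name, the URL http://localhost:8998 assumes the default Livy port, and the Spark version "1.6.2" assumes the local install made earlier on this page. The service may also need a few seconds after starting before it accepts connections.

```r
# Hypothetical wrapper (not part of sparklyr): starts the local Livy service,
# runs `f` with a Livy-backed Spark connection, then cleans up in all cases.
with_local_livy <- function(f,
                            livy_version = "0.3.0",
                            spark_version = "1.6.2",           # assumed: version installed above
                            url = "http://localhost:8998") {   # assumed: default Livy port
  sparklyr::livy_service_start(version = livy_version,
                               spark_version = spark_version)
  # Stop the service when the function exits, even after an error.
  on.exit(sparklyr::livy_service_stop(), add = TRUE)

  sc <- sparklyr::spark_connect(master = url, method = "livy")
  # Disconnect before the service is stopped.
  tryCatch(f(sc), finally = sparklyr::spark_disconnect(sc))
}

# Usage (requires the Spark and Livy installs described on this page):
# with_local_livy(function(sc) sparklyr::spark_version(sc))
```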

Next steps

Sparklyr - Connection (Context)




