Spark - Standalone installation (spark scheme)


About

Standalone is a simple cluster manager included with Spark that makes it easy to set up a cluster.

Connection URL

The connection URL is:

  • spark://hostnameMaster:port to connect to a remote standalone Spark cluster.

Example with sparklyr:

sc <- sparklyr::spark_connect(master = "spark://nicoLaptop:7077")

where master is the Spark master connection URL.
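The same connection URL works with the tools shipped in the Spark distribution. A sketch, reusing the hostname nicoLaptop and the Spark 2.1.1 build from the examples on this page:

```shell
# Start an interactive shell against the standalone master
./bin/spark-shell --master spark://nicoLaptop:7077

# Or submit an application to the same cluster
# (the SparkPi example jar ships with the 2.1.1 distribution)
./bin/spark-submit \
  --master spark://nicoLaptop:7077 \
  --class org.apache.spark.examples.SparkPi \
  ./examples/jars/spark-examples_2.11-2.1.1.jar 100
```

Both commands require the master started in the installation steps below to be running.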

Installation Steps

  • Place a compiled version of Spark on each node of the cluster.
  • Start a master server:
./sbin/start-master.sh
# The start-master script calls the following command
# ./bin/spark-class.cmd org.apache.spark.deploy.master.Master --host HI-LAPTOP-NGD1 --port 7077 --webui-port 8082
# which in turn runs
# C:\Program Files\Java\jdk1.8.0_45\jre\bin\java -cp C:/spark-2.1.1-bin-hadoop2.7/conf\;C:\spark-2.1.1-bin-hadoop2.7\bin\..\jars\* -Xmx1g org.apache.spark.deploy.master.Master --host HI-LAPTOP-NGD1 --port 7077 --webui-port 8080
  • Start a worker node, passing it the master URL:
./sbin/start-slave.sh <master-spark-URL>
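The per-node steps above can also be driven from the master machine with the bundled launch scripts. A sketch, assuming passwordless SSH from the master to the workers (in Spark 2.x the worker list file is conf/slaves; worker1 and worker2 are placeholder hostnames):

```shell
# List one worker hostname per line
cat > conf/slaves <<'EOF'
worker1
worker2
EOF

# Start the master and a worker on every host listed in conf/slaves
./sbin/start-all.sh

# Stop the whole cluster again
./sbin/stop-all.sh
```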

Cluster

The settings below apply to both the worker and the master.

UI

The port of the UI by default is:

  • 8080 for the master
  • 8081 for the worker

It can be changed through:

  • the start script option --webui-port PORT
  • or the environment variables SPARK_MASTER_WEBUI_PORT and SPARK_WORKER_WEBUI_PORT
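For example, the UI ports can be set in conf/spark-env.sh, which the start scripts source (the port values 8082 and 8083 here are arbitrary choices, not defaults):

```shell
# conf/spark-env.sh
SPARK_MASTER_WEBUI_PORT=8082   # master UI instead of the default 8080
SPARK_WORKER_WEBUI_PORT=8083   # worker UI instead of the default 8081

# Equivalent for the master, using the start script option:
# ./sbin/start-master.sh --webui-port 8082
```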

Port

The port of the service by default is:

  • 7077 for master
  • random for worker

It can be changed through:

  • the start script option --port PORT
  • or the environment variables SPARK_MASTER_PORT and SPARK_WORKER_PORT
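The service ports can likewise be pinned in conf/spark-env.sh (the values 7078 and 45000 are illustrative, not defaults):

```shell
# conf/spark-env.sh
SPARK_MASTER_PORT=7078    # master service port instead of the default 7077
SPARK_WORKER_PORT=45000   # fixed worker port instead of a random one

# Equivalent for the master, using the start script option:
# ./sbin/start-master.sh --port 7078
```

Pinning the worker port is useful when a firewall only allows specific ports between the nodes.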
