Spark - Standalone installation (spark scheme)


1 - About

Standalone is a simple cluster manager included with Spark that makes it easy to set up a cluster.


3 - Connection URL

The connection URL has the form:

  • spark://hostnameMaster:port to connect to a remote standalone Spark master, where port is the master service port (7077 by default).

Example with sparklyr:

sc <- sparklyr::spark_connect(master = "spark://nicoLaptop:7077")

where master is the connection URL of the Spark master.
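The URL form above can be sketched as a small shell helper. spark_master_url is a hypothetical function, not part of Spark: it only assembles the string and falls back to the default master port 7077 when no port is given.

```shell
#!/usr/bin/env bash
# spark_master_url is a hypothetical helper, not part of Spark.
# It prints a standalone connection URL of the form spark://hostnameMaster:port,
# using the master's default service port 7077 when no port is given.
spark_master_url() {
  local host="${1:-}"
  local port="${2:-7077}"
  if [ -z "$host" ]; then
    echo "usage: spark_master_url HOST [PORT]" >&2
    return 1
  fi
  # Reject non-numeric ports.
  case "$port" in
    *[!0-9]*)
      echo "invalid port: $port" >&2
      return 1
      ;;
  esac
  printf 'spark://%s:%s\n' "$host" "$port"
}
```

For example, spark_master_url nicoLaptop prints spark://nicoLaptop:7077, the value passed as master in the sparklyr call above.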

4 - Installation Steps

  • Place a compiled version of Spark on each node of the cluster.
  • Start a master server:
./sbin/start-master.sh
# The start-master script calls the following command (trace from a Windows machine)
# ./bin/spark-class.cmd org.apache.spark.deploy.master.Master --host HI-LAPTOP-NGD1 --port 7077 --webui-port 8082
# which in turn launches the Master JVM:
# C:\Program Files\Java\jdk1.8.0_45\jre\bin\java -cp C:/spark-2.1.1-bin-hadoop2.7/conf\;C:\spark-2.1.1-bin-hadoop2.7\bin\..\jars\* -Xmx1g org.apache.spark.deploy.master.Master --host HI-LAPTOP-NGD1 --port 7077 --webui-port 8080
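  • Start a worker node and register it with the master by passing the master connection URL (spark://hostnameMaster:port):
./sbin/start-slave.sh <master-spark-URL>
# In Spark 3.1 and later, this script is named ./sbin/start-worker.sh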
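The steps above can be combined on a single machine where the master and one worker run side by side. This is a minimal sketch, not a script shipped with Spark: start_standalone is a hypothetical helper, it assumes SPARK_HOME points at an unpacked Spark 2.x or later distribution, and it passes the master host and port through the SPARK_MASTER_HOST and SPARK_MASTER_PORT environment variables that start-master.sh reads (values set in conf/spark-env.sh may override them).

```shell
#!/usr/bin/env bash
# Sketch: start a standalone master and one worker on the current machine.
# Assumes SPARK_HOME points at an unpacked Spark distribution.
# start_standalone is a hypothetical helper, not a script shipped with Spark.
start_standalone() {
  if [ -z "${SPARK_HOME:-}" ]; then
    echo "SPARK_HOME is not set" >&2
    return 1
  fi
  local host="${1:-$(hostname -f)}"   # master host, defaults to this machine
  local port="${2:-7077}"             # master service port, default 7077

  # Spark 3.1+ provides start-worker.sh; older releases only have start-slave.sh.
  local worker_script="$SPARK_HOME/sbin/start-worker.sh"
  if [ ! -x "$worker_script" ]; then
    worker_script="$SPARK_HOME/sbin/start-slave.sh"
  fi

  # start-master.sh reads the master host and port from these variables.
  SPARK_MASTER_HOST="$host" SPARK_MASTER_PORT="$port" \
    "$SPARK_HOME/sbin/start-master.sh" || return 1

  # The worker registers with the master through its connection URL.
  "$worker_script" "spark://${host}:${port}"
}
```

On a real cluster, the master is started once on the master node, and only the worker script runs on the other nodes.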

5 - Cluster

The settings below apply to both the worker and the master.

5.1 - UI

The default web UI port is:

  • 8080 for the master
  • 8081 for the worker

It can be changed through:

  • the start script option --webui-port PORT
  • or the environment variables SPARK_MASTER_WEBUI_PORT and SPARK_WORKER_WEBUI_PORT
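A sketch of the environment-variable route, assuming the variables go in conf/spark-env.sh (created from conf/spark-env.sh.template), which the sbin start scripts source before launching the daemons. The port numbers 8090 and 8091 are arbitrary example values.

```shell
# conf/spark-env.sh (sourced by the sbin start scripts)
# Example values: move both web UIs off their 8080/8081 defaults.
SPARK_MASTER_WEBUI_PORT=8090
SPARK_WORKER_WEBUI_PORT=8091
```

For a one-off change of the master UI only, the option form would be ./sbin/start-master.sh --webui-port 8090.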

5.2 - Port

The default service port is:

  • 7077 for the master
  • a random port for the worker

It can be changed through:

  • the start script option --port PORT
  • or the environment variables SPARK_MASTER_PORT and SPARK_WORKER_PORT
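The service ports can be pinned in conf/spark-env.sh as well. This sketch keeps the master on its default 7077 and fixes the worker port to 7078 (an arbitrary example value) instead of a random one, which helps when firewall rules must list fixed ports.

```shell
# conf/spark-env.sh (sourced by the sbin start scripts)
# Keep the master on its default port and pin the worker to a fixed port
# instead of a random one. 7078 is an example value.
SPARK_MASTER_PORT=7077
SPARK_WORKER_PORT=7078
```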

6 - Documentation / Reference