Spark - Application


About

An application is an instance of a driver. This instance is created via the initialization of a Spark context (for RDDs) or a Spark session (for Datasets).

Within each Spark application, multiple jobs may be running concurrently if they were submitted by different threads.
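A minimal Scala sketch of this behavior (the object name, numbers and the local master are illustrative assumptions): two actions are triggered from separate threads via Futures, so their jobs can be scheduled concurrently within the same application.

import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration.Duration
import org.apache.spark.sql.SparkSession

// Two jobs submitted from different threads of the same application
object ConcurrentJobsSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("ConcurrentJobsSketch")
      .master("local[4]")
      .getOrCreate()

    // Each action (count) triggers its own job; running the actions
    // from separate threads lets the scheduler run the jobs concurrently.
    val jobA = Future { spark.range(0, 1000000).count() }
    val jobB = Future { spark.range(0, 2000000).filter("id % 2 = 0").count() }

    val total = Await.result(jobA, Duration.Inf) + Await.result(jobB, Duration.Inf)
    println(s"Rows counted by the two jobs: $total")

    spark.stop()
  }
}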

Spark Program

A typical script starts with a session (context) and defines data structures (RDDs, DataFrames or Datasets) that are table-like objects on which you can perform data operations.

See self-contained-applications

Example

See:
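A minimal self-contained sketch (the object name and data are illustrative) showing the typical structure: create a session, define a table-like Dataset, and run an action.

import org.apache.spark.sql.SparkSession

// A self-contained application: session creation, a table-like Dataset,
// and an action that triggers a job.
object SimpleApp {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("Simple Application")
      .getOrCreate()
    import spark.implicits._

    // A small Dataset built from a local collection (illustrative data)
    val words = Seq("spark", "application", "driver", "spark").toDS()

    // Table-like operations followed by an action
    val counts = words.groupBy("value").count()
    counts.show()

    spark.stop()
  }
}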

Management

Properties

Properties are the configuration of an application.

See Application properties such as application name, …

  • In the code:
val conf = new SparkConf()
             .setMaster("local[2]")
             .setAppName("CountingSheep")
val sc = new SparkContext(conf)
  • At the command line (the code then creates the context with an empty conf):
// empty conf in the code
val sc = new SparkContext(new SparkConf())
./bin/spark-submit --name "My app" --master local[4] --conf spark.eventLog.enabled=false --conf "spark.executor.extraJavaOptions=-XX:+PrintGCDetails -XX:+PrintGCTimeStamps" myApp.jar
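Whichever way they are set, properties can be read back from the running context. A small sketch (the keys are standard Spark configuration keys; values follow the snippet above):

import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setMaster("local[2]")
  .setAppName("CountingSheep")
val sc = new SparkContext(conf)

// Read properties back at runtime
val appName  = sc.getConf.get("spark.app.name")                  // "CountingSheep"
val eventLog = sc.getConf.get("spark.eventLog.enabled", "false") // default when the key is unset
println(s"name=$appName, eventLog=$eventLog")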

Name

The name is used, for instance, in the name of the log file.

Submit

spark-submit

The spark-submit script is used to launch applications:

./bin/spark-submit \
  --class <main-class> \
  --master <master-url> \
  --deploy-mode <deploy-mode> \
  --conf <key>=<value> \
  ... # other options
  <application-jar> \
  [application-arguments]
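For instance, to launch one of the example applications bundled with the distribution (the jar path and Scala/Spark versions below are illustrative and depend on your installation):

./bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master local[8] \
  examples/jars/spark-examples_2.12-3.3.0.jar \
  100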

Python and R (from the quickstart):

# For Python examples, use spark-submit directly:
./bin/spark-submit examples/src/main/python/pi.py

# For R examples, use spark-submit directly:
./bin/spark-submit examples/src/main/r/dataframe.R

livy

See livy - http://gethue.com/how-to-use-the-livy-spark-rest-job-server-api-for-submitting-batch-jar-python-and-streaming-spark-jobs/
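As a sketch, a jar can also be submitted as a Livy batch through its REST API (the host, paths and class name below are illustrative assumptions; Livy listens on port 8998 by default):

# Submit a jar as a Livy batch
curl -X POST -H "Content-Type: application/json" \
  -d '{"file": "hdfs:///user/me/myApp.jar", "className": "com.example.MyApp", "args": ["100"]}' \
  http://livy-host:8998/batches

# Check the batch state (replace 0 with the id returned above)
curl http://livy-host:8998/batches/0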

Kill

  • Spark standalone or Mesos with cluster deploy mode
spark-submit --kill [submission ID] --master [spark://...]
  • Java
./bin/spark-class org.apache.spark.deploy.Client kill <master url> <driver ID>

Status

  • Spark standalone or Mesos with cluster deploy mode
spark-submit --status [submission ID] --master [spark://...]

Runtime execution configuration

See Spark - Application Execution Configuration




