Spark - Shell


About

The Spark shell exists in different languages.

A shell normally creates a context (connection, session) automatically under the name sc.
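
For instance, in the Scala shell (spark-shell), the pre-created context can be used directly. A minimal sketch (the exact set of pre-defined variables depends on your Spark version):

scala> sc                                // the pre-created SparkContext
scala> sc.parallelize(1 to 10).sum()     // returns 55.0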

List

Python

pyspark

Example path: /usr/local/bin/spark-1.3.1-bin-hadoop2.6/bin/
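
A minimal session sketch, run from the Spark installation directory; pyspark starts a Python REPL with sc already defined:

$ ./bin/pyspark
>>> sc.version
>>> sc.parallelize(range(10)).count()    # returns 10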

See Spark - pyspark

Scala

spark-shell
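
A minimal sketch; spark-shell starts the Scala REPL with sc already created (and, in Spark 2.x, a SparkSession named spark):

$ ./bin/spark-shell --master local[2]
scala> spark.range(5).count()            // Spark 2.x SparkSession
scala> sc.textFile("README.md").count()  // assumes a README.md in the current directory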

R

SparkR

Run ./bin/sparkR to start the shell for R.
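
A minimal sketch (Spark 2.x); the example converts R's built-in faithful data set to a Spark DataFrame, as in the SparkR documentation:

$ ./bin/sparkR
> df <- as.DataFrame(faithful)   # Spark DataFrame from a local R data frame
> head(df)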

See Spark - SparkR API

Sparklyr

Sparklyr lets you use Spark from inside RStudio, for instance.
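
A minimal sketch with the sparklyr package from a plain R session (for instance in RStudio):

library(sparklyr)
library(dplyr)
sc <- spark_connect(master = "local")   # local Spark connection
iris_tbl <- copy_to(sc, iris)           # copy a local data frame to Spark
spark_disconnect(sc)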

Jupyter

Jupyter notebook
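
One common way to get the pyspark shell inside a Jupyter notebook is to point the pyspark driver at Jupyter through environment variables (a sketch; paths and options depend on your installation):

export PYSPARK_DRIVER_PYTHON=jupyter
export PYSPARK_DRIVER_PYTHON_OPTS="notebook"
./bin/pyspark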

SQL

SQL Shell

See Spark - spark-sql cli
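
A minimal sketch of an SQL shell session:

$ ./bin/spark-sql
spark-sql> show tables;
spark-sql> select 1 + 1;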

Sparkling Water

See ML - SparklingWater (H2O inside Spark)




