Spark - SQL

About

This section is about the SQL interface of Spark.

The term Spark SQL may also refer to the more global and general framework. See Spark - SQL Framework

SQL is an interface to the Spark SQL engine that supports:

  • all existing Hive data formats,
  • the Hive syntax (including user-defined functions, UDFs),
  • and the Hive metastore to get the metadata of the data stored in HDFS.
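As an illustration of the points above, here is a minimal PySpark sketch, not a definitive setup. It assumes PySpark is installed with Hive support available; the application name, the `shout` function, and the `people` view are hypothetical names introduced only for this example. It enables Hive support on the session (so table metadata comes from the Hive metastore), registers a Python function as a SQL-callable UDF, and runs a query through the SQL interface.

```python
from typing import Optional


def shout(text: Optional[str]) -> Optional[str]:
    """Plain Python function that is exposed to SQL as a UDF below."""
    return None if text is None else text.upper()


def run_demo() -> None:
    # Imported lazily so that shout() can be used without a Spark install.
    from pyspark.sql import SparkSession
    from pyspark.sql.types import StringType

    spark = (
        SparkSession.builder
        .appName("spark-sql-sketch")  # hypothetical application name
        .enableHiveSupport()          # table metadata is read from the Hive metastore
        .getOrCreate()
    )

    # Register the Python function under a name that SQL queries can call.
    spark.udf.register("shout", shout, StringType())

    # A temporary view stands in for a table that would normally
    # be resolved through the metastore (assumption for this sketch).
    people = spark.createDataFrame([("alice",), ("bob",)], ["name"])
    people.createOrReplaceTempView("people")

    # Query through the SQL interface, calling the UDF like a built-in function.
    spark.sql("SELECT name, shout(name) AS loud FROM people").show()

    spark.stop()
```

Calling `run_demo()` on a machine with PySpark starts a local session and prints the query result. With a configured metastore, the same `spark.sql(...)` call could reference existing Hive tables instead of the temporary view.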

Spark SQL is the standard for SQL on Spark. Hive on Spark is similar to Spark SQL but aims to leverage existing investments in Hive (security, …) on Spark. They are two different components with two different code bases.

Features:

  • Runs SQL / HiveQL queries, optionally alongside or replacing existing Hive deployments
  • Connects existing BI tools to Spark through JDBC
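For the JDBC feature, BI tools usually only need a connection URL. The sketch below assumes the Spark Thrift JDBC/ODBC server is running (it ships with Spark as `sbin/start-thriftserver.sh`), that it speaks the HiveServer2 protocol, and that it listens on the default port 10000. The helper name `thrift_jdbc_url` and the example host and database names are hypothetical.

```python
def thrift_jdbc_url(host: str = "localhost",
                    port: int = 10000,
                    database: str = "default") -> str:
    """Build the HiveServer2-style JDBC URL that a BI tool would be given.

    The defaults are assumptions: a Thrift server on the local machine,
    on its default port, exposing the 'default' database.
    """
    return f"jdbc:hive2://{host}:{port}/{database}"


# URL for a local Thrift server with default settings.
local_url = thrift_jdbc_url()

# URL for a hypothetical remote server and database.
remote_url = thrift_jdbc_url("bi-gateway", 10015, "sales")
```

The resulting string is what goes into the JDBC connection field of the BI tool, together with a Hive JDBC driver.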
