Spark SQL - Server (Thrift) (STS)


About

The Spark SQL server is the HiveServer2 of Hive 1.2.1. It's a Thrift JDBC/ODBC server.

Version

  • beeline from Spark or Hive 1.2.1
  • Hive 1.2.1

Configuration

High availability

There is no service discovery yet (SPARK-19541).

Therefore, a load balancer must be put in front of two Thrift servers, as sketched below.
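
A minimal sketch of such a setup, assuming HAProxy in TCP mode balancing two Thrift servers on their default port (the sts1/sts2 hostnames are placeholders):

# Hypothetical HAProxy snippet: round-robin over two Spark Thrift servers
frontend spark_thrift_front
    bind *:10000
    mode tcp
    default_backend spark_thrift_back

backend spark_thrift_back
    mode tcp
    balance roundrobin
    server sts1 sts1.example.com:10000 check
    server sts2 sts2.example.com:10000 check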

Management

Start

Linux

To start the JDBC/ODBC server, run the following in the Spark directory:

./sbin/start-thriftserver.sh

# From Hortonworks
./sbin/start-thriftserver.sh --master yarn-client --executor-memory 512m --hiveconf hive.server2.thrift.port=10015
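
The script accepts all spark-submit command-line options, plus the --hiveconf option to set Hive properties. To stop a running server:

./sbin/stop-thriftserver.sh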

See running-the-thrift-jdbcodbc-server in the Spark SQL documentation.

Windows

cd %SPARK_HOME%\bin
spark-class2 org.apache.spark.deploy.SparkSubmit --class org.apache.spark.sql.hive.thriftserver.HiveThriftServer2 spark-internal

Connection

Port

The port can be configured with the following conf parameter: --hiveconf hive.server2.thrift.port=10001
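
For example, to start the server on port 10001:

./sbin/start-thriftserver.sh --hiveconf hive.server2.thrift.port=10001

The HIVE_SERVER2_THRIFT_PORT environment variable can be used instead of the --hiveconf option.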

The start output also gives you the port (default: 10000):

18/07/18 16:36:05 INFO MetaStoreDirectSql: Using direct SQL, underlying DB is DERBY
18/07/18 16:36:05 INFO ObjectStore: Initialized ObjectStore
18/07/18 16:36:05 INFO HiveMetaStore: 0: get_databases: default
18/07/18 16:36:05 INFO audit: ugi=gerard        ip=unknown-ip-addr      cmd=get_databases: default
18/07/18 16:36:05 INFO HiveMetaStore: 0: Shutting down the object store...
18/07/18 16:36:05 INFO audit: ugi=gerard        ip=unknown-ip-addr      cmd=Shutting down the object store...
18/07/18 16:36:05 INFO HiveMetaStore: 0: Metastore shutdown complete.
18/07/18 16:36:05 INFO audit: ugi=gerard        ip=unknown-ip-addr      cmd=Metastore shutdown complete.
18/07/18 16:36:05 INFO AbstractService: Service:ThriftBinaryCLIService is started.
18/07/18 16:36:05 INFO AbstractService: Service:HiveServer2 is started.
18/07/18 16:36:05 INFO HiveThriftServer2: HiveThriftServer2 started
18/07/18 16:36:05 INFO ThriftCLIService: Starting ThriftBinaryCLIService on port 10000 with 5...500 worker threads
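
The 5...500 range in the last line is the HiveServer2 worker thread pool, which can be tuned with the same --hiveconf mechanism; a sketch, assuming the standard HiveServer2 properties:

./sbin/start-thriftserver.sh \
  --hiveconf hive.server2.thrift.min.worker.threads=5 \
  --hiveconf hive.server2.thrift.max.worker.threads=500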

Driver UI

http://172.23.0.1:4040/jobs/ (default)

On HDInsight, you need to go to the YARN UI to get the driver UI.

Headnode

The service for connecting to Spark SQL (Thrift/JDBC) is a Spark Thrift server on the head nodes (example on Azure: port 10002, protocol Thrift).

Azure HdInsight

It's the same as for Hive, but instead of containing httpPath=/hive2, it contains httpPath=/sparkhive2:

  • Gateway: jdbc:hive2://clustername.azurehdinsight.net:443/;ssl=true;transportMode=http;httpPath=/sparkhive2
  • HeadNode: jdbc:hive2://headnodehost:10002/;transportMode=http

Example with beeline

beeline -u 'jdbc:hive2://headnodehost:10002/;transportMode=http'
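
Through the public gateway, the same connection goes over HTTPS with the cluster login (clustername and admin are placeholders):

beeline -u 'jdbc:hive2://clustername.azurehdinsight.net:443/;ssl=true;transportMode=http;httpPath=/sparkhive2' -n admin -p 'password'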


Beeline

beeline 
!connect jdbc:hive2://localhost:10000 nico ""
SET;
SHOW TABLES;
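
The same session can be run non-interactively with beeline's -n (user), -p (password) and -e (query) options:

beeline -u 'jdbc:hive2://localhost:10000' -n nico -p '' -e 'SHOW TABLES;'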
