Spark SQL - Server (Thrift) (STS)

1 - About

The Spark SQL server corresponds to the HiveServer2 of Hive 1.2.1. It is a Thrift JDBC/ODBC server.

3 - Version

  • beeline from Spark or Hive 1.2.1
  • Hive 1.2.1

4 - Configuration

4.1 - High availability

Service discovery is not yet available (SPARK-19541).

Therefore, a load balancer must be placed in front of two Thrift servers.
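
Whatever load balancer is used, it needs a way to tell whether each Thrift server still answers. The following is a minimal sketch of such a probe, not a load balancer itself. It assumes that beeline is on the PATH and that it exits with a non-zero status when the connection or the query fails, which should be verified for the beeline version in use. The function name first_healthy_thriftserver and the BEELINE override variable are hypothetical.

```shell
# Hypothetical helper: print the first host:port whose Thrift server answers a trivial query.
# BEELINE can be overridden (for example in tests); it defaults to the beeline on the PATH.
# Assumption: beeline returns a non-zero exit status when it cannot connect or the query fails.
first_healthy_thriftserver() {
  for endpoint in "$@"; do
    # -u gives the JDBC URL, -e runs one statement and exits.
    if "${BEELINE:-beeline}" -u "jdbc:hive2://${endpoint}" -e "SELECT 1;" >/dev/null 2>&1; then
      echo "$endpoint"
      return 0
    fi
  done
  echo "no Thrift server reachable" >&2
  return 1
}
```

It could be called with the two servers behind the load balancer, for example first_healthy_thriftserver host1:10015 host2:10015, where the host names are placeholders.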

5 - Management

5.1 - Start

5.1.1 - Linux

To start the JDBC/ODBC server, run the following in the Spark directory:

./sbin/start-thriftserver.sh
 
# From Hortonworks
./sbin/start-thriftserver.sh --master yarn-client --executor-memory 512m --hiveconf hive.server2.thrift.port=10015

See the "Running the Thrift JDBC/ODBC server" section of the Spark SQL documentation.

5.1.2 - Windows

cd %SPARK_HOME%\bin
spark-class2 org.apache.spark.deploy.SparkSubmit --class org.apache.spark.sql.hive.thriftserver.HiveThriftServer2 spark-internal

5.2 - Connection

5.2.1 - Port

The port can be configured with the following conf parameter: --hiveconf hive.server2.thrift.port=10001

The startup output also shows the port (default: 10000):

18/07/18 16:36:05 INFO MetaStoreDirectSql: Using direct SQL, underlying DB is DERBY
18/07/18 16:36:05 INFO ObjectStore: Initialized ObjectStore
18/07/18 16:36:05 INFO HiveMetaStore: 0: get_databases: default
18/07/18 16:36:05 INFO audit: ugi=gerard        ip=unknown-ip-addr      cmd=get_databases: default
18/07/18 16:36:05 INFO HiveMetaStore: 0: Shutting down the object store...
18/07/18 16:36:05 INFO audit: ugi=gerard        ip=unknown-ip-addr      cmd=Shutting down the object store...
18/07/18 16:36:05 INFO HiveMetaStore: 0: Metastore shutdown complete.
18/07/18 16:36:05 INFO audit: ugi=gerard        ip=unknown-ip-addr      cmd=Metastore shutdown complete.
18/07/18 16:36:05 INFO AbstractService: Service:ThriftBinaryCLIService is started.
18/07/18 16:36:05 INFO AbstractService: Service:HiveServer2 is started.
18/07/18 16:36:05 INFO HiveThriftServer2: HiveThriftServer2 started
18/07/18 16:36:05 INFO ThriftCLIService: Starting ThriftBinaryCLIService on port 10000 with 5...500 worker threads
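
When the server is started from a script, the port can be extracted from the last log line shown above. This is a minimal sketch that assumes the binary-mode log format of this example, which may differ between Spark versions or in HTTP mode. The function name thrift_port_from_log is hypothetical.

```shell
# Hypothetical helper: read startup log lines on stdin and print the Thrift port.
# Only the first matching line is used; nothing is printed when no line matches.
thrift_port_from_log() {
  sed -n 's/.*Starting ThriftBinaryCLIService on port \([0-9][0-9]*\).*/\1/p' | head -n 1
}

# Example with the log line from the startup output; prints 10000
echo '18/07/18 16:36:05 INFO ThriftCLIService: Starting ThriftBinaryCLIService on port 10000 with 5...500 worker threads' \
  | thrift_port_from_log
```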

5.2.2 - Driver UI

By default, the driver UI listens on port 4040, for example: http://172.23.0.1:4040/jobs/

On HDInsight, the driver UI is reached through the YARN UI.

5.2.3 - Headnode

The service for connecting to Spark SQL (Thrift/JDBC) is the Spark Thrift server running on the head nodes (example on Azure: port 10002, protocol Thrift).

5.2.4 - Azure HdInsight

The connection string is the same as for Hive, except that httpPath=/hive2 is replaced by httpPath=/sparkhive2:

  • Gateway: jdbc:hive2://clustername.azurehdinsight.net:443/;ssl=true;transportMode=http;httpPath=/sparkhive2
  • HeadNode: jdbc:hive2://headnodehost:10002/;transportMode=http

Example with beeline

beeline -u 'jdbc:hive2://headnodehost:10002/;transportMode=http'
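
The two URL forms can be wrapped in a small helper so that scripts do not repeat the connection options. This is a sketch built only from the two URLs listed in this section; the function name hdinsight_spark_jdbc_url and its arguments are hypothetical.

```shell
# Hypothetical helper: print the HDInsight JDBC URL for Spark SQL.
#   hdinsight_spark_jdbc_url gateway  <clustername>
#   hdinsight_spark_jdbc_url headnode [host]      (host defaults to headnodehost)
hdinsight_spark_jdbc_url() {
  case "$1" in
    gateway)
      [ -n "$2" ] || { echo "cluster name required" >&2; return 2; }
      echo "jdbc:hive2://${2}.azurehdinsight.net:443/;ssl=true;transportMode=http;httpPath=/sparkhive2"
      ;;
    headnode)
      echo "jdbc:hive2://${2:-headnodehost}:10002/;transportMode=http"
      ;;
    *)
      echo "usage: hdinsight_spark_jdbc_url gateway|headnode [name]" >&2
      return 2
      ;;
  esac
}
```

From a head node it could then be used as beeline -u "$(hdinsight_spark_jdbc_url headnode)".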

5.2.5 - Beeline

beeline 
!connect jdbc:hive2://localhost:10000 nico ""
SET;
SHOW TABLES;
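
The same session can be run non-interactively, which is convenient in scripts. This is a minimal sketch using beeline's -u (URL), -n (user name) and -e (statement) options, and assuming the local server and the user nico from the session above. The function name run_spark_sql and the variables BEELINE, THRIFT_URL and THRIFT_USER are hypothetical, not Spark or Hive settings.

```shell
# Hypothetical wrapper: run one SQL statement against the Thrift server and exit.
# BEELINE, THRIFT_URL and THRIFT_USER are illustrative overrides with defaults
# taken from the interactive example (localhost:10000, user nico).
run_spark_sql() {
  "${BEELINE:-beeline}" \
    -u "${THRIFT_URL:-jdbc:hive2://localhost:10000}" \
    -n "${THRIFT_USER:-nico}" \
    -e "$1"
}
```

For example, run_spark_sql "SHOW TABLES;" would issue the last statement of the interactive session.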

6 - Documentation / Reference