Spark SQL - Server (Thrift) (STS)


About

The Spark SQL server is the HiveServer2 of Hive 1.2.1. It's a Thrift JDBC/ODBC server.

Version

  • beeline from Spark or Hive 1.2.1
  • Hive 1.2.1

Configuration

High availability

There is no service discovery yet (SPARK-19541).

Therefore, a load balancer must be put in front of two Thrift servers, as sketched below.
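
A minimal sketch of such a setup, assuming HAProxy in TCP mode balancing two Thrift servers on their default port (the sts1/sts2 hostnames are placeholders):

# Hypothetical HAProxy snippet: round-robin over two Spark Thrift servers
frontend spark_thrift_front
    bind *:10000
    mode tcp
    default_backend spark_thrift_back

backend spark_thrift_back
    mode tcp
    balance roundrobin
    server sts1 sts1.example.com:10000 check
    server sts2 sts2.example.com:10000 check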

Management

Start

Linux

To start the JDBC/ODBC server, run the following in the Spark directory:

./sbin/start-thriftserver.sh

# From Hortonworks
./sbin/start-thriftserver.sh --master yarn-client --executor-memory 512m --hiveconf hive.server2.thrift.port=10015
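
The script accepts all spark-submit command-line options, plus the --hiveconf option to set Hive properties. To stop a running server:

./sbin/stop-thriftserver.sh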

See running-the-thrift-jdbcodbc-server in the Spark SQL documentation.

Windows

cd %SPARK_HOME%\bin
spark-class2 org.apache.spark.deploy.SparkSubmit --class org.apache.spark.sql.hive.thriftserver.HiveThriftServer2 spark-internal

Connection

Port

The port can be configured with the following conf parameter: --hiveconf hive.server2.thrift.port=10001
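
For example, to start the server on port 10001:

./sbin/start-thriftserver.sh --hiveconf hive.server2.thrift.port=10001

The HIVE_SERVER2_THRIFT_PORT environment variable can be used instead of the --hiveconf option.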

The start output also gives you the port (default: 10000):

18/07/18 16:36:05 INFO MetaStoreDirectSql: Using direct SQL, underlying DB is DERBY
18/07/18 16:36:05 INFO ObjectStore: Initialized ObjectStore
18/07/18 16:36:05 INFO HiveMetaStore: 0: get_databases: default
18/07/18 16:36:05 INFO audit: ugi=gerard        ip=unknown-ip-addr      cmd=get_databases: default
18/07/18 16:36:05 INFO HiveMetaStore: 0: Shutting down the object store...
18/07/18 16:36:05 INFO audit: ugi=gerard        ip=unknown-ip-addr      cmd=Shutting down the object store...
18/07/18 16:36:05 INFO HiveMetaStore: 0: Metastore shutdown complete.
18/07/18 16:36:05 INFO audit: ugi=gerard        ip=unknown-ip-addr      cmd=Metastore shutdown complete.
18/07/18 16:36:05 INFO AbstractService: Service:ThriftBinaryCLIService is started.
18/07/18 16:36:05 INFO AbstractService: Service:HiveServer2 is started.
18/07/18 16:36:05 INFO HiveThriftServer2: HiveThriftServer2 started
18/07/18 16:36:05 INFO ThriftCLIService: Starting ThriftBinaryCLIService on port 10000 with 5...500 worker threads
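
The 5...500 range in the last line is the HiveServer2 worker thread pool, which can be tuned with the same --hiveconf mechanism; a sketch, assuming the standard HiveServer2 properties:

./sbin/start-thriftserver.sh \
  --hiveconf hive.server2.thrift.min.worker.threads=5 \
  --hiveconf hive.server2.thrift.max.worker.threads=500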

Driver UI

http://172.23.0.1:4040/jobs/ (default)

On HDInsight, you need to go to the YARN UI to get the driver UI.

Headnode

The service for connecting to Spark SQL (Thrift/JDBC) is a Spark Thrift server on the head nodes (example on Azure: port 10002, protocol Thrift).

Azure HdInsight

It's the same as for Hive, but instead of containing httpPath=/hive2, it contains httpPath=/sparkhive2:

  • Gateway: jdbc:hive2://clustername.azurehdinsight.net:443/;ssl=true;transportMode=http;httpPath=/sparkhive2
  • HeadNode: jdbc:hive2://headnodehost:10002/;transportMode=http

Example with beeline

beeline -u 'jdbc:hive2://headnodehost:10002/;transportMode=http'
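
Through the public gateway, the same connection goes over HTTPS with the cluster login (clustername and admin are placeholders):

beeline -u 'jdbc:hive2://clustername.azurehdinsight.net:443/;ssl=true;transportMode=http;httpPath=/sparkhive2' -n admin -p 'password'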


Beeline

beeline 
!connect jdbc:hive2://localhost:10000 nico ""
SET;
SHOW TABLES;
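
The same session can be run non-interactively with beeline's -n (user), -p (password) and -e (query) options:

beeline -u 'jdbc:hive2://localhost:10000' -n nico -p '' -e 'SHOW TABLES;'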
