Azure - Cluster (HDInsight Cluster)


1 - About

A cluster of computers.

Note: this page duplicates Azure - HDInsight (Microsoft's Hadoop).


3 - Structure

3.1 - Templates

3.2 - Default Storage Account

Each cluster has a storage account that serves as the cluster's default file system. It is referred to as the default storage account.

An HDInsight cluster and its default storage account must be located in the same Azure region.
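The region rule can be checked before a cluster is provisioned. The sketch below is a minimal illustration, assuming that a region's display name ("West Europe") and its programmatic name ("westeurope") differ only by spaces and letter case; the function names are hypothetical.

```python
def _normalize_region(region: str) -> str:
    # Assumption: a display name ("West Europe") and its programmatic name
    # ("westeurope") differ only by spaces and letter case.
    return region.replace(" ", "").lower()


def is_colocated(cluster_region: str, storage_region: str) -> bool:
    """Return True if the cluster and its default storage account share a region."""
    return _normalize_region(cluster_region) == _normalize_region(storage_region)


if __name__ == "__main__":
    print(is_colocated("West Europe", "westeurope"))  # True
    print(is_colocated("westeurope", "eastus"))       # False
```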

3.3 - Version

3.4 - Sizing

3.5 - Directory

  • Log: /var/log

3.6 - Type

  • Interactive Query: A Hadoop cluster that provides Low Latency Analytical Processing (LLAP) functionality to improve response times for interactive queries.
  • Hadoop: A Hadoop cluster tuned for batch processing workloads. It uses HDFS, YARN resource management, and a simple MapReduce programming model to process and analyze batch data in parallel.
  • Spark: Apache Spark has built-in functionality for working with Hive.
  • HBase: HiveQL can be used to query data stored in HBase.
  • Microsoft R Server: also known as Microsoft Machine Learning Server.

Only the following cluster types support the Enterprise Security Package:

  • Hadoop (HDInsight 3.6 only)
  • Spark
  • Interactive Query
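The support list above can be encoded as a lookup table, for example to validate a cluster request before asking for the Enterprise Security Package. This is a sketch that mirrors only the list on this page; the normalized type keys and the function name are choices made for the illustration.

```python
from typing import Dict, Optional, Set

# ESP support as listed above. Keys are normalized type names chosen for this
# sketch; None means the list states no version restriction for that type.
ESP_SUPPORT: Dict[str, Optional[Set[str]]] = {
    "hadoop": {"3.6"},  # Hadoop: HDInsight 3.6 only
    "spark": None,
    "interactivequery": None,
}


def supports_esp(cluster_type: str, hdinsight_version: str) -> bool:
    """Return True if this cluster type/version may use the Enterprise Security Package."""
    key = cluster_type.replace(" ", "").lower()
    if key not in ESP_SUPPORT:
        return False  # e.g. HBase or Microsoft R Server
    versions = ESP_SUPPORT[key]
    return versions is None or hdinsight_version in versions


if __name__ == "__main__":
    print(supports_esp("Hadoop", "3.6"))             # True
    print(supports_esp("Hadoop", "4.0"))             # False
    print(supports_esp("Interactive Query", "4.0"))  # True
```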

4 - Application

By default, the cluster comes with:


4.1 - Port

5 - Management

5.1 - HostName

Azure HDInsight using an Azure Virtual Network

Azure provides name resolution for Azure services that are installed in a virtual network.

The cluster nodes can communicate directly with each other, and with other nodes in HDInsight, by using internal DNS names. Examples of internal DNS names assigned to HDInsight worker nodes:

  • wn0-hdinsi.0owcbllr5hze3hxdja3mqlrhhe.ex.internal.cloudapp.net
  • wn2-hdinsi.0owcbllr5hze3hxdja3mqlrhhe.ex.internal.cloudapp.net
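The two example names share one internal DNS suffix and differ only in the node prefix. The sketch below splits such a name into its parts. It assumes every internal name follows the <role><index>-<cluster prefix>.<suffix> layout seen in the two examples, and that "hdinsi" is the leading part of the cluster name; both are inferences from the examples, not a documented format.

```python
import re
from typing import NamedTuple


class NodeName(NamedTuple):
    role: str            # "wn" in the examples above (worker node)
    index: int           # node number within that role
    cluster_prefix: str  # assumed: leading characters of the cluster name
    dns_suffix: str      # internal DNS suffix shared by the cluster's nodes


# Assumed layout: <role><index>-<cluster prefix>.<internal DNS suffix>
_NODE_RE = re.compile(r"^(?P<role>[a-z]+)(?P<index>\d+)-(?P<prefix>[^.]+)\.(?P<suffix>.+)$")


def parse_node_name(fqdn: str) -> NodeName:
    """Split an internal node FQDN such as wn0-hdinsi.<suffix> into its parts."""
    match = _NODE_RE.match(fqdn.strip().lower())
    if match is None:
        raise ValueError(f"unexpected node name: {fqdn!r}")
    return NodeName(match["role"], int(match["index"]), match["prefix"], match["suffix"])


if __name__ == "__main__":
    print(parse_node_name("wn2-hdinsi.0owcbllr5hze3hxdja3mqlrhhe.ex.internal.cloudapp.net"))
```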

5.1.1 - Load balancer

  • clusterName.azurehdinsight.net translates to clusterName.Region.cloudapp.azure.com
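The name mapping above can be written as two small helpers, plus an optional lookup that asks the local resolver which names it reports for the public endpoint. This is a sketch: it assumes "Region" is the programmatic region name (for example "westeurope"), and the names returned by socket.gethostbyname_ex depend on the resolver in use.

```python
import socket
from typing import List


def public_name(cluster_name: str) -> str:
    """Public endpoint of the cluster: clusterName.azurehdinsight.net."""
    return f"{cluster_name}.azurehdinsight.net".lower()


def regional_name(cluster_name: str, region: str) -> str:
    """Name the public endpoint translates to: clusterName.Region.cloudapp.azure.com."""
    # Assumption: region is the programmatic region name, e.g. "westeurope".
    return f"{cluster_name}.{region}.cloudapp.azure.com".lower()


def resolver_names(host: str) -> List[str]:
    """Canonical name and aliases reported by the local resolver (needs DNS access)."""
    canonical, aliases, _addresses = socket.gethostbyname_ex(host)
    return [canonical, *aliases]
```

On a machine with DNS access, comparing `resolver_names(public_name("mycluster"))` with `regional_name("mycluster", "westeurope")` (for a hypothetical cluster "mycluster" in West Europe) is one way to observe the translation.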

5.2 - Create (Provision)

Using Azure Data Factory, you can create HDInsight clusters on demand, and configure a TimeToLive setting to delete the clusters automatically.
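In Data Factory, the on-demand cluster is described by a linked service definition. The sketch below builds a minimal definition as a Python dictionary that can be serialized to JSON. The property names (type HDInsightOnDemand, clusterSize, timeToLive, version, linkedServiceName) follow the Data Factory v2 JSON format as best understood here and should be checked against the current documentation; the linked service names are hypothetical, and the subscription, service principal, tenant and resource group properties that a real definition needs are deliberately left out.

```python
import json
from datetime import timedelta
from typing import Any, Dict


def format_time_to_live(ttl: timedelta) -> str:
    """Format a duration below one day as hh:mm:ss."""
    total = int(ttl.total_seconds())
    if not 0 < total < 24 * 3600:
        raise ValueError("this sketch only handles durations between 0 and 24 hours")
    hours, rest = divmod(total, 3600)
    minutes, seconds = divmod(rest, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


def on_demand_linked_service(name: str, storage_linked_service: str,
                             cluster_size: int, ttl: timedelta) -> Dict[str, Any]:
    """Minimal HDInsight on-demand linked service definition (Data Factory JSON)."""
    return {
        "name": name,
        "properties": {
            "type": "HDInsightOnDemand",
            "typeProperties": {
                "clusterType": "hadoop",
                "clusterSize": cluster_size,
                # How long the cluster stays alive after an activity run
                # completes when no other jobs are active on it
                "timeToLive": format_time_to_live(ttl),
                "version": "3.6",
                "linkedServiceName": {  # storage linked service for the cluster
                    "referenceName": storage_linked_service,
                    "type": "LinkedServiceReference",
                },
                # Subscription, service principal, tenant and resource group
                # properties are also needed in practice; omitted in this sketch.
            },
        },
    }


if __name__ == "__main__":
    print(json.dumps(on_demand_linked_service(
        "HDInsightOnDemandLinkedService", "AzureStorageLinkedService", 4,
        timedelta(minutes=15)), indent=2))
```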

Note: Cluster creation (Provisioning)

5.2.1 - Metastore

  • Verify that the firewall of the SQL server allows access from Azure services.
  • Verify that the database allows access from the virtual network (VNet) in which the cluster is created.

When you create a metastore for Hive or Oozie, don't use dashes, hyphens, or spaces in the database name. This can cause the cluster creation process to fail.
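The naming rule can be enforced before the metastore database is created. A minimal sketch: the function name is hypothetical, and it rejects the ASCII hyphen-minus, the Unicode hyphen and dash characters (U+2010 to U+2015) and any whitespace.

```python
import re

# Hyphen-minus, Unicode hyphens/dashes (U+2010 to U+2015) and whitespace,
# which the note above says to avoid in Hive/Oozie metastore database names.
_FORBIDDEN = re.compile(r"[\s\-\u2010-\u2015]")


def is_valid_metastore_db_name(name: str) -> bool:
    """Return True if the name is non-empty and has no dashes, hyphens or spaces."""
    return bool(name) and _FORBIDDEN.search(name) is None


if __name__ == "__main__":
    for candidate in ("hivemetastore", "hive_metastore", "hive-metastore", "hive metastore"):
        print(candidate, is_valid_metastore_db_name(candidate))
```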

User creation example:

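-- Contained database user that Hive uses to connect (replace 'the pwd' with a strong password)
CREATE USER hi_hive WITH PASSWORD = 'the pwd';
GO
-- CREATE SCHEMA must run in its own batch; GO is the batch separator of sqlcmd and SSMS
CREATE SCHEMA hi_hive AUTHORIZATION hi_hive;
GO
-- Let the user connect and create the metastore tables and views in its own schema
GRANT CONNECT TO hi_hive;
GRANT CREATE TABLE TO hi_hive;
GRANT CREATE VIEW TO hi_hive;
ALTER USER hi_hive WITH DEFAULT_SCHEMA = hi_hive;

-- https://social.technet.microsoft.com/wiki/contents/articles/7662.use-sql-azure-database-as-a-hive-metastore.aspx
-- Database roles for schema changes (DDL) and read/write access to the metastore tables
EXEC sp_addrolemember 'db_ddladmin', 'hi_hive';
EXEC sp_addrolemember 'db_datawriter', 'hi_hive';
EXEC sp_addrolemember 'db_datareader', 'hi_hive';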

5.3 - Delete

The data is stored in Azure Storage, not on the cluster, so a cluster can be safely deleted without losing data.

Because cluster charges are many times higher than storage charges, it makes economic sense to delete clusters when they are not in use.
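A small calculation makes the trade-off concrete: storage is paid for the whole month either way, while compute is paid only for the hours the cluster exists. The rates below are hypothetical placeholders, not actual Azure prices.

```python
def monthly_cost(cluster_rate_per_hour: float, storage_per_month: float,
                 hours_running: float) -> float:
    """Cluster compute for the hours it exists, plus storage for the whole month."""
    # Storage is billed whether or not the cluster exists; compute only while it exists.
    return cluster_rate_per_hour * hours_running + storage_per_month


if __name__ == "__main__":
    # Hypothetical rates for illustration only, not actual Azure prices.
    always_on = monthly_cost(5.0, 20.0, hours_running=730)  # cluster kept all month
    on_demand = monthly_cost(5.0, 20.0, hours_running=60)   # created only for 60 hours of jobs
    print(always_on, on_demand)  # 3670.0 320.0
```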

With the classic Azure CLI 1.0 (from the documentation):

azure hdinsight cluster delete clusterName
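The classic `azure` CLI has been superseded by Azure CLI 2.x (`az`). The sketch below builds and runs the corresponding `az hdinsight delete` command from Python; it assumes `az` is installed and already logged in, and the cluster and resource group names are placeholders. The CLI may ask for confirmation before deleting.

```python
import shutil
import subprocess
from typing import List


def az_delete_command(cluster_name: str, resource_group: str) -> List[str]:
    """Azure CLI 2.x command that deletes an HDInsight cluster."""
    return ["az", "hdinsight", "delete",
            "--name", cluster_name,
            "--resource-group", resource_group]


def delete_cluster(cluster_name: str, resource_group: str) -> None:
    """Run the delete command; raises CalledProcessError if the CLI reports a failure."""
    if shutil.which("az") is None:
        raise RuntimeError("Azure CLI ('az') not found on PATH")
    subprocess.run(az_delete_command(cluster_name, resource_group), check=True)
```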

5.4 - Template

5.5 - Cost


5.6 - Version

6 - Support

6.1 - Failed to start Hive Metastore due to metastore schema initialization error

Deployment failed. Correlation ID: 6d6465b6-8727-409a-aaa5-4f754112ee1c. {
  "status": "Failed",
  "error": {
    "code": "ResourceDeploymentFailure",
    "message": "The resource operation completed with terminal provisioning state 'Failed'.",
    "details": [
      {
        "code": "HiveMetastoreSchemaInitializationFailedErrorCode",
        "message": "Failed to start Hive Metastore due to metastore schema initialization error. If you are using a custom Hive metastore, please run 'Hive Schema Tool' against your metastore to check for possible issues with metastore configuration."
      }
    ]
  }
}
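The useful information is in the nested `error.details` entries. The sketch below extracts their codes and messages from a message shaped like the one above, assuming the JSON payload starts at the first "{" after the "Deployment failed. Correlation ID: ..." prefix.

```python
import json
from typing import List, Tuple


def inner_errors(deployment_message: str) -> List[Tuple[str, str]]:
    """Return (code, message) pairs from error.details of a deployment failure message."""
    # The JSON payload follows the "Deployment failed. Correlation ID: ..." prefix.
    payload = json.loads(deployment_message[deployment_message.index("{"):])
    details = payload.get("error", {}).get("details", [])
    return [(d.get("code", ""), d.get("message", "")) for d in details]
```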

Verify that your Azure SQL server firewall allows inbound connections from the subnet of your cluster.
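A first check, run from a node in the cluster's subnet, is whether a TCP connection to the SQL server on port 1433 succeeds. This sketch only tests network-level reachability (routing, network security groups); Azure SQL evaluates its server firewall rules when a login is attempted, so a successful TCP connection does not prove the firewall admits the cluster, and a full check needs a SQL client login. The server name in the usage line is hypothetical.

```python
import socket


def can_reach(host: str, port: int = 1433, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, DNS failure or timeout
        return False


if __name__ == "__main__":
    print(can_reach("myserver.database.windows.net"))
```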

azure/cluster.txt · Last modified: 2019/05/23 13:31 by gerardnico