Azure - HDInsight (Microsoft's Hadoop)


1 - About

Azure HDInsight is Microsoft's managed cluster distribution of the Hadoop components from the Hortonworks Data Platform (HDP).

Note: this page duplicates Azure - Cluster (HdInsight Cluster).

It bundles open-source frameworks:

  • Hadoop,
  • Spark,
  • Hive,
  • LLAP,
  • Kafka,
  • Storm,
  • R,
  • and more.

4 - Template

5 - Post-install script

Install additional components at the end of cluster creation with a script action.

Example: install Hue.

Add an edge node. See Edge Node
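
A script action is a shell script that runs on the cluster nodes, so it often needs to decide what to do on each node. The sketch below is a hypothetical skeleton, not an official template: the function names `node_role` and `install_plan` are invented for illustration, and it assumes that head nodes are named hn* and worker nodes wn*, as in the hostnames hn0-ha and wn0-myhdi used on this page. The install step itself is left as a placeholder message.

```shell
#!/usr/bin/env bash
# Hypothetical script-action skeleton: pick an action from the node's hostname.
# Assumption: head nodes are named hn*, worker nodes wn* (e.g. hn0-ha, wn0-myhdi).

node_role() {
  # Print the role inferred from the hostname prefix given as $1.
  case "$1" in
    hn*) echo "head" ;;
    wn*) echo "worker" ;;
    *)   echo "other" ;;
  esac
}

install_plan() {
  # Print the placeholder action this sketch would take on host $1.
  # Here the (hypothetical) component is installed on head nodes only.
  role="$(node_role "$1")"
  if [ "$role" = "head" ]; then
    echo "install on $1"
  else
    echo "skip $1 ($role node)"
  fi
}

# On a real node the argument would be "$(hostname)"; fixed names keep the demo self-contained.
install_plan "hn0-ha"
install_plan "wn0-myhdi"
```

In a real script action, the `echo "install on ..."` line would be replaced by the actual installation commands for the component.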

6 - Admin

  • On a headnode (mycluster is the name of the cluster)
# List the root
hdfs dfs -D "fs.default.name=hdfs://mycluster/" -ls /
# Report
hdfs dfsadmin -D "fs.default.name=hdfs://mycluster/" -report
  • Check the integrity of HDFS on the HDInsight cluster with the following command (sample output included):
hdfs fsck -D "fs.default.name=hdfs://mycluster/" /
Connecting to namenode via http://hn0-ha.ax.internal.cloudapp.net:30070/fsck?ugi=hdsshadm&path=%2F
FSCK started by hdsshadm (auth:SIMPLE) from /10.40.35.148 for path / at Thu Jan 17 16:04:30 UTC 2019
................................Status: HEALTHY
 Total size:    1200 B
 Total dirs:    139
 Total files:   32
 Total symlinks:                0 (Files currently being written: 15)
 Total blocks (validated):      15 (avg. block size 80 B)
 Minimally replicated blocks:   15 (100.0 %)
 Over-replicated blocks:        0 (0.0 %)
 Under-replicated blocks:       0 (0.0 %)
 Mis-replicated blocks:         0 (0.0 %)
 Default replication factor:    3
 Average block replication:     3.0
 Corrupt blocks:                0
 Missing replicas:              0 (0.0 %)
 Number of data-nodes:          5
 Number of racks:               1
FSCK ended at Thu Jan 17 16:04:30 UTC 2019 in 12 milliseconds


The filesystem under path '/' is HEALTHY
  • Force HDFS to leave safe mode:
hdfs dfsadmin -D "fs.default.name=hdfs://mycluster/" -safemode leave
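
The fsck report can also be checked from a script instead of being read by eye. The following is a minimal sketch, assuming the report ends with the same "The filesystem under path ... is HEALTHY" summary line as in the sample output on this page. The function name `fsck_is_healthy` is hypothetical, and the demo runs on a shortened copy of the report so that it works without a cluster.

```shell
# Sketch: turn an fsck report (read from stdin) into a pass/fail exit status.
fsck_is_healthy() {
  # Succeeds only if the report contains the final "is HEALTHY" summary line.
  grep -q "^The filesystem under path .* is HEALTHY$"
}

# Shortened copy of the sample report, used only for this self-contained demo.
sample_report="Status: HEALTHY
 Corrupt blocks:                0
The filesystem under path '/' is HEALTHY"

if printf '%s\n' "$sample_report" | fsck_is_healthy; then
  echo "HDFS is healthy"
else
  echo "HDFS needs attention"
fi

# On a head node, the live report would be piped in instead (not run here):
# hdfs fsck -D "fs.default.name=hdfs://mycluster/" / | fsck_is_healthy
```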

7 - SSH

From SSH Connection

  • To a head node
ssh -i ~/.ssh/myPrivatekey -p 22 sshuser@clustername-ssh.azurehdinsight.net # Primary HeadNode
ssh -i ~/.ssh/myPrivatekey -p 23 sshuser@clustername-ssh.azurehdinsight.net # Secondary HeadNode
  • To an edge node
ssh sshuser@edgenodename.clustername-ssh.azurehdinsight.net
  • To a worker node (from a head or edge node). If the SSH account is secured with SSH keys, make sure that SSH agent forwarding is enabled on the client.
ssh sshuser@wn0-myhdi
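
The head-node commands differ only in the port, so they can be generated by a small helper. This is a sketch under stated assumptions: the function name `hdi_head_ssh_cmd` is hypothetical, the user is `sshuser` and the ports are 22 and 23 as listed above, and `-A` is the standard ssh option that enables agent forwarding for the later hop to a worker node. The helper only prints the command; it does not connect.

```shell
# Sketch: print the SSH command for a head node of a given cluster.
# Assumptions: user "sshuser"; port 22 = primary, port 23 = secondary head node.
hdi_head_ssh_cmd() {
  cluster="$1"
  node="${2:-primary}"
  case "$node" in
    primary)   port=22 ;;
    secondary) port=23 ;;
    *) echo "unknown head node: $node" >&2; return 1 ;;
  esac
  # -A enables SSH agent forwarding, needed to hop on to a worker node with keys.
  echo "ssh -A -p $port sshuser@${cluster}-ssh.azurehdinsight.net"
}

hdi_head_ssh_cmd clustername primary
hdi_head_ssh_cmd clustername secondary
```

Agent forwarding should only be enabled toward hosts you trust, since the remote host can use the forwarded agent for the duration of the session.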
azure/hdinsight.txt · Last modified: 2019/05/21 16:32 by gerardnico