Hadoop - Configuration

1 - About

Hadoop configuration has two entry points:

3 - Configuration

3.1 - HADOOP_CONF_DIR

HADOOP_CONF_DIR is the environment variable that sets the location of the configuration directory.

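The default depends on the Hadoop version and distribution:

# Hadoop 1.x layout
${HADOOP_HOME}/conf

# Hadoop 2.x and later
${HADOOP_HOME}/etc/hadoop

# packaged installations
/etc/hadoop/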

Example:

C:\hadoop\hadoop-2.7.5\etc\hadoop

From the command line:

hdfs envvars | grep -i HADOOP_CONF_DIR
HADOOP_CONF_DIR='/usr/hdp/2.6.2.25-1/hadoop/conf'
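
A script that needs to find the configuration files can apply a similar lookup. The following is a minimal Python sketch, not Hadoop's own resolution logic: `resolve_conf_dir` is a hypothetical helper, and the fallback order after HADOOP_CONF_DIR is an assumption made for illustration.

```python
import os
from pathlib import Path


def resolve_conf_dir(env=None):
    """Return the first usable configuration directory, or None.

    Hypothetical helper: the candidate order below is an assumption.
    """
    env = os.environ if env is None else env

    explicit = env.get("HADOOP_CONF_DIR")
    if explicit:
        return Path(explicit)  # an explicit setting always wins

    candidates = []
    home = env.get("HADOOP_HOME")
    if home:
        candidates.append(Path(home) / "etc" / "hadoop")
        candidates.append(Path(home) / "conf")
    candidates.append(Path("/etc/hadoop"))

    # Fall back to the first candidate that exists on disk.
    for candidate in candidates:
        if candidate.is_dir():
            return candidate
    return None


if __name__ == "__main__":
    print(resolve_conf_dir())
```

Passing a dictionary instead of reading `os.environ` directly keeps the helper easy to exercise without changing the real environment.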

3.2 - File

Hadoop configuration is driven by two types of configuration files:

  • default files, which hold the read-only default configuration. Example: core-default.xml
  • site files, which hold the site-specific configuration that overrides the values of the default files. Example: core-site.xml overrides values in core-default.xml.

They are loaded from the classpath in order: the default file first, then the site file.

Examples of configuration files:

Site file          Default file
core-site.xml      core-default.xml
hdfs-site.xml      hdfs-default.xml
yarn-site.xml      yarn-default.xml
mapred-site.xml    mapred-default.xml
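
The effect of the load order (default file first, then site file) can be sketched in a few lines of Python. This is an illustration of the override rule only, not Hadoop's loader: `parse_properties` and `load_in_order` are hypothetical helpers, the two XML strings are shortened stand-ins for real files, and the host name is made up.

```python
import xml.etree.ElementTree as ET

# Shortened stand-in for core-default.xml (illustrative values).
CORE_DEFAULT = """<configuration>
  <property><name>fs.defaultFS</name><value>file:///</value></property>
  <property><name>io.file.buffer.size</name><value>4096</value></property>
</configuration>"""

# Shortened stand-in for core-site.xml (hypothetical host name).
CORE_SITE = """<configuration>
  <property><name>fs.defaultFS</name><value>hdfs://namenode.example:8020</value></property>
</configuration>"""


def parse_properties(xml_text):
    """Return the name -> value pairs of one configuration resource."""
    root = ET.fromstring(xml_text)
    return {
        prop.findtext("name"): prop.findtext("value")
        for prop in root.iter("property")
    }


def load_in_order(*resources):
    """Merge resources in load order; later resources override earlier ones."""
    merged = {}
    for xml_text in resources:
        merged.update(parse_properties(xml_text))
    return merged


conf = load_in_order(CORE_DEFAULT, CORE_SITE)
print(conf["fs.defaultFS"])         # value taken from the site file
print(conf["io.file.buffer.size"])  # default left untouched
```

Only the property redefined in the site file changes; every other property keeps the value from the default file.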

3.3 - Environment variable

3.4 - Class

4 - Management

4.1 - final

Configuration parameters may be declared final. Once a resource declares a value final, no subsequently-loaded resource can alter that value. For example, one might define a final parameter with:

<property>
	<name>dfs.hosts.include</name>
	<value>/etc/hadoop/conf/hosts.include</value>
	<final>true</final>
</property>

Administrators typically define parameters as final in core-site.xml for values that user applications may not alter.
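
The rule that a final value cannot be altered by a later resource can be sketched as follows. This is a simplified Python model of the behaviour described above, not Hadoop's implementation: `load_with_final` is a hypothetical helper and the second resource is an invented job-level file.

```python
import xml.etree.ElementTree as ET


def load_with_final(*resources):
    """Merge resources in load order, honouring <final>true</final>."""
    values, final_names = {}, set()
    for xml_text in resources:
        for prop in ET.fromstring(xml_text).iter("property"):
            name = prop.findtext("name")
            if name in final_names:
                continue  # an earlier resource locked this value
            values[name] = prop.findtext("value")
            if (prop.findtext("final") or "").strip().lower() == "true":
                final_names.add(name)
    return values


SITE = """<configuration>
  <property>
    <name>dfs.hosts.include</name>
    <value>/etc/hadoop/conf/hosts.include</value>
    <final>true</final>
  </property>
</configuration>"""

# Hypothetical resource loaded later that tries to override the final value.
JOB = """<configuration>
  <property>
    <name>dfs.hosts.include</name>
    <value>/tmp/other.include</value>
  </property>
</configuration>"""

conf = load_with_final(SITE, JOB)
print(conf["dfs.hosts.include"])  # /etc/hadoop/conf/hosts.include
```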

4.2 - Variable Expansion

Value strings are first processed for variable expansion. The available properties are:

  • other properties defined in this Configuration;
  • if a name is undefined here, properties in System.getProperties().

For example, if a configuration resource contains the following property definitions:

<property>
	<name>basedir</name>
	<value>/user/${user.name}</value>
</property>

<property>
	<name>tempdir</name>
	<value>${basedir}/tmp</value>
</property>

When conf.get("tempdir") is called, ${basedir} is resolved to another property in this Configuration, while ${user.name} is ordinarily resolved to the value of the System property with that name.

By default, a warning is logged for each deprecated configuration parameter. These warnings can be suppressed by configuring log4j.logger.org.apache.hadoop.conf.Configuration.deprecation in the log4j.properties file.
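
The variable expansion described in this section can be sketched in Python. This is a simplified model under stated assumptions, not Hadoop's code: `expand` is a hypothetical helper, the `system_props` dictionary stands in for System.getProperties(), the user name is invented, and the depth limit is a guard chosen for this sketch.

```python
import re

VAR_PATTERN = re.compile(r"\$\{([^}$\s]+)\}")
MAX_DEPTH = 20  # guard against circular references; limit chosen for this sketch


def expand(value, conf, system_props):
    """Expand ${name} using conf first, then system_props."""
    for _ in range(MAX_DEPTH):
        match = VAR_PATTERN.search(value)
        if match is None:
            return value  # nothing left to expand
        name = match.group(1)
        if name in conf:
            replacement = conf[name]
        elif name in system_props:
            replacement = system_props[name]
        else:
            return value  # unknown variable: leave the text as it is
        value = value[:match.start()] + replacement + value[match.end():]
    raise ValueError("too many substitutions, possible circular reference")


conf = {"basedir": "/user/${user.name}", "tempdir": "${basedir}/tmp"}
system_props = {"user.name": "alice"}  # hypothetical stand-in for System.getProperties()

print(expand(conf["tempdir"], conf, system_props))  # /user/alice/tmp
```

The expansion is repeated until no `${...}` reference remains, which is why `${basedir}` and then `${user.name}` are both resolved in a single call.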

5 - Documentation / Reference

db/hadoop/conf.txt · Last modified: 2018/11/16 13:27 by gerardnico