HDFS - Checkpoint

1 - About

During a checkpoint, the changes recorded in the transaction log (EditLog) are applied to the metadata store (FsImage). Changes are batched this way because it is not efficient to write each individual change directly to the FsImage.

3 - Checkpoint process

When the NameNode starts up, or when a checkpoint is triggered by a configurable threshold:

  • it reads the FsImage and EditLog from disk,
  • it applies all the transactions from the EditLog to the in-memory representation of the FsImage,
  • it flushes out this new version into a new FsImage on disk,
  • it truncates the old EditLog because its transactions have been applied to the persistent FsImage.
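
The four steps above can be sketched as a small, self-contained Python model. This is only an illustration of the merge logic, not how HDFS is implemented: the JSON file formats, the operation names (create, delete, rename) and the function names apply_edit and checkpoint are all assumptions made for the example.

```python
import json
import os


def apply_edit(namespace, edit):
    """Apply one logged transaction to the in-memory namespace (a dict)."""
    op, path = edit["op"], edit["path"]
    if op == "create":
        namespace[path] = {"replication": edit.get("replication", 3)}
    elif op == "delete":
        namespace.pop(path, None)
    elif op == "rename":
        namespace[edit["dst"]] = namespace.pop(path)
    else:
        raise ValueError(f"unknown op: {op}")


def checkpoint(fsimage_path, editlog_path):
    """Merge the edit log into the image and persist the result."""
    # 1. Read the image and the edit log from disk (hypothetical JSON formats).
    with open(fsimage_path) as f:
        namespace = json.load(f)
    with open(editlog_path) as f:
        edits = [json.loads(line) for line in f if line.strip()]

    # 2. Replay every transaction against the in-memory image.
    for edit in edits:
        apply_edit(namespace, edit)

    # 3. Flush the new image: write a temporary file, then swap it in.
    tmp_path = fsimage_path + ".ckpt"
    with open(tmp_path, "w") as f:
        json.dump(namespace, f, sort_keys=True)
    os.replace(tmp_path, fsimage_path)

    # 4. Truncate the edit log: its transactions are now in the image.
    open(editlog_path, "w").close()
    return namespace
```

Writing the new image to a temporary file before swapping it in means a crash during step 3 leaves the previous image intact; the edit log is emptied only once the new image is in place.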

4 - Management

4.1 - Trigger / Run

A checkpoint can be triggered:

  • at a given time interval (dfs.namenode.checkpoint.period) expressed in seconds,
  • or after a given number of filesystem transactions have accumulated (dfs.namenode.checkpoint.txns).

If both of these properties are set, the first threshold to be reached triggers a checkpoint.
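
The "first threshold reached" rule amounts to a logical OR of the two conditions. A minimal sketch, assuming a hypothetical function should_checkpoint whose default values are simply the ones used in the sample configuration on this page:

```python
def should_checkpoint(seconds_since_last, uncheckpointed_txns,
                      period_seconds=21600, txns_threshold=1000000):
    """Return True as soon as either threshold has been reached.

    period_seconds mirrors dfs.namenode.checkpoint.period and
    txns_threshold mirrors dfs.namenode.checkpoint.txns; the function
    itself is illustrative and not part of any Hadoop API.
    """
    return (seconds_since_last >= period_seconds
            or uncheckpointed_txns >= txns_threshold)


if __name__ == "__main__":
    # Only 10 minutes have elapsed, but the transaction threshold is reached.
    print(should_checkpoint(600, 1000000))
    # Neither threshold is reached yet.
    print(should_checkpoint(600, 5000))
```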

From the config file:

hdfs-site.xml
<property>
  <name>dfs.namenode.checkpoint.period</name>
  <value>21600</value>
</property>
 
<property>
  <name>dfs.namenode.checkpoint.txns</name>
  <value>1000000</value>
</property>

or command line:

hdfs getconf -confKey dfs.namenode.checkpoint.period

4.2 - Location

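The directory where checkpoint images are stored (dfs.namenode.checkpoint.dir) and the directory where checkpoint edits are stored (dfs.namenode.checkpoint.edits.dir) are set in the config file: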

hdfs-site.xml
<property>
  <name>dfs.namenode.checkpoint.dir</name>
  <value>/hadoop/hdfs/namesecondary</value>
</property>
 
<property>
  <name>dfs.namenode.checkpoint.edits.dir</name>
  <value>${dfs.namenode.checkpoint.dir}</value>
</property>

or command line:

hdfs getconf -confKey dfs.namenode.checkpoint.dir
/hadoop/hdfs/namesecondary
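
In the configuration above, dfs.namenode.checkpoint.edits.dir refers to the other property through the ${...} syntax. The following sketch shows how such a reference can be expanded when reading the file by hand. It is an illustration only: load_properties and resolve are hypothetical helpers, not Hadoop functions, the depth limit is an arbitrary safeguard, and the sample assumes the properties are wrapped in the usual configuration root element.

```python
import re
import xml.etree.ElementTree as ET

SAMPLE = """<configuration>
  <property>
    <name>dfs.namenode.checkpoint.dir</name>
    <value>/hadoop/hdfs/namesecondary</value>
  </property>
  <property>
    <name>dfs.namenode.checkpoint.edits.dir</name>
    <value>${dfs.namenode.checkpoint.dir}</value>
  </property>
</configuration>"""

_VAR = re.compile(r"\$\{([^}]+)\}")


def load_properties(xml_text):
    """Collect name/value pairs from every property element."""
    props = {}
    for prop in ET.fromstring(xml_text).iter("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None and value is not None:
            props[name.strip()] = value.strip()
    return props


def resolve(props, name, max_depth=20):
    """Expand ${other.property} references; unknown names are left as-is."""
    value = props[name]
    for _ in range(max_depth):
        expanded = _VAR.sub(
            lambda m: props.get(m.group(1), m.group(0)), value)
        if expanded == value:
            break
        value = expanded
    return value


if __name__ == "__main__":
    props = load_properties(SAMPLE)
    print(resolve(props, "dfs.namenode.checkpoint.edits.dir"))
```

On a running cluster, hdfs getconf remains the simpler way to read the configured value; the sketch is only meant to make the substitution explicit.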