HDFS - File System Metadata


1 - About

This page describes how HDFS stores its file system metadata.

The NameNode is the repository of all HDFS metadata.


3 - Metadata

3.1 - NameNode

The NameNode stores the metadata in two files:

  • the FsImage file, which is the metadata store
  • the EditLog, a transaction log file that records every metadata transaction

The metadata files (FsImage and EditLog) are central data structures of HDFS. Corruption of these files can make the HDFS instance non-functional. See HDFS - High Availability
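The relationship between the two files can be pictured as "current state = FsImage plus a replay of the EditLog". The following is a minimal sketch of that idea, not the NameNode's actual implementation: the `Namespace` class, the operation names and the `checkpoint` helper are all hypothetical, and the metadata is reduced to a path-to-replication-factor mapping.

```python
from dataclasses import dataclass, field


@dataclass
class Namespace:
    """Toy stand-in for the FsImage content: path -> replication factor."""
    files: dict = field(default_factory=dict)


def apply_edit(ns, edit):
    """Apply one logged transaction (hypothetical record format) to the namespace."""
    op, path, *args = edit
    if op in ("create", "set_replication"):
        ns.files[path] = args[0]
    elif op == "delete":
        ns.files.pop(path, None)
    else:
        raise ValueError(f"unknown operation: {op}")


def checkpoint(fsimage, edit_log):
    """Replay the edit log on a copy of the image.

    Returns the merged image and an empty log, since every logged
    transaction is now reflected in the new image.
    """
    merged = Namespace(dict(fsimage.files))
    for edit in edit_log:
        apply_edit(merged, edit)
    return merged, []


if __name__ == "__main__":
    image = Namespace({"/a": 3})
    log = [("create", "/b", 3), ("set_replication", "/a", 2), ("delete", "/b")]
    new_image, new_log = checkpoint(image, log)
    print(new_image.files, new_log)
```

The sketch also shows why losing either file is serious: without the image the log has no starting point, and without the log the most recent transactions are missing.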

3.2 - DataNode

The DataNode stores HDFS data in files in its local file system. The DataNode has no knowledge about HDFS files. It stores each block of HDFS data in a separate file in its local file system.

The DataNode does not create all files in the same directory. Instead, it uses a heuristic to determine the optimal number of files per directory and creates subdirectories appropriately. It is not optimal to create all local files in the same directory because the local file system might not be able to efficiently support a huge number of files in a single directory.

When a DataNode starts up, it scans through its local file system, generates a list of all HDFS data blocks that correspond to each of these local files, and sends this report to the NameNode. The report is called the Blockreport.
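The storage layout and the startup scan can be sketched as follows. This is an illustration under stated assumptions, not the DataNode's real code: the `blk_<id>` file naming, the `subdir_for` placement rule and its `fanout` parameter, and the `(block id, length)` report entries are all assumptions made for the example.

```python
import os
import re
import tempfile

# Assumed naming convention for block files: blk_<numeric id>
BLOCK_FILE = re.compile(r"^blk_(\d+)$")


def subdir_for(block_id, fanout=32):
    """Hypothetical placement rule: spread blocks over fanout * fanout subdirectories."""
    level1 = (block_id >> 8) % fanout
    level2 = block_id % fanout
    return os.path.join(f"subdir{level1}", f"subdir{level2}")


def store_block(root, block_id, data):
    """Write one block into its own local file under the computed subdirectory."""
    directory = os.path.join(root, subdir_for(block_id))
    os.makedirs(directory, exist_ok=True)
    path = os.path.join(directory, f"blk_{block_id}")
    with open(path, "wb") as handle:
        handle.write(data)
    return path


def build_block_report(root):
    """Scan the local tree and list (block id, length in bytes) for every block file."""
    report = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            match = BLOCK_FILE.match(name)
            if match:
                size = os.path.getsize(os.path.join(dirpath, name))
                report.append((int(match.group(1)), size))
    return sorted(report)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        store_block(root, 1, b"abc")
        store_block(root, 300, b"hello")
        print(build_block_report(root))
```

Note that the report contains only block identifiers and sizes: the scan never needs to know which HDFS file a block belongs to, which matches the statement that the DataNode has no knowledge about HDFS files.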

4 - Extension

Extended attributes (abbreviated as xattrs) are a filesystem feature that allows user applications to associate additional metadata with a file or directory.
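As a rough illustration of the idea, the toy model below attaches named byte values to paths. It is a sketch only: the `XAttrStore` class and its methods are hypothetical and do not talk to HDFS. The one detail borrowed from HDFS is the assumption that an attribute name carries a namespace prefix (`user`, `trusted`, `system`, `security` or `raw`), as in `user.owner-team`.

```python
# Namespace prefixes assumed to be accepted for xattr names
VALID_NAMESPACES = {"user", "trusted", "system", "security", "raw"}


class XAttrStore:
    """Hypothetical in-memory model: path -> {attribute name: value bytes}."""

    def __init__(self):
        self._attrs = {}

    def set_xattr(self, path, name, value):
        """Attach or overwrite one attribute on a path after checking its prefix."""
        namespace, _, key = name.partition(".")
        if namespace not in VALID_NAMESPACES or not key:
            raise ValueError(f"invalid xattr name: {name!r}")
        self._attrs.setdefault(path, {})[name] = value

    def get_xattrs(self, path):
        """Return a copy of all attributes set on a path (empty if none)."""
        return dict(self._attrs.get(path, {}))


if __name__ == "__main__":
    store = XAttrStore()
    store.set_xattr("/data/report.csv", "user.owner-team", b"analytics")
    print(store.get_xattrs("/data/report.csv"))
```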