HDFS - Data Integrity Implementation

1 - About

Data Modeling - Data Integrity in HDFS

The HDFS client software implements checksum checking on the contents of HDFS files. When a client creates an HDFS file, it computes a checksum of each block of the file and stores these checksums in a separate hidden file in the same HDFS namespace. When a client retrieves file contents, it verifies that the data received from each DataNode matches the checksum stored in the associated checksum file. If the checksums do not match, the client can opt to retrieve that block from another DataNode that holds a replica of that block.
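
The write-then-verify flow described above can be illustrated with a small, self-contained sketch. It does not use the Hadoop API: the function names (`split_blocks`, `compute_checksums`, `read_block`, `read_file`), the tiny block size, the in-memory lists standing in for DataNode replicas, and the choice of CRC32 are all assumptions made for illustration.

```python
import zlib

BLOCK_SIZE = 4  # bytes; deliberately tiny, an assumption for this sketch


def split_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> list:
    """Cut the file contents into fixed-size blocks (the last may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def compute_checksums(blocks: list) -> list:
    """On write: one CRC32 per block, kept separately from the data."""
    return [zlib.crc32(block) for block in blocks]


def read_block(index: int, replicas: list, checksums: list) -> bytes:
    """On read: return the first replica copy whose checksum matches.

    `replicas` is a list of block lists, each standing in for one DataNode.
    """
    for replica in replicas:
        block = replica[index]
        if zlib.crc32(block) == checksums[index]:
            return block
        # Mismatch: fall through and try the next replica.
    raise IOError(f"all replicas of block {index} failed checksum verification")


def read_file(replicas: list, checksums: list) -> bytes:
    """Reassemble the file, verifying every block against its stored checksum."""
    return b"".join(
        read_block(i, replicas, checksums) for i in range(len(checksums))
    )


# Usage: one replica holds a corrupted block, the other is intact.
data = b"hello hdfs"
blocks = split_blocks(data)
checksums = compute_checksums(blocks)

corrupt_replica = list(blocks)
corrupt_replica[1] = b"XXXX"  # simulate corruption on one DataNode
healthy_replica = list(blocks)

recovered = read_file([corrupt_replica, healthy_replica], checksums)
```

In this sketch the corrupted copy of block 1 fails verification, so the reader falls back to the healthy replica and `recovered` equals the original `data`. If every replica of a block fails verification, the read raises an error rather than returning bad data.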