HDFS - Data Integrity Implementation

About

Data Integrity in HDFS

The HDFS client software implements checksum checking on the contents of HDFS files. When a client creates an HDFS file, it computes a checksum of each block of the file and stores these checksums in a separate hidden file in the same HDFS namespace. When a client retrieves file contents, it verifies that the data it received from each DataNode matches the checksum stored in the associated checksum file. If the data does not match, the client can opt to retrieve that block from another DataNode that holds a replica of it.
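
The write-then-verify cycle described above can be sketched as a small in-memory model. This is not the HDFS client API: the names `write_file`, `read_block`, `read_file`, `BLOCK_SIZE` and the `replicas` dictionary (DataNode name to that node's copy of every block) are hypothetical, and zlib's CRC-32 is used purely for illustration. Following the paragraph, it keeps one checksum per block. The checksum algorithm and granularity that HDFS itself uses may differ.

```python
from __future__ import annotations

import zlib

BLOCK_SIZE = 8  # hypothetical, tiny block size so the example stays readable


class ChecksumError(Exception):
    """Raised when no replica of a block matches its stored checksum."""


def split_into_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> list[bytes]:
    """Cut the file contents into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def write_file(data: bytes) -> tuple[list[bytes], list[int]]:
    """Writer side: split the file and compute one CRC-32 per block.

    The returned checksum list stands in for the separate checksum file.
    """
    blocks = split_into_blocks(data)
    checksums = [zlib.crc32(block) for block in blocks]
    return blocks, checksums


def read_block(index: int, replicas: dict[str, list[bytes]],
               checksums: list[int]) -> tuple[bytes, str]:
    """Reader side: return the first replica of a block whose checksum matches.

    Also returns the name of the (hypothetical) DataNode that served it.
    """
    expected = checksums[index]
    for datanode, blocks in replicas.items():
        candidate = blocks[index]
        if zlib.crc32(candidate) == expected:
            return candidate, datanode
        # Mismatch: fall through and try the next DataNode holding a replica.
    raise ChecksumError(f"no valid replica for block {index}")


def read_file(replicas: dict[str, list[bytes]], checksums: list[int]) -> bytes:
    """Reassemble the file, verifying every block against its checksum."""
    return b"".join(read_block(i, replicas, checksums)[0]
                    for i in range(len(checksums)))


if __name__ == "__main__":
    data = b"hello, hdfs data integrity"
    blocks, checksums = write_file(data)

    # Simulate a corrupted copy of block 1 on the first DataNode.
    corrupted = list(blocks)
    corrupted[1] = b"XXXXXXXX"
    replicas = {"dn1": corrupted, "dn2": list(blocks)}

    print(read_file(replicas, checksums))       # original bytes are recovered
    print(read_block(1, replicas, checksums))   # block 1 is served by dn2
```

Because the checksums are kept apart from the block data, a corrupted replica cannot silently "vouch" for itself. The reader detects the mismatch and falls back to another copy, and raises an error only when every replica fails verification.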
