HDFS - File

1 - About

A typical file in HDFS is gigabytes to terabytes in size.

A file is split into one or more blocks.

Files in HDFS are write-once (except for appends and truncates) and have strictly one writer at any time.
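The block count for a file is just a ceiling division of the file size by the block size. A minimal sketch, assuming the default block size of 128 MB (the actual value comes from the `dfs.blocksize` property and may differ per cluster):

```shell
# How many 128 MB blocks does a 1 GB file occupy?
# (128 MB is the usual dfs.blocksize default; illustrative values only.)
BLOCK_SIZE=$((128 * 1024 * 1024))
FILE_SIZE=$((1024 * 1024 * 1024))
# Ceiling division: the last block may be smaller than BLOCK_SIZE
BLOCKS=$(( (FILE_SIZE + BLOCK_SIZE - 1) / BLOCK_SIZE ))
echo "$BLOCKS"   # → 8
```

All blocks of a file are the same size except the last one, which only holds the remainder.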


3 - Management

3.1 - Load/Put

Put a local file into the Hadoop file system with the FS Shell:

hadoop fs -put /data/bacon.txt /user/demo/food/bacon.txt

3.2 - Replication

HDFS stores each file as a sequence of blocks. The blocks of a file are replicated for fault tolerance.

See HDFS - Block Replication
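Replication multiplies the raw storage a file consumes. A minimal sketch, assuming the default replication factor of 3 (the actual value comes from the `dfs.replication` property):

```shell
# Raw storage consumed by a replicated file (illustrative values only).
REPLICATION=3        # dfs.replication default
FILE_SIZE_MB=512
RAW_MB=$(( FILE_SIZE_MB * REPLICATION ))
echo "$RAW_MB"       # → 1536
# The factor can also be changed per file after the fact, e.g.:
# hadoop fs -setrep -w 2 /user/demo/food/bacon.txt
```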

3.3 - Delete

If the trash configuration is enabled, files removed by the FS Shell are not immediately removed from HDFS; they are first moved to a trash directory.
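Trash is controlled by the `fs.trash.interval` property in `core-site.xml` (minutes a deleted file is kept before being purged; 0 disables trash). A minimal config fragment, with an illustrative retention value:

```xml
<!-- core-site.xml: keep deleted files in trash for 24 hours (1440 minutes) -->
<property>
  <name>fs.trash.interval</name>
  <value>1440</value>
</property>
```

A file can bypass the trash entirely with `hadoop fs -rm -skipTrash <path>`.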

3.4 - Name

See INPUT__FILE__NAME (the input file's name for a mapper task), a built-in Hive virtual column