HDFS - Trash

1 - About

If trash is enabled, files removed through the FS Shell are not immediately removed from HDFS.

Instead, HDFS moves them to a trash directory. A file can be restored quickly as long as it remains in trash.

Because the blocks are freed only once the file leaves the trash, there can be an appreciable delay between the time a user deletes a file and the time the corresponding free space appears in HDFS.
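Whether trash is enabled is usually controlled by the `fs.trash.interval` property, expressed in minutes, where 0 means trash is disabled. The sketch below is one possible way to check it from a client machine. It assumes the `hdfs` command is on the PATH, and the helper names (`trash_enabled`, `read_trash_interval`) are hypothetical. The value read this way is the client-side configuration, which may differ from what the NameNode uses.

```python
import subprocess


def trash_enabled(interval_minutes: int) -> bool:
    """fs.trash.interval is given in minutes; 0 means trash is disabled."""
    return interval_minutes > 0


def read_trash_interval() -> int:
    """Read fs.trash.interval from the local Hadoop configuration.

    Assumes the `hdfs` CLI is installed and on the PATH.
    """
    result = subprocess.run(
        ["hdfs", "getconf", "-confKey", "fs.trash.interval"],
        check=True,
        capture_output=True,
        text=True,
    )
    return int(result.stdout.strip())


if __name__ == "__main__":
    minutes = read_trash_interval()
    state = "enabled" if trash_enabled(minutes) else "disabled"
    print(f"fs.trash.interval = {minutes} minute(s): trash is {state}")
```

With an interval of 1440, for example, a deleted file would be expected to stay recoverable for roughly one day before its space is reclaimed.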

3 - Process

The most recently deleted files are moved to the current trash directory (/user/<username>/.Trash/Current). At a configurable interval, HDFS creates checkpoints (under /user/<username>/.Trash/<date>) of the files in the current trash directory and deletes old checkpoints once they have expired.

Once a file's lifetime in trash has expired, the NameNode deletes it from the HDFS namespace. Deleting the file frees the blocks associated with it.
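The expiry rule can be illustrated with a small simulation. This is only a sketch of the idea, not the NameNode's actual code. It assumes that checkpoint directories are named with a `yyMMddHHmmss` timestamp and that a checkpoint expires once it is older than the trash interval. The function name `expired_checkpoints` is hypothetical.

```python
from datetime import datetime, timedelta

# Assumed checkpoint naming: yyMMddHHmmss, e.g. 240131093000
CHECKPOINT_FORMAT = "%y%m%d%H%M%S"


def expired_checkpoints(names, now, interval_minutes):
    """Return the checkpoint directory names older than the trash interval."""
    cutoff = now - timedelta(minutes=interval_minutes)
    expired = []
    for name in names:
        if name == "Current":
            continue  # the current trash directory is not a checkpoint
        try:
            created = datetime.strptime(name, CHECKPOINT_FORMAT)
        except ValueError:
            continue  # ignore anything that does not look like a checkpoint
        if created < cutoff:
            expired.append(name)
    return expired


if __name__ == "__main__":
    entries = ["Current", "240131093000", "240201080000"]
    now = datetime(2024, 2, 1, 10, 0, 0)
    # With a one-day interval, only the checkpoint from 31 January 09:30 is expired.
    print(expired_checkpoints(entries, now, interval_minutes=1440))
```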

4 - Management

4.1 - Location

Each user has their own trash directory under /user/<username>/.Trash.
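As an illustration of this layout, the sketch below computes where a deleted file would be expected to sit in the current trash, assuming HDFS keeps the file's original absolute path underneath `Current`. It also builds (without running it) an `hdfs dfs -mv` command that would move the file back. The helper names and the example user and path are hypothetical.

```python
import posixpath


def trash_location(username, original_path):
    """Expected location of a deleted file while it is in the current trash."""
    trash_root = posixpath.join("/user", username, ".Trash", "Current")
    # The original absolute path is appended below the trash root.
    return posixpath.join(trash_root, original_path.lstrip("/"))


def restore_command(username, original_path):
    """Build the FS Shell command that moves the file back to its original path."""
    source = trash_location(username, original_path)
    return ["hdfs", "dfs", "-mv", source, original_path]


if __name__ == "__main__":
    print(trash_location("alice", "/data/logs/app.log"))
    print(" ".join(restore_command("alice", "/data/logs/app.log")))
```

The restore only works while the file is still in trash. Once its checkpoint has expired and been deleted, the file is gone.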

4.2 - Trash Checkpoint

See the expunge command of the FS shell for details on trash checkpointing.
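For reference, a minimal sketch of calling expunge from a script follows. It assumes the `hdfs` CLI is on the PATH and that the calling user has a trash directory. The wrapper function names are hypothetical. Expunge creates a new checkpoint and permanently deletes checkpoints older than the retention threshold.

```python
import subprocess


def expunge_command():
    """Build the FS Shell command that triggers trash checkpointing and cleanup."""
    return ["hdfs", "dfs", "-expunge"]


def run_expunge():
    """Run expunge for the current user (needs the `hdfs` CLI on the PATH)."""
    subprocess.run(expunge_command(), check=True)


if __name__ == "__main__":
    run_expunge()
```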

5 - Documentation / Reference