HDFS - Block


About

A block is the unit of storage for file data in HDFS (see File System - Block).

The block size can be set per file.

Blocks are stored on datanodes and are grouped into a block pool.

Management

Info

Block information is shown in the HDFS Web UI. (Screenshot: HDFS UI block information)
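
The same information can also be retrieved from the command line with fsck; a minimal sketch (the path /user/test/file.txt is only illustrative):

hdfs fsck /user/test/file.txt -files -blocks -locations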

Location

The location where a datanode stores its blocks is defined by the dfs.datanode.data.dir property in hdfs-site.xml. Example:

<property>
	<name>dfs.datanode.data.dir</name>
	<value>file:/hadoop/data/dfs/datanode</value>
</property>
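
On the datanode's local file system, the block files (named blk_<id>) live under a block pool directory inside that location. A hedged sketch of what a listing may look like (the block pool id and subdirectory layout differ per cluster and Hadoop version):

ls /hadoop/data/dfs/datanode/current/BP-*/current/finalized/
# holds the block files (blk_<id>) and their .meta checksum files, nested in subdir* directories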

Offset

For Hive, see the built-in BLOCK__OFFSET__INSIDE__FILE virtual column.
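
For example, a hedged sketch of a query that returns the block offset of each row (my_table is a hypothetical table name):

hive -e "SELECT INPUT__FILE__NAME, BLOCK__OFFSET__INSIDE__FILE FROM my_table LIMIT 5"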

Size

A typical block size used by HDFS is 128 MB. Thus, an HDFS file is chopped up into 128 MB chunks.

<property>
  <name>dfs.blocksize</name>
  <value>134217728</value>
</property>
hdfs getconf -confKey dfs.blocksize
134217728
# 134217728 bytes = 128 MB
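
Since the block size can be set per file, a different value can be passed when the file is written. A minimal sketch (paths are illustrative):

hdfs dfs -D dfs.blocksize=268435456 -put localfile.txt /user/test/localfile.txt
# writes the file with 256 MB blocks instead of the configured default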

Move

See the hdfs mover sub-command, which moves block replicas across storage types so that they satisfy the configured storage policy.

hdfs mover
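
For instance, a minimal sketch of migrating the blocks under a specific path (the path is illustrative):

hdfs mover -p /user/test/cold_data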

Failure

Under-replicated

An under-replicated block is a block whose number of live replicas is below its target replication factor. Where to find them:

  • Web UI: the Overview page gives you this information (the number of under-replicated blocks).

(Screenshot: under-replicated blocks in the HDFS Web UI)
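
The count is also reported in the summary printed by fsck; a minimal sketch:

hdfs fsck /
# the summary at the end of the output includes an "Under-replicated blocks" count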

Missing
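
A missing block is a block for which no replica is left on any live datanode. fsck can list the affected files; a minimal sketch:

hdfs fsck / -list-corruptfileblocks
# prints the list of missing/corrupt blocks and the files they belong to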




