Hive - Partition

1 - About

Data Partitions (Clustering of data) in Hive

Each Table can have one or more partition.

The partition keys determine how the data is stored.

Data in each partition may be furthermore divided into Buckets.

Partitioned tables can be created using the PARTITIONED BY clause.

3 - Type

3.1 - Dynamic

3.2 - Partitioned Views

4 - Syntax

4.1 - Bucket

Tables or partitions can be bucketed using CLUSTERED BY columns, and data can be sorted within that bucket via SORT BY columns.

5 - File System Storage

A separate data directory is created for each distinct value combination in the partition columns.

Example: For a table T with a date partition column, the data will be stored for a particular date in the following directory

<table location>/ds=<date>

6 - Management

6.1 - Describe

SHOW PARTITIONS page_view;
-- To list columns and all other properties of table. This prints lot of information and that too not in a pretty format. Usually used for debugging.
DESCRIBE EXTENDED page_view PARTITION (ds='2008-08-08');

6.2 - Rebuild

  • rebuild the partitions all at once. MSCK REPAIR TABLE

6.3 - Drop

ALTER TABLE pv_users DROP PARTITION (ds='2008-08-08')

Note that any data for this table or partitions will be dropped and may not be recoverable. *

6.4 - Load

6.4.1 - External table

6.4.2 - Exchange

6.5 - Conf

hive.partition.pruning A strict value for this variable indicates that an error is thrown by the compiler in case no partition predicate is provided on a partitioned table. This is used to protect against a user inadvertently issuing a query against all the partitions of the table. Default nonstrict

6.6 - Statistic

List:

  • Number of rows
  • Number of files
  • Size in Bytes

See Stats

7 - Documentation / Reference

db/hive/partition.txt ยท Last modified: 2018/07/01 10:28 by gerardnico