Hive - Partition

> Database > Apache - Hive (HS|Hive Server)

1 - About

Data Partitions (Clustering of data) in Hive

Each Table can have one or more partition.

The partition keys determine how the data is stored.

Data in each partition may be furthermore divided into Buckets.

Partitioned tables can be created using the PARTITIONED BY clause.


3 - Type

3.1 - Dynamic


See Dynamic Partition Inserts Config:

SET hive.exec.dynamic.partition = TRUE -- Needs to be set to true to enable dynamic partition inserts
SET hive.exec.dynamic.partition.mode = nonstrict -- In strict mode, the user must specify at least one static partition in case the user accidentally overwrites all partitions, in nonstrict mode all partitions are allowed to be dynamic

3.2 - Partitioned Views

4 - Syntax

4.1 - Bucket

Tables or partitions can be bucketed using CLUSTERED BY columns, and data can be sorted within that bucket via SORT BY columns.

5 - File System Storage

A separate data directory is created for each distinct value combination in the partition columns.

Example: For a table T with a date partition column, the data will be stored for a particular date in the following directory

<table location>/ds=<date>

6 - Management

6.1 - Describe

-- To list columns and all other properties of table. This prints lot of information and that too not in a pretty format. Usually used for debugging.
DESCRIBE EXTENDED page_view PARTITION (ds='2008-08-08');

6.2 - Rebuild

  • rebuild the partitions all at once. MSCK REPAIR TABLE

6.3 - Drop

ALTER TABLE pv_users DROP PARTITION (ds='2008-08-08')

Note that any data for this table or partitions will be dropped and may not be recoverable. *

6.4 - Load

6.4.1 - External table

6.4.2 - Exchange

6.5 - Conf

hive.partition.pruning A strict value for this variable indicates that an error is thrown by the compiler in case no partition predicate is provided on a partitioned table. This is used to protect against a user inadvertently issuing a query against all the partitions of the table. Default nonstrict

6.6 - Statistic


  • Number of rows
  • Number of files
  • Size in Bytes

See Stats


7 - Documentation / Reference

db/hive/partition.txt · Last modified: 2019/02/18 14:53 by gerardnico