Hive - Partition
Table of Contents
1 - About
Each Table can have one or more partition.
The partition keys determine how the data is stored.
Data in each partition may be furthermore divided into Buckets.
Partitioned tables can be created using the PARTITIONED BY clause.
2 - Articles Related
3 - Type
3.1 - Dynamic
See Dynamic Partition Inserts Config:
SET hive.exec.dynamic.partition = TRUE -- Needs to be set to true to enable dynamic partition inserts SET hive.exec.dynamic.partition.mode = nonstrict -- In strict mode, the user must specify at least one static partition in case the user accidentally overwrites all partitions, in nonstrict mode all partitions are allowed to be dynamic
3.2 - Partitioned Views
4 - Syntax
4.1 - Bucket
Tables or partitions can be bucketed using
CLUSTERED BY columns, and data can be sorted within that bucket via
SORT BY columns.
5 - File System Storage
A separate data directory is created for each distinct value combination in the partition columns.
Example: For a table T with a date partition column, the data will be stored for a particular date in the following directory
6 - Management
6.1 - Describe
SHOW PARTITIONS page_view; -- To list columns and all other properties of table. This prints lot of information and that too not in a pretty format. Usually used for debugging. DESCRIBE EXTENDED page_view PARTITION (ds='2008-08-08');
6.2 - Rebuild
- rebuild the partitions all at once.
MSCK REPAIR TABLE
6.3 - Drop
ALTER TABLE pv_users DROP PARTITION (ds='2008-08-08')
Note that any data for this table or partitions will be dropped and may not be recoverable. *
6.4 - Load
6.4.1 - External table
6.4.2 - Exchange
6.5 - Conf
hive.partition.pruning A strict value for this variable indicates that an error is thrown by the compiler in case no partition predicate is provided on a partitioned table. This is used to protect against a user inadvertently issuing a query against all the partitions of the table. Default
6.6 - Statistic
- Number of rows
- Number of files
- Size in Bytes