Hive - Dynamic Partition (DP)

Card Puncher Data Processing

About

DynamicPartitions (DP) is a partition type where the partitioning columns are only known at EXECUTION TIME (ie within a SQL statement)

Hive will create the partitions automatically.

The only difference is that DP columns do not have values, while SP columns do

Conf

See Dynamic Partition Inserts

Enable

SET hive.exec.dynamic.partition = true -- Needs to be set to true to enable dynamic partition inserts

Mode

  • set hive.exec.dynamic.partition.mode=strict - the user must specify at least one static partition in case the user accidentally overwrites all partitions
  • set hive.exec.dynamic.partition.mode=nonstrict - in nonstrict mode all partitions are allowed to be dynamic - you don't need any static partition column.

Example

Full Dynamic

INSERT OVERWRITE TABLE T PARTITION (ds, hr)
SELECT key, value, ds, hr FROM srcpart WHERE ds is not null and hr>10;

where:

  • the dynamic partition columns (ie ds, hr) must be specified:
    • last (among the columns in the SELECT statement)
    • in the same order (than in the PARTITION clause)

Static and Dynamic

Replace the partition 2010-03-03 (as the partition column order determines directory hierarchy, you cannot set the static value on the second column partition)

INSERT OVERWRITE TABLE T PARTITION (ds='2010-03-03', hr)
SELECT key, value, /*ds,*/ hr FROM srcpart WHERE ds is not null and hr>10;

Create table as (CTAS)

CREATE TABLE T (key int, value string) PARTITIONED BY (ds string, hr int) AS
SELECT key, value, ds, hr+1 hr1 FROM srcpart WHERE ds is not null and hr>10;

Documentation / Reference





Discover More
Card Puncher Data Processing
Hive - Partition

in Hive Each Table can have one or more partition. Data in each partition may be furthermore divided into Buckets. The partition columns determine how the data is stored. A separate data directory...



Share this page:
Follow us:
Task Runner