Hive - Dynamic Partition (DP)

> Database > Apache - Hive (HS|Hive Server)

1 - About

DynamicPartitions (DP) is a partition type where the partitioning columns are only known at EXECUTION TIME (ie within a SQL statement)

Hive will create the partitions automatically.

The only difference is that DP columns do not have values, while SP columns do

3 - Conf

3.1 - Enable

SET hive.exec.dynamic.partition = TRUE -- Needs to be set to true to enable dynamic partition inserts

3.2 - Mode

  • set hive.exec.dynamic.partition.mode=strict - the user must specify at least one static partition in case the user accidentally overwrites all partitions
  • set hive.exec.dynamic.partition.mode=nonstrict - in nonstrict mode all partitions are allowed to be dynamic - you don't need any static partition column.
Advertising

4 - Example

4.1 - Full Dynamic

INSERT OVERWRITE TABLE T PARTITION (ds, hr)
SELECT KEY, VALUE, ds, hr FROM srcpart WHERE ds IS NOT NULL AND hr>10;

where:

  • the dynamic partition columns (ie ds, hr) must be specified:
    • last (among the columns in the SELECT statement)
    • in the same order (than in the PARTITION clause)

4.2 - Static and Dynamic

Replace the partition 2010-03-03 (as the partition column order determines directory hierarchy, you cannot set the static value on the second column partition)

INSERT OVERWRITE TABLE T PARTITION (ds='2010-03-03', hr)
SELECT KEY, VALUE, /*ds,*/ hr FROM srcpart WHERE ds IS NOT NULL AND hr>10;

4.3 - Create table as (CTAS)

CREATE TABLE T (KEY INT, VALUE string) PARTITIONED BY (ds string, hr INT) AS
SELECT KEY, VALUE, ds, hr+1 hr1 FROM srcpart WHERE ds IS NOT NULL AND hr>10;

5 - Documentation / Reference

Advertising
db/hive/dp.txt · Last modified: 2019/05/16 16:50 by gerardnico