Hive - Insert


1 - About

INSERT can write query results into:

  * an HDFS directory
  * a local file system directory
  * a Hive table (one or several tables in a single statement)

3 - Target

3.1 - HDFS

INSERT OVERWRITE DIRECTORY '/tmp/hdfs_out' SELECT a.* FROM invites a WHERE a.ds='2008-08-15';
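
By default, the files written to the directory are plain text with columns separated by ^A. The row and file format can also be stated explicitly; a minimal sketch, reusing the invites table above (ROW FORMAT / STORED AS on directory inserts require Hive 0.11 or later):

-- Write comma-separated text files instead of the default ^A-delimited output
INSERT OVERWRITE DIRECTORY '/tmp/hdfs_out'
  ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
  STORED AS TEXTFILE
SELECT a.* FROM invites a WHERE a.ds='2008-08-15';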

3.2 - Local

LOCAL means the local file system (as opposed to HDFS).

INSERT OVERWRITE LOCAL DIRECTORY '/tmp/local_out' SELECT a.* FROM pokes a;

3.3 - Table

INSERT OVERWRITE TABLE events SELECT a.* FROM profiles a;
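
INSERT OVERWRITE replaces the current contents of the table. To append rows instead, or to load a single partition of a partitioned table, something along these lines should work; the column names userid and action and the ds partition column are assumptions for illustration, not part of the tables above:

-- Append to the table instead of overwriting it (INSERT INTO, Hive 0.8+)
INSERT INTO TABLE events SELECT a.* FROM profiles a;

-- Static partition insert: the partition spec fixes ds, and the SELECT list
-- supplies only the non-partition columns of the target table
INSERT OVERWRITE TABLE events PARTITION (ds='2008-08-15')
SELECT a.userid, a.action FROM profiles a;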

3.4 - Multi-Table

A single FROM clause can feed several INSERT clauses, so the source table is scanned only once:

FROM src
  INSERT OVERWRITE TABLE dest1 SELECT src.* WHERE src.key < 100
  INSERT OVERWRITE TABLE dest2 SELECT src.key, src.value WHERE src.key >= 100 AND src.key < 200
  INSERT OVERWRITE TABLE dest3 PARTITION(ds='2008-04-08', hr='12') SELECT src.key WHERE src.key >= 200 AND src.key < 300
  INSERT OVERWRITE LOCAL DIRECTORY '/tmp/dest4.out' SELECT src.value WHERE src.key >= 300;
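
The dest3 insert above names its target partition explicitly. Hive can also derive the partition values from the query output itself (dynamic partition insert); a sketch, assuming dest3 is partitioned by (ds, hr) as above and that src carries ds and hr columns:

-- Allow all partition columns to be determined dynamically
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

-- The partition columns are listed without values and must come last
-- in the SELECT list; each row lands in the partition its values name
INSERT OVERWRITE TABLE dest3 PARTITION (ds, hr)
SELECT src.key, src.ds, src.hr FROM src;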

4 - Transform

4.1 - Custom script

FROM invites a INSERT OVERWRITE TABLE events SELECT TRANSFORM(a.foo, a.bar) AS (oof, rab) USING '/bin/cat' WHERE a.ds > '2008-08-09';

This streams the data in the map phase through the script /bin/cat (as with Hadoop streaming). Streaming can be used on the reduce side in the same way (see the Hive Tutorial for examples, and the sketch after the weekday_mapper example below).

weekday_mapper.py
import sys
import datetime

# Read tab-separated rows from standard input, replace the unix timestamp
# with the ISO weekday (1 = Monday .. 7 = Sunday) and write the row back out.
for line in sys.stdin:
  line = line.strip()
  userid, movieid, rating, unixtime = line.split('\t')
  weekday = datetime.datetime.fromtimestamp(float(unixtime)).isoweekday()
  print('\t'.join([userid, movieid, rating, str(weekday)]))

Register the script with ADD FILE (so it is shipped to the cluster) and call it from a TRANSFORM clause:
ADD FILE weekday_mapper.py;
 
INSERT OVERWRITE TABLE u_data_new
SELECT
  TRANSFORM (userid, movieid, rating, unixtime)
  USING 'python weekday_mapper.py'
  AS (userid, movieid, rating, weekday)
FROM u_data;
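
A script can be plugged into the reduce side in the same way: CLUSTER BY the key produced by the map-side TRANSFORM so that all rows for a key reach the same reducer, then stream them through a second script. A sketch of the pattern; weekday_reducer.py and the u_data_agg table are hypothetical:

ADD FILE weekday_reducer.py;

-- CLUSTER BY weekday routes all rows of a given weekday to the same reducer,
-- where they are streamed through the (hypothetical) reducer script
FROM (
  FROM u_data
  SELECT TRANSFORM (userid, movieid, rating, unixtime)
  USING 'python weekday_mapper.py'
  AS (userid, movieid, rating, weekday)
  CLUSTER BY weekday
) mapped
INSERT OVERWRITE TABLE u_data_agg
SELECT TRANSFORM (mapped.userid, mapped.movieid, mapped.rating, mapped.weekday)
USING 'python weekday_reducer.py'
AS (weekday, rating_count);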
