Hive - Load data

> Database > Apache - Hive (HS|Hive Server)

1 - About

How to load data into Hive Table.

2 - Options

This is are the following possibilities:

  • File System operation
  • SQL Operations
  • Third party tools
Advertising

2.1 - File System

Replace the file on HDFS when the input data format is the same as the table format with:

  • load statement without the INPUTFORMAT for loading local file (local to HDFS) - See Loadingfilesintotables
  • an HDFS client such as
    • WebHDFS: A Web URL provides an Upload end-point into a designated HDFS folder.
    • NFS mount

2.2 - Third party tools

2.3 - SQL operations

INSERT can insert into a Hive table, local directory or HDFS directory

LOAD DATA [LOCAL] INPATH 'filepath' [OVERWRITE] INTO TABLE tablename [PARTITION (partcol1=val1, partcol2=val2 ...)] [INPUTFORMAT 'inputformat' SERDE 'serde'] 
  • create as select

See also external table partition_load

Advertising

2.3.1 - Multi table insert

FROM S
INSERT OVERWRITE TABLE T PARTITION (ds='2010-03-03', hr)
SELECT KEY, VALUE, ds, hr FROM srcpart WHERE ds IS NOT NULL AND hr>10
INSERT OVERWRITE TABLE R PARTITION (ds='2010-03-03, hr=12)
SELECT key, value, ds, hr from srcpart where ds is not null and hr = 12;

2.4 - export/import between cluster

https://cwiki.apache.org/confluence/display/Hive/LanguageManual+ImportExport - The EXPORT command exports the data of a table or partition, along with the metadata, into a specified output location. This output location can then be moved over to a different Hadoop or Hive instance and imported from there with the IMPORT command.

3 - Example

local = not in HDFS

LOAD DATA LOCAL inpath '" + filepath + "' INTO TABLE " + tableName
db/hive/load.txt · Last modified: 2019/05/29 09:24 by gerardnico