Hive - Table

About

Table implementation in Hive.

Hive also supports custom serializer/deserializers (SerDe) for complex or irregularly structured data.

Articles Related

Property

Qualified Name

The fully qualified name in Hive for a table is:

db_name.table_name

where:

db_name is the database name

Default Storage

By default, tables are assumed to be of:

text input format
and the delimiters are assumed to be ^A(ctrl-a).

Type

The type is based on the location of data.

External: The Data is stored outside the Hive data warehouse.
Internal: The Data is stored in the Hive data warehouse.

External

Hive - External Table

Use external tables when:

The data is also used outside of Hive. For example, the data files are updated by another process (that does not lock the files.)
Data needs to remain in the underlying location, even after dropping the table.
You need a custom location, such as a non-default storage account.
A program other than hive manages the data format, location, etc.

Internal

Hive - Internal table

Use internal tables when:

Data is temporary.
You want Hive to manage the lifecycle of the table and data.

Management

Create

DDL Create

With transform

MovieLensUserRatings

Regex

CREATE TABLE apachelog (
  host STRING,
  identity STRING,
  user STRING,
  time STRING,
  request STRING,
  status STRING,
  size STRING,
  referer STRING,
  agent STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
  "input.regex" = "([^]*) ([^]*) ([^]*) (-|\\[^\\]*\\]) ([^ \"]*|\"[^\"]*\") (-|[0-9]*) (-|[0-9]*)(?: ([^ \"]*|\".*\") ([^ \"]*|\".*\"))?"
)
STORED AS TEXTFILE;

ROW FORMAT

row format specifies how the rows are stored in the hive table.

You can use:

STORED AS

Stored as design the file format

Show

List

show tables;
-- The pattern follows Java regular expression syntax (so the period is a wildcard).
SHOW TABLES 'Pattern';
-- Example
SHOW TABLES 'page.*';

Ddl

SHOW CREATE TABLE tableName;

Describe

DESCRIBE FORMATTED tableName
-- To list columns and column types of table.
DESCRIBE EXTENDED page_view;

# col_name              data_type               comment

clientid                string
querytime               string
market                  string
deviceplatform          string
devicemake              string
devicemodel             string
state                   string
country                 string
querydwelltime          double
sessionid               bigint
sessionpagevieworder    bigint

# Detailed Table Information
Database:               default
Owner:                  root
CreateTime:             Thu Jan 25 13:59:12 UTC 2018
LastAccessTime:         UNKNOWN
Protect Mode:           None
Retention:              0
Location:               wasb://hi-clus-hadoop-01-2018-01-25t11-58-44-894z@hiinformaticasawe.blob.core.windows.net/hive/warehouse/hivesampletable
Table Type:             MANAGED_TABLE
Table Parameters:
        numFiles                1
        numRows                 0
        rawDataSize             0
        totalSize               4955715
        transient_lastDdlTime   1516888756

# Storage Information
SerDe Library:          org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
InputFormat:            org.apache.hadoop.mapred.TextInputFormat
OutputFormat:           org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
Compressed:             No
Num Buckets:            -1
Bucket Columns:         []
Sort Columns:           []
Storage Desc Params:
        field.delim             \t
        serialization.format    \t
Time taken: 0.449 seconds, Fetched: 41 row(s)

where:

Drop

DROP TABLE IF EXISTS tableName;

Partition

Each Table can have one or more partition.

Load

with beeline

LOAD DATA INPATH '/user/gerardn/myTable.csv' into TABLE myTable;

Table default.myTable stats: [numFiles=1, numRows=0, totalSize=266712, rawDataSize=0]
OK
Time taken: 3.502 seconds

where:

CREATE EXTERNAL TABLE IF NOT EXISTS myTable
(
col1 string, 
col2   int, 
col3       double
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n'
TBLPROPERTIES("skip.header.line.count"="1");

Storage

All the data of a table is stored in a directory in HDFS

location: the base directory location is specified within the create statement.
format: Hive - File Format (Storage format)

A partitioned table will create sub-directory by partition columns. See Partitioned column

Statistics

Table and partition statistics are stored in the Hive Metastore

Number of rows
Number of files
Size in Bytes
Number of partition

See Hive - Table-Level Statistics (Table/Partition/Column)

Constraint

PRIMARY KEY
FOREIGN KEY
UNIQUE
NOT NULL
DEFAULT (to come)

Dual / Dummy

Not yet https://issues.apache.org/jira/browse/HIVE-1558

Example

MovieLensUserRatings Table