Hive - Data Type

1 - About

3 - System

Hive supports the following categories of data types:

3.1 - Primitive

Data Type - (Primitive|Native|Built-in) in Hive

Category                Type       Description
Integers                TINYINT    1-byte integer
Integers                SMALLINT   2-byte integer
Integers                INT        4-byte integer
Integers                BIGINT     8-byte integer
Boolean                 BOOLEAN    TRUE/FALSE
Floating point numbers  FLOAT      single precision
Floating point numbers  DOUBLE     double precision
Fixed point numbers     DECIMAL    a fixed point value of user-defined scale and precision
String                  STRING     sequence of characters in a specified character set
String                  VARCHAR    sequence of characters in a specified character set with a maximum length
String                  CHAR       sequence of characters in a specified character set with a defined length
Date and time           TIMESTAMP  a specific point in time, up to nanosecond precision
Date and time           DATE       a date
Binary                  BINARY     a sequence of bytes

Type hierarchy:

Primitive Type
  Number
    DOUBLE
      FLOAT
        BIGINT
          INT
            SMALLINT
              TINYINT
      STRING
  BOOLEAN

The hierarchy defines how types are implicitly converted: implicit conversion is allowed from a type to any of its ancestors (for example, from TINYINT to INT). Note that the type hierarchy also allows the implicit conversion of STRING to DOUBLE.
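The child-to-ancestor rule can be sketched in a few lines of Python. This is only a model of the hierarchy shown above, not Hive's implementation: the names PARENT, ABSTRACT, ancestors and can_implicitly_convert are made up for the illustration, and types that do not appear in the hierarchy (DECIMAL, VARCHAR, and so on) are not covered.

```python
# Child -> parent links copied from the hierarchy described above.
PARENT = {
    "TINYINT": "SMALLINT",
    "SMALLINT": "INT",
    "INT": "BIGINT",
    "BIGINT": "FLOAT",
    "FLOAT": "DOUBLE",
    "STRING": "DOUBLE",
    "DOUBLE": "NUMBER",
    "NUMBER": "PRIMITIVE",
    "BOOLEAN": "PRIMITIVE",
}

# Abstract nodes of the hierarchy; they are not column types.
ABSTRACT = {"NUMBER", "PRIMITIVE"}


def ancestors(type_name):
    """Return the ancestors of type_name, nearest first."""
    chain = []
    current = PARENT.get(type_name)
    while current is not None:
        chain.append(current)
        current = PARENT.get(current)
    return chain


def can_implicitly_convert(source, target):
    """True if target is source itself or one of its concrete ancestors."""
    if source in ABSTRACT or target in ABSTRACT:
        return False
    return source == target or target in ancestors(source)
```

Under this model, can_implicitly_convert("TINYINT", "BIGINT") and can_implicitly_convert("STRING", "DOUBLE") hold, while DOUBLE to INT and BOOLEAN to INT do not.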

3.2 - Complex

Data Type - ( Complex | Composite ) in Hive

Complex Types can be built up from primitive types and other composite types using:

  • Structs: the elements within the type can be accessed using the DOT (.) notation. For example, for a column c of type STRUCT {a INT; b INT}, the a field is accessed by the expression c.a.
  • Maps (key-value tuples): the elements are accessed using the ['element name'] notation. For example, in a map M containing a mapping from 'group' → gid, the gid value can be accessed using M['group'].
  • Arrays (indexable lists): the elements in the array have to be of the same type. Elements can be accessed using the [n] notation, where n is a zero-based index into the array. For example, for an array A having the elements ['a', 'b', 'c'], A[1] returns 'b'.
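The three access notations have close Python analogues, which may help readers who know Python better than HiveQL. The snippet below is Python, not Hive syntax; the field values and the gid value 42 are made up for the example.

```python
from collections import namedtuple

# STRUCT {a INT; b INT}: fields are read with dot notation.
C = namedtuple("C", ["a", "b"])
c = C(a=1, b=2)
struct_value = c.a          # field a of the struct

# MAP: values are read by key; the gid value 42 is invented here.
M = {"group": 42}
map_value = M["group"]

# ARRAY: zero-based index, all elements of the same type.
A = ["a", "b", "c"]
array_value = A[1]          # second element, 'b'
```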

Using the primitive types and the constructs for creating complex types, types with arbitrary levels of nesting can be created. For example, a type User may comprise the following fields:

  • gender—which is a STRING.
  • active—which is a BOOLEAN.
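To make the nesting concrete, here is a small Python sketch that composes type descriptions and renders them in the angle-bracket notation Hive uses for complex type strings (for example struct<...>, array<...>, map<...>). The helper names struct_type, array_type and map_type are hypothetical, and users_by_group is an invented type that goes beyond the User example above.

```python
def struct_type(**fields):
    """Render a struct type from field-name -> type-string pairs."""
    body = ",".join(f"{name}:{type_str}" for name, type_str in fields.items())
    return f"struct<{body}>"


def array_type(element):
    """Render an array type whose elements all share one type."""
    return f"array<{element}>"


def map_type(key, value):
    """Render a map type from a key type and a value type."""
    return f"map<{key},{value}>"


# The User type from the text: gender is a STRING, active is a BOOLEAN.
user = struct_type(gender="string", active="boolean")

# An invented deeper nesting: group name -> list of users.
users_by_group = map_type("string", array_type(user))
```

Because every helper returns a plain type string, the results can be nested to any depth, which mirrors how complex types are built from primitive and other composite types.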

4 - Management

4.1 - User defined

The typing system is closely tied to the SerDe (Serialization/Deserialization) and object inspector interfaces.

Users can create their own types by implementing their own object inspectors. Using these object inspectors, they can create their own SerDes to serialize and deserialize their data into HDFS files.

Built-in object inspectors:

  • ListObjectInspector
  • StructObjectInspector
  • MapObjectInspector
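The idea behind an object inspector is that code reads a value through the inspector instead of depending on how the row is laid out in memory. The following is a loose Python analogue of that idea only: the classes TupleStructInspector and DictStructInspector, the get_field method and the is_active function are hypothetical names, not Hive's Java interfaces.

```python
class TupleStructInspector:
    """Reads fields from rows stored as tuples with a fixed field order."""

    def __init__(self, field_names):
        self._index = {name: i for i, name in enumerate(field_names)}

    def get_field(self, row, name):
        return row[self._index[name]]


class DictStructInspector:
    """Reads fields from rows stored as dicts."""

    def get_field(self, row, name):
        return row[name]


def is_active(row, inspector):
    """Caller code depends only on the inspector, not on the row layout."""
    return inspector.get_field(row, "active")


tuple_row = ("F", True)
dict_row = {"gender": "M", "active": False}

tuple_result = is_active(tuple_row, TupleStructInspector(["gender", "active"]))
dict_result = is_active(dict_row, DictStructInspector())
```

The same is_active function works for both storage layouts because only the inspector knows how a row is represented.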

The dotted notation is used to navigate nested types. For example, a.b.c = 1 looks at field c of field b of type a and compares it with 1.
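As a rough illustration of how a dotted path is followed one field at a time, the Python sketch below resolves 'a.b.c' through nested dicts. The resolve function and the sample row are assumptions made for this example; Hive itself navigates nested values through object inspectors, not Python dicts.

```python
from functools import reduce


def resolve(row, path):
    """Follow a dotted path such as 'a.b.c' through nested dicts."""
    parts = path.split(".")
    return reduce(lambda value, key: value[key], parts, row)


# Sample nested value: a has field b, which has field c.
row = {"a": {"b": {"c": 1}}}

# Equivalent in spirit to the predicate a.b.c = 1.
matches = resolve(row, "a.b.c") == 1
```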

4.2 - Java / Hive mapping

4.3 - Decimal (Precision and Scale)

5 - Documentation / Reference

db/hive/datatype.txt · Last modified: 2018/07/01 09:30 by gerardnico