Google Bigtable (HBase)


1 - About

Bigtable is a storage system from Google, designed to support data that is actively updated.

  • OSDI paper in 2006 (Some overlap with the authors of the MapReduce paper)
  • Complementary to MapReduce

3 - Data model

Bigtable is a column-oriented database that stores data in a sparse, distributed, persistent, multidimensional sorted map with the following format:

(row:string, column:string, time:int64) -> String

where:

  • row and column together identify a cell (the value)
  • time is a timestamp that makes it possible to store the history of this value.

Timestamps:

  • Each cell can be versioned
  • Each new version is stored under its own timestamp (assigned by Bigtable or set by the client)
  • Policies:
    • “keep only latest n versions”
    • “keep only versions since time t”
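
The data model and the two version policies above can be sketched as a small in-memory map. This is only an illustration under stated assumptions: the class `VersionedMap`, its method names, and the sample row and column names are hypothetical and are not part of any Bigtable or HBase API.

```python
from collections import defaultdict


class VersionedMap:
    """Toy (row, column, time) -> value map; not Bigtable's real storage."""

    def __init__(self):
        # (row, column) -> {timestamp: value}
        self._cells = defaultdict(dict)

    def put(self, row, column, timestamp, value):
        self._cells[(row, column)][timestamp] = value

    def get(self, row, column, timestamp=None):
        """Return the newest value at or before `timestamp` (newest overall if None)."""
        versions = self._cells.get((row, column), {})
        candidates = [t for t in versions if timestamp is None or t <= timestamp]
        return versions[max(candidates)] if candidates else None

    def keep_latest(self, n):
        """Policy: keep only the latest n versions of every cell."""
        for key in list(self._cells):
            versions = self._cells[key]
            newest = sorted(versions, reverse=True)[:n]
            self._cells[key] = {t: versions[t] for t in newest}

    def keep_since(self, t_min):
        """Policy: keep only versions written at time t_min or later."""
        for key in list(self._cells):
            versions = self._cells[key]
            self._cells[key] = {t: v for t, v in versions.items() if t >= t_min}

    def scan(self):
        """Yield cells in sorted (row, column) order, newest version first."""
        for key in sorted(self._cells):
            for t in sorted(self._cells[key], reverse=True):
                yield key[0], key[1], t, self._cells[key][t]


# Hypothetical sample data: three versions of one cell, one version of another.
table = VersionedMap()
table.put("com.cnn.www", "contents:", 3, "<html>v3")
table.put("com.cnn.www", "contents:", 5, "<html>v5")
table.put("com.cnn.www", "contents:", 6, "<html>v6")
table.put("com.cnn.www", "anchor:cnnsi.com", 9, "CNN")
```

A read without a timestamp returns the newest version, while a read with a timestamp returns the value as it was at that time. Calling `keep_latest(2)` would then discard the oldest version of the `contents:` cell.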

4 - Rows

  • Data is sorted lexicographically by row key.
  • The row key range is broken into tablets (the rows in a tablet are contiguous)
  • A tablet is the unit of distribution and load balancing
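
As a rough sketch of the points above, the following code sorts row keys lexicographically, cuts the sorted range into contiguous tablets, and finds the tablet responsible for a given key. The function names, the fixed `max_rows` split rule, and the sample keys are assumptions made for illustration; a real system splits by size, not by row count.

```python
import bisect


def split_into_tablets(row_keys, max_rows):
    """Cut the sorted row-key space into contiguous tablets of at most max_rows rows."""
    ordered = sorted(row_keys)  # Python sorts strings lexicographically
    return [ordered[i:i + max_rows] for i in range(0, len(ordered), max_rows)]


def locate_tablet(tablets, row_key):
    """Return the index of the tablet whose key range covers row_key."""
    end_keys = [tablet[-1] for tablet in tablets]  # last row key of each tablet
    index = bisect.bisect_left(end_keys, row_key)
    # Keys beyond the last end key fall into the last tablet.
    return min(index, len(tablets) - 1)


# Hypothetical row keys.
rows = ["com.cnn.www", "com.google.www", "org.apache.hbase",
        "com.cnn.edition", "org.wikipedia.en"]
tablets = split_into_tablets(rows, max_rows=2)
```

Because the keys are sorted, rows with a common prefix (here, the same reversed domain) end up next to each other, so a range scan over them touches few tablets.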

5 - Column families

  • Column names of the form family:qualifier
  • “family” is the basic unit of:
    • access control
    • memory accounting
    • disk accounting (moving data around on disk)
  • Typically all columns in a family hold the same type of data (for instance, so that they compress well together)
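
To make the family:qualifier naming concrete, here is a minimal sketch that splits column names and sums the stored bytes per family, as a stand-in for per-family accounting. The helper names and the sample columns are hypothetical.

```python
from collections import defaultdict


def parse_column(name):
    """Split 'family:qualifier' into its two parts; the qualifier may be empty."""
    family, _, qualifier = name.partition(":")
    return family, qualifier


def bytes_per_family(cells):
    """Sum stored value sizes per family (a stand-in for disk/memory accounting)."""
    totals = defaultdict(int)
    for column, value in cells.items():
        family, _ = parse_column(column)
        totals[family] += len(value.encode("utf-8"))
    return dict(totals)


# Hypothetical cells of a single row: column name -> value.
cells = {
    "anchor:cnnsi.com": "CNN",
    "anchor:my.look.ca": "CNN.com",
    "contents:": "<html>",
}
usage = bytes_per_family(cells)
```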

6 - Tablet management

  • Master assigns tablets to tablet servers
  • Each tablet server manages reads and writes to its tablets
  • Clients communicate directly with tablet server
  • Tablet server splits tablets that have grown too large.
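
The division of labour above can be sketched as follows. This is a simplified model under stated assumptions: the classes `Master` and `TabletServer`, the "fewest tablets" assignment rule, and the row-count split threshold are all hypothetical and chosen only to keep the example short.

```python
class TabletServer:
    def __init__(self, name, max_rows):
        self.name = name
        self.max_rows = max_rows  # assumed split threshold, counted in rows
        self.tablets = []         # each tablet is a sorted list of row keys

    def split_large_tablets(self):
        """Split each tablet that exceeds max_rows into two halves (one split per pass)."""
        result = []
        for tablet in self.tablets:
            if len(tablet) > self.max_rows:
                mid = len(tablet) // 2
                result.extend([tablet[:mid], tablet[mid:]])
            else:
                result.append(tablet)
        self.tablets = result


class Master:
    def __init__(self, servers):
        self.servers = servers

    def assign(self, tablet):
        """Give the tablet to the server currently holding the fewest tablets."""
        target = min(self.servers, key=lambda s: len(s.tablets))
        target.tablets.append(tablet)
        return target


servers = [TabletServer("ts1", max_rows=2), TabletServer("ts2", max_rows=2)]
master = Master(servers)
first = master.assign(["a", "b", "c", "d"])
second = master.assign(["e", "f"])
first.split_large_tablets()
```

Note that the master only decides placement; the split itself happens on the tablet server, and clients would talk to that server directly for reads and writes.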

7 - Write processing

  • When the memtable size reaches a given threshold, a minor compaction occurs, and a major compaction may follow, to keep read throughput high
    • Minor compaction: writes the memtable buffer to a new SSTable
    • Major compaction: rewrites all SSTables into one SSTable and removes all deleted entries.
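
The two compaction types can be illustrated with a toy store. This is a sketch under stated assumptions, not the actual implementation: the class `TabletStore`, the entry-count threshold, and the `TOMBSTONE` delete marker are hypothetical, and real SSTables are on-disk files, not Python lists.

```python
TOMBSTONE = object()  # marker written in place of a value when a key is deleted


class TabletStore:
    def __init__(self, memtable_limit):
        self.memtable_limit = memtable_limit  # assumed threshold, in entries
        self.memtable = {}
        self.sstables = []  # oldest first; each is a sorted list of (key, value)

    def write(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.minor_compaction()

    def delete(self, key):
        self.write(key, TOMBSTONE)

    def minor_compaction(self):
        """Write the memtable out as a new SSTable and start a fresh memtable."""
        if self.memtable:
            self.sstables.append(sorted(self.memtable.items()))
            self.memtable = {}

    def major_compaction(self):
        """Rewrite all SSTables into one, dropping deleted entries."""
        merged = {}
        for sstable in self.sstables:  # newer tables overwrite older ones
            merged.update(sstable)
        live = sorted((k, v) for k, v in merged.items() if v is not TOMBSTONE)
        self.sstables = [live] if live else []

    def read(self, key):
        """Check the memtable first, then SSTables from newest to oldest."""
        value = None
        if key in self.memtable:
            value = self.memtable[key]
        else:
            for sstable in reversed(self.sstables):
                found = dict(sstable)
                if key in found:
                    value = found[key]
                    break
        return None if value is TOMBSTONE else value


store = TabletStore(memtable_limit=2)
store.write("a", "1")
store.write("b", "2")   # threshold reached: first minor compaction
store.delete("a")
store.write("c", "3")   # threshold reached: second minor compaction
sstables_before = len(store.sstables)
store.major_compaction()
```

Before the major compaction a read may have to look through several SSTables; afterwards there is a single SSTable and the deleted key no longer takes up space.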


8 - Documentation / Reference