Google Bigtable

About

Bigtable is a NoSQL database in which every value can be versioned by timestamp, so it can also serve as a time series store. Apache HBase is an open-source database modeled on it.

Bigtable was developed at Google to support data that is actively updated.

  • OSDI paper in 2006 (Some overlap with the authors of the MapReduce paper)
  • Complementary to MapReduce

Data model

Bigtable stores data as follows:

  • Data lives in tables.
  • A table contains rows, each identified by a row key.
  • Data in a row is organized into column families, which are groups of columns. A column qualifier identifies a single column within a column family.
  • A cell sits at the intersection of a row and a column. A cell holds a versioned value.

Bigtable is a column-oriented database that stores data in a sparse, distributed, persistent, multidimensional sorted map with the following format:

(row:string, column:string, time:int64) -> string

where:

  • row and column together identify a cell,
  • and time is a timestamp that selects one version of the cell's value, which makes it possible to keep the history of that value.
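
The map format above can be illustrated with a small in-memory sketch. This is only a toy model, not Bigtable's actual storage or client API: the class `SparseCellMap`, its methods, and the sample row and column names are all hypothetical.

```python
from typing import Dict, Iterator, Optional, Tuple

# Key is (row, column, timestamp); the value is an uninterpreted string.
Key = Tuple[str, str, int]


class SparseCellMap:
    """Toy stand-in for the (row, column, time) -> string sorted map."""

    def __init__(self) -> None:
        # Sparse: only cells that were actually written take up space.
        self._cells: Dict[Key, str] = {}

    def put(self, row: str, column: str, timestamp: int, value: str) -> None:
        self._cells[(row, column, timestamp)] = value

    def get(self, row: str, column: str,
            timestamp: Optional[int] = None) -> Optional[str]:
        """Return the newest version at or before `timestamp` (newest overall if None)."""
        versions = [
            (ts, value)
            for (r, c, ts), value in self._cells.items()
            if r == row and c == column and (timestamp is None or ts <= timestamp)
        ]
        return max(versions)[1] if versions else None

    def scan(self) -> Iterator[Tuple[Key, str]]:
        """Yield cells sorted by row, then column, then newest timestamp first."""
        for key in sorted(self._cells, key=lambda k: (k[0], k[1], -k[2])):
            yield key, self._cells[key]


table = SparseCellMap()
table.put("com.example.www", "contents:", 1, "<html>v1</html>")
table.put("com.example.www", "contents:", 2, "<html>v2</html>")
table.put("com.example.www", "anchor:a.com", 1, "Example")

latest = table.get("com.example.www", "contents:")
as_of_1 = table.get("com.example.www", "contents:", timestamp=1)
```

Reading without a timestamp returns the most recent version, while passing a timestamp returns the value as it was at that time.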

Timestamps:

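  • Each cell can be versioned
  • Each new version is stored under its own timestamp, assigned either by Bigtable or by the client application
  • Garbage-collection policies for old versions:
    • “keep only latest n versions”
    • “keep only versions since time t”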
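
The two retention policies can be sketched as plain functions over a cell's version list. The function names are hypothetical, and treating "since time t" as inclusive of t is an assumption made for this example.

```python
from typing import List, Tuple

Version = Tuple[int, str]  # (timestamp, value)


def keep_latest_n(versions: List[Version], n: int) -> List[Version]:
    """Policy 'keep only latest n versions': newest first, truncated to n."""
    newest_first = sorted(versions, key=lambda v: v[0], reverse=True)
    return newest_first[:n]


def keep_since(versions: List[Version], t: int) -> List[Version]:
    """Policy 'keep only versions since time t' (assumed inclusive of t)."""
    recent = [v for v in versions if v[0] >= t]
    return sorted(recent, key=lambda v: v[0], reverse=True)


cell_versions = [(100, "a"), (300, "c"), (200, "b")]
two_newest = keep_latest_n(cell_versions, 2)
since_200 = keep_since(cell_versions, 200)
```

Versions that fall outside the chosen policy are the ones a garbage collector would be free to drop.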

Rows

  • Data is sorted lexicographically by row key.
  • The row key range is broken into tablets (the rows within a tablet are contiguous)
  • A tablet is the unit of distribution and load balancing
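
A minimal sketch of how a sorted row key space can be cut into contiguous tablets and how a key is routed to its tablet. Splitting by a fixed row count and the names `split_into_tablets` and `find_tablet` are assumptions for illustration; a real system would split by size.

```python
import bisect
from typing import List


def split_into_tablets(row_keys: List[str], rows_per_tablet: int) -> List[List[str]]:
    """Sort the keys, then cut the sorted range into contiguous chunks."""
    ordered = sorted(row_keys)
    return [ordered[i:i + rows_per_tablet]
            for i in range(0, len(ordered), rows_per_tablet)]


def find_tablet(tablets: List[List[str]], row_key: str) -> int:
    """Return the index of the tablet whose key range covers `row_key`."""
    end_keys = [tablet[-1] for tablet in tablets]
    index = bisect.bisect_left(end_keys, row_key)
    # Keys past the last end key are routed to the last tablet.
    return min(index, len(tablets) - 1)


keys = ["com.cnn.www", "com.example.www", "org.apache.www",
        "com.google.maps", "com.google.www", "org.wikipedia.en"]
tablets = split_into_tablets(keys, rows_per_tablet=2)
google_tablet = find_tablet(tablets, "com.google.maps")
```

Because rows are sorted, keys that share a prefix (here the reversed domain `com.google`) end up next to each other, usually in the same tablet.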

Column families

  • Column names of the form family:qualifier
  • “family” is the basic unit of:
    • access control
    • memory accounting
    • disk accounting (moving data around on disk)
  • Typically all columns in a family hold the same type of data (which, for instance, helps compression)
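
As a small illustration of the family:qualifier naming and of per-family accounting, the sketch below splits column names and sums stored bytes per family. The helper names and sample columns are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Tuple


def parse_column(name: str) -> Tuple[str, str]:
    """Split 'family:qualifier' at the first colon; the qualifier may be empty."""
    family, _, qualifier = name.partition(":")
    return family, qualifier


def bytes_per_family(cells: Dict[str, str]) -> Dict[str, int]:
    """Sum the UTF-8 size of the values stored under each column family."""
    usage: Dict[str, int] = defaultdict(int)
    for column, value in cells.items():
        family, _ = parse_column(column)
        usage[family] += len(value.encode("utf-8"))
    return dict(usage)


row_cells = {
    "anchor:a.com": "CNN",
    "anchor:b.com": "CNN.com",
    "contents:": "<html>",
}
usage = bytes_per_family(row_cells)
```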

Tablet management

  • Master assigns tablets to tablet servers
  • Each tablet server handles reads and writes to its tablets
  • Clients communicate directly with tablet server
  • Tablet server splits tablets that have grown too large.
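
The division of work between master and tablet servers can be sketched as follows. Everything here is a simplified assumption: the classes, the row-count threshold `MAX_ROWS_PER_TABLET`, and the least-loaded assignment rule are illustrative and not taken from the source.

```python
import bisect
from typing import List

MAX_ROWS_PER_TABLET = 4  # hypothetical split threshold (real systems split by size)


class TabletServer:
    def __init__(self, name: str) -> None:
        self.name = name
        self.tablets: List[List[str]] = []  # each tablet is a sorted list of row keys

    def write(self, tablet_index: int, row_key: str) -> None:
        """Insert a row key, then split the tablet in two if it grew too large."""
        tablet = self.tablets[tablet_index]
        bisect.insort(tablet, row_key)
        if len(tablet) > MAX_ROWS_PER_TABLET:
            mid = len(tablet) // 2
            self.tablets[tablet_index:tablet_index + 1] = [tablet[:mid], tablet[mid:]]


class Master:
    def __init__(self, servers: List[TabletServer]) -> None:
        self.servers = servers

    def assign(self, tablet: List[str]) -> TabletServer:
        """Assign a tablet to the server holding the fewest tablets (assumed policy)."""
        target = min(self.servers, key=lambda s: len(s.tablets))
        target.tablets.append(tablet)
        return target


servers = [TabletServer("ts1"), TabletServer("ts2")]
master = Master(servers)
first = master.assign(["a", "b", "c", "d"])
second = master.assign(["m", "n"])

# The client writes directly to the tablet server; the master is not involved.
first.write(0, "bb")
```

After the last write the first tablet holds five rows, exceeds the threshold, and is split by its own server into two contiguous halves.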

Write processing

  • Writes are buffered in an in-memory memtable. When the memtable reaches a size threshold, a minor compaction, a major compaction, or both run to keep read throughput high:
    • Minor compaction: writes the memtable buffer to a new SSTable.
    • Major compaction: rewrites all SSTables into a single SSTable and purges deleted entries.
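
The write path described above can be sketched with dictionaries standing in for the memtable and SSTables. This is a toy model under stated assumptions: the class `TabletStore`, the entry-count threshold, and the use of `None` as a delete marker are hypothetical, and durability concerns such as a commit log are left out.

```python
from typing import Dict, List, Optional

TOMBSTONE = None  # assumed marker meaning "this key was deleted"


class TabletStore:
    def __init__(self, memtable_limit: int = 2) -> None:
        self.memtable_limit = memtable_limit
        self.memtable: Dict[str, Optional[str]] = {}
        self.sstables: List[Dict[str, Optional[str]]] = []  # oldest first

    def write(self, key: str, value: Optional[str]) -> None:
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.minor_compaction()

    def delete(self, key: str) -> None:
        self.write(key, TOMBSTONE)

    def minor_compaction(self) -> None:
        """Write the memtable out as a new immutable, sorted SSTable."""
        if self.memtable:
            self.sstables.append(dict(sorted(self.memtable.items())))
            self.memtable = {}

    def major_compaction(self) -> None:
        """Merge all SSTables into one and drop deleted entries."""
        merged: Dict[str, Optional[str]] = {}
        for sstable in self.sstables:  # newer tables overwrite older ones
            merged.update(sstable)
        live = {k: v for k, v in sorted(merged.items()) if v is not TOMBSTONE}
        self.sstables = [live] if live else []

    def read(self, key: str) -> Optional[str]:
        """Check the memtable first, then SSTables from newest to oldest."""
        if key in self.memtable:
            return self.memtable[key]
        for sstable in reversed(self.sstables):
            if key in sstable:
                return sstable[key]
        return None


store = TabletStore(memtable_limit=2)
store.write("a", "1")
store.write("b", "2")   # threshold reached: first minor compaction
store.write("a", "3")
store.delete("b")       # threshold reached: second minor compaction
sstables_before = len(store.sstables)
store.major_compaction()
```

Before the major compaction a read may have to consult every SSTable; afterwards there is a single table and the deleted key no longer occupies space.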

Figure: Bigtable memtable write
