Data Partition - Row or Column Data Store

> (Data|State) Management and Processing > Data Partitions (Clustering of data)

1 - About

Vertica, one of the new column-store databases that compete with Oracle in the high-end data warehousing space. What makes Vertica particularly interesting is that Mike Stonebraker, one of the original figures in the relational database world, was involved in the formation of Vertica.

Oracle recently released Exadata and the HP Oracle Database Machine which competes directly in this space. Vertica, and other products like Sybase IQ and SAND address the problem in a different way in that they store data “column-orientated” What is the ideal solution for DSS-style applications.

Most data warehouses either use a star schema or something close to a star schema. The central “fact” table in such a schema usually has many attributes. 50 attributes is very common and 200 is not unusual. So assume a fact table with 60 attributes.

However, the query from a business analyst invariably only accesses 3 or 4 of these 60 attributes.

A “row store” such as Oracle will read all 60 attributes because the attributes needed are intermixed with all of the others. In contrast, a column store, such as Vertica, will read only the relevant attributes. Note that this will result in a factor of 15-20 less I/O activity.

In addition, a column store has other benefits. Compression is more effective in a column store than a row store, because any storage block only contains one kind of thing. In contrast a storage block in Oracle contains many different kinds of objects. Obviously, it is easier to compress one thing than many things. In head-to-head bakeoffs, Vertica compression is typically a factor of 3 better than Oracle’s. This further lowers I/O activity in a column store.

In customer bakeoffs against Oracle, Vertica has yet to win a benchmark by less than a factor of 30, largely because of these two considerations.

In a columnar data stores, the physical storage components vertically partition the entire model, there is no physical concept of “table” just a logical one where a table structure is made up of columns.


3 - Column-Store Databases

Vendor Product Differentiators
Calpont CNX Data Warehouse Platform Compatible with existing Oracle database environments. Modular architecure for incremental scaling
Exasol EXASolution A leader in 100 GB and 1000 GB TCP-H benchmark tests
InfoBright Brighthouse Data pack storage approach for data-optimized compression for smaller footprint. Queries focus on relevant packs for faster performance.
ParAccel ParAccel Analytic Database Combines column-store architecture with in-memory anaysis capabilities for fastperformance. Offers “drop-in accelerator” configurations on top of popular relational databases.
Sand Technology Sand/DNA Software Extensions available for Oracle, IBM DB2, SAP NetWeaver BI
Sybase Sybase IQ Category leader and early pioneer with 1,000-plus customers
Vertica Vertica Database Supports continuous data loading. Also offers appliance based on HP hardware/Red Hat Linux OS-based database appliance.

4 - Reference