Data Partition - Row or Column Data Store
1 - About
Vertica, one of the new column-store databases that compete with Oracle in the high-end data warehousing space. What makes Vertica particularly interesting is that Mike Stonebraker, one of the original figures in the relational database world, was involved in the formation of Vertica.
Oracle recently released Exadata and the HP Oracle Database Machine which competes directly in this space. Vertica, and other products like Sybase IQ and SAND address the problem in a different way in that they store data “column-orientated” What is the ideal solution for DSS-style applications.
Most data warehouses either use a star schema or something close to a star schema. The central “fact” table in such a schema usually has many attributes. 50 attributes is very common and 200 is not unusual. So assume a fact table with 60 attributes.
However, the query from a business analyst invariably only accesses 3 or 4 of these 60 attributes.
A “row store” such as Oracle will read all 60 attributes because the attributes needed are intermixed with all of the others. In contrast, a column store, such as Vertica, will read only the relevant attributes. Note that this will result in a factor of 15-20 less I/O activity.
In addition, a column store has other benefits. Compression is more effective in a column store than a row store, because any storage block only contains one kind of thing. In contrast a storage block in Oracle contains many different kinds of objects. Obviously, it is easier to compress one thing than many things. In head-to-head bakeoffs, Vertica compression is typically a factor of 3 better than Oracle’s. This further lowers I/O activity in a column store.
In customer bakeoffs against Oracle, Vertica has yet to win a benchmark by less than a factor of 30, largely because of these two considerations.
In a columnar data stores, the physical storage components vertically partition the entire model, there is no physical concept of “table” just a logical one where a table structure is made up of columns.
2 - Articles Related
3 - Column-Store Databases
|Calpont||CNX Data Warehouse Platform||Compatible with existing Oracle database environments. Modular architecure for incremental scaling|
|Exasol||EXASolution||A leader in 100 GB and 1000 GB TCP-H benchmark tests|
|InfoBright||Brighthouse||Data pack storage approach for data-optimized compression for smaller footprint. Queries focus on relevant packs for faster performance.|
|ParAccel||ParAccel Analytic Database||Combines column-store architecture with in-memory anaysis capabilities for fastperformance. Offers “drop-in accelerator” configurations on top of popular relational databases.|
|Sand Technology||Sand/DNA Software||Extensions available for Oracle, IBM DB2, SAP NetWeaver BI|
|Sybase||Sybase IQ||Category leader and early pioneer with 1,000-plus customers|
|Vertica||Vertica Database||Supports continuous data loading. Also offers appliance based on HP hardware/Red Hat Linux OS-based database appliance.|
4 - Reference
- Column-Store Databases and DW Appliances: How to Make the Right Choice. With data volumes exploding, conventional enterprise data warehouses are fast running out of headroom. Data warehouse appliances are starting to fill the gap, but the emerging category of column-oriented databases may offer a better option. The key to success is matching your application to the right product.