Oracle Database - Star Transformation

Card Puncher Data Processing

About

This article talk about the application of the star query in Oracle which involve a star schema.

Star transformation executes the query in two phases:

  1. retrieves the necessary rows from the fact table (row set)
  2. joins this row set to the dimension tables.

The rows from the fact table are retrieved by using bitmap joins between the bitmap indexes on all of the foreign key columns.

Oracle Database Star Transformation

The star transformation is useful for highly selective queries (across multiple dimension filter) on the fact table. To speed up queries with poor selectivity (total orders on all years for instance), aggregates must be used in a ROLAP environment.

Prerequisites

  1. A bitmap index should be built on each of the foreign key columns of the fact table or tables.
  2. The initialization parameter STAR_TRANSFORMATION_ENABLED should be set to TRUE. This enables an important optimizer feature for star-queries. It is set to FALSE by default for backward-compatibility.
  ALTER SESSION SET STAR_TRANSFORMATION_ENABLED=TRUE;
  SHOW PARAMETER STAR_TRANSFORMATION_ENABLED;

You can also use the star transformation hint

/*+ STAR_TRANSFORMATION */

When a data warehouse satisfies these conditions, the majority of the star queries running in the data warehouse will use a query execution strategy known as the star transformation. The star transformation provides very efficient query performance for star queries.

Steps

The star transformation is a powerful optimization technique that relies upon implicitly rewriting (or transforming) the SQL of the original star query. The end user never needs to know any of the details about the star transformation. Oracle's query optimizer automatically chooses the star transformation where appropriate.

The star transformation is a query transformation aimed at executing star queries efficiently. Oracle processes a star query using two basic phases.

  • The first phase: retrieves exactly the necessary rows from the fact table (the result set). Because this retrieval utilizes bitmap indexes, it is very efficient.
  • The second phase joins this result set to the dimension tables.

An example of an end user query is: “What were the sales and profits for the grocery department of stores in the west and southwest sales districts over the last three quarters?” This is a simple star query.

A prerequisite of the star transformation is that there be a single-column bitmap index on every join column of the fact table (foreign key columns).

For example, the sales table of the sh sample schema has bitmap indexes on the time_id, channel_id, cust_id, prod_id, and promo_id columns.

Consider the following star query:

SELECT ch.channel_class, c.cust_city, t.calendar_quarter_desc,
   SUM(s.amount_sold) sales_amount
FROM sales s, times t, customers c, channels ch
WHERE s.time_id = t.time_id
AND   s.cust_id = c.cust_id
AND   s.channel_id = ch.channel_id
AND   c.cust_state_province = 'CA'
AND   ch.channel_desc in ('Internet','Catalog')
AND   t.calendar_quarter_desc IN ('1999-Q1','1999-Q2')
GROUP BY ch.channel_class, c.cust_city, t.calendar_quarter_desc;

This query is processed in two phases.

The First phase

In the first phase, Oracle Database uses the bitmap indexes on the foreign key columns of the fact table to identify and retrieve only the necessary rows from the fact table.

That is, Oracle Database will retrieve the result set from the fact table using essentially the following query:

SELECT ... FROM sales
WHERE time_id IN
  (SELECT time_id FROM times 
   WHERE calendar_quarter_desc IN('1999-Q1','1999-Q2'))
   AND cust_id IN
  (SELECT cust_id FROM customers WHERE cust_state_province='CA')
   AND channel_id IN
  (SELECT channel_id FROM channels WHERE channel_desc IN('Internet','Catalog'));

This is the transformation step of the algorithm, because the original star query has been transformed into this sub-query representation. This method of accessing the fact table leverages the strengths of bitmap indexes.

The Strengths of bitmap indexes: Set operations

Intuitively, bitmap indexes provide a set-based processing scheme within a relational database.

Oracle has implemented very fast methods for doing set operations such as:

  • AND (an intersection in standard set-based terminology),
  • OR (a set-based union),
  • MINUS,
  • and COUNT.

The bitmap OR operation

In this star query, a bitmap index on time_id is used to identify the set of all rows in the fact table corresponding to sales in 1999-Q1. This set is represented as a bitmap (a string of 1's and 0's that indicates which rows of the fact table are members of the set).

A similar bitmap is retrieved for the fact table rows corresponding to the sale from 1999-Q2. The bitmap OR operation is used to combine this set of Q1 sales with the set of Q2 sales.

Additional set operations will be done for the customer dimension and the product dimension. At this point in the star query processing, there are three bitmaps:

  • Each bitmap corresponds to a separate dimension table,
  • and each bitmap represents the set of rows of the fact table that satisfy that individual dimension's constraints.

Then the final bitmap AND operation

These three bitmaps are combined into a single bitmap using the bitmap AND operation. This final bitmap represents the set of rows in the fact table that satisfy all of the constraints on the dimension table.

This is the result set, the exact set of rows from the fact table needed to evaluate the query.

Note that none of the actual data in the fact table has been accessed.

All of these operations rely solely on the bitmap indexes and the dimension tables. Because of the bitmap indexes compressed data representations, the bitmap set-based operations are extremely efficient.

Once the result set is identified, the bitmap is used to access the actual data from the sales table. Only those rows that are required for the end user's query are retrieved from the fact table. At this point, Oracle has effectively joined all of the dimension tables to the fact table using bitmap indexes.

This technique provides excellent performance because Oracle is joining all of the dimension tables to the fact table with one logical join operation, rather than joining each dimension table to the fact table independently.

The second phase

The second phase of this query is to join these rows from the fact table (the result set) to the dimension tables. Oracle will use the most efficient method for accessing and joining the dimension tables. Many dimension are very small, and table scans are typically the most efficient access method for these dimension tables.

For large dimension tables, table scans may not be the most efficient access method. In the previous example, a bitmap index on product.department can be used to quickly identify all of those products in the grocery department. Oracle's optimizer automatically determines which access method is most appropriate for a given dimension table, based upon the optimizer's knowledge about the sizes and data distributions of each dimension table.

The specific join method (as well as indexing method) for each dimension table will likewise be intelligently determined by the optimizer.

A hash join is often the most efficient algorithm for joining the dimension tables. The final answer is returned to the user once all of the dimension tables have been joined. The query technique of retrieving only the matching rows from one table and then joining to another table is commonly known as a semi-join.

Execution Plan for a Star Transformation with a Bitmap Index

The following typical execution plan might result from the SQL above:

SELECT STATEMENT
 SORT GROUP BY
  HASH JOIN
   TABLE ACCESS FULL                          CHANNELS
   HASH JOIN
    TABLE ACCESS FULL                         CUSTOMERS
    HASH JOIN
     TABLE ACCESS FULL                        TIMES
     PARTITION RANGE ITERATOR
      TABLE ACCESS BY LOCAL INDEX ROWID       SALES
       BITMAP CONVERSION TO ROWIDS
        BITMAP AND
         BITMAP MERGE
          BITMAP KEY ITERATION
           BUFFER SORT
            TABLE ACCESS FULL                 CUSTOMERS
           BITMAP INDEX RANGE SCAN            SALES_CUST_BIX
         BITMAP MERGE
          BITMAP KEY ITERATION
           BUFFER SORT
            TABLE ACCESS FULL                 CHANNELS
           BITMAP INDEX RANGE SCAN            SALES_CHANNEL_BIX
         BITMAP MERGE
          BITMAP KEY ITERATION
           BUFFER SORT
            TABLE ACCESS FULL                 TIMES
           BITMAP INDEX RANGE SCAN            SALES_TIME_BIX

In this plan, the fact table is accessed through a bitmap access path based on a bitmap AND, of three merged bitmaps. The three bitmaps are generated by the BITMAP MERGE row source being fed bitmaps from row source trees underneath it. Each such row source tree consists of a BITMAP KEY ITERATION row source which fetches values from the subquery row source tree, which in this example is a full table access. For each such value, the BITMAP KEY ITERATION row source retrieves the bitmap from the bitmap index. After the relevant fact table rows have been retrieved using this access path, they are joined with the dimension tables and temporary tables to produce the answer to the query.

Star Transformation with a Bitmap Join Index

In addition to bitmap indexes, you can use a bitmap join index during star transformations. Assume you have the following additional index structure:

CREATE BITMAP INDEX sales_c_state_bjix
ON sales(customers.cust_state_province)
FROM sales, customers
WHERE sales.cust_id = customers.cust_id
LOCAL NOLOGGING COMPUTE STATISTICS;

The processing of the same star query using the bitmap join index is similar to the previous example. The only difference is that Oracle will utilize the join index, instead of a single-table bitmap index, to access the customer data in the first phase of the star query.

How Oracle Chooses to Use Star Transformation

The optimizer generates and saves the best plan it can produce without the transformation. If the transformation is enabled, the optimizer then tries to apply it to the query and, if applicable, generates the best plan using the transformed query.

Based on a comparison of the cost estimates between the best plans for the two versions of the query, the optimizer will then decide whether to use the best plan for the transformed or untransformed version.

If the query requires accessing a large percentage of the rows in the fact table, it might be better to use a full table scan and not use the transformations.

However, if the constraining predicates on the dimension tables are sufficiently selective that only a small portion of the fact table needs to be retrieved, the plan based on the transformation will probably be superior.

Note that the optimizer generates a sub-query for a dimension table only if it decides that it is reasonable to do so based on a number of criteria. There is no guarantee that sub-queries will be generated for all dimension tables. The optimizer may also decide, based on the properties of the tables and the query, that the transformation does not merit being applied to a particular query. In this case the best regular plan will be used.

Star Transformation Restrictions

Star transformation is not supported for tables with any of the following characteristics:

  • Queries with a table hint that is incompatible with a bitmap access path
  • Queries that contain bind variables
  • Tables with too few bitmap indexes. There must be a bitmap index on a fact table column for the optimizer to generate a sub-query for it.
  • Remote fact tables. However, remote dimension tables are allowed in the sub-queries that are generated.
  • Anti-joined tables
  • Tables that are already used as a dimension table in a sub-query
  • Tables that are really unmerged views, which are not view partitions

The star transformation may not be chosen by the optimizer for the following cases:

  • Tables that have a good single-table access path
  • Tables that are too small for the transformation to be worthwhile

In addition, temporary tables will not be used by star transformation under the following conditions:

  • The database is in read-only mode
  • The star query is part of a transaction that is in serializable mode

Documentation / Reference





Discover More
Oltp Dwh
Data Warehousing - Contrasting OLTP and Data Warehousing Environments

One major difference between the types of system is that data warehouses are not usually in third normal form (3NF), a type of data normalization common in Online Transaction Processing (OLTP) environments....
Card Puncher Data Processing
Oracle - Controlling the Behavior of the Query Optimizer with initialization parameters

This section lists some initialization parameters that can be used to control the behaviour of the query optimizer. These parameters can be used to enable various optimizer features in order to improve...
Card Puncher Data Processing
Oracle Database

Documentation about the Oracle database
Card Puncher Data Processing
Oracle Database - Data Warehousing

Parallelism, Partitioning, Result Set Cache, Advanced Compression Star transformation The biggest wait event for large data warehouse sites is: a “direct path read” wait event A shared...
Parallel Linearly Scalable Operation
Oracle Database - Parallel Operations

The fundamental difference and advantage of Oracle's capabilities, however, is that parallel data access and therefore the necessary data redistribution are not constrained by any given hardware architecture...
Data System Architecture
Relational Data Modeling - Denormalization

Denormalization is the process of intentionally backing away from normalization to improve performance by suppressing join and permitting the use of star transformation technique. Denormalization should...



Share this page:
Follow us:
Task Runner