OBIEE - Fragmentation Content, value-based, horizontal partitioning

1 - About

This capabilities of OBIEE is also known as :

  • Content Fragmentation
  • Value-Based fragmentation
  • Fact Horizontal Partitioning

The term Fragmentation is more an OBIEE term whereas the term partitioning refer more to the database side.

Depending of a value of column, you can split the query against a table or an other. Very often, the table are fragmented according to the most important dimension in a star schema the time dimension.

For example, you will have two tables :

  • a table for the current year
  • a table for the history

All request for the current year will be then performed against the current year table and the database will not need to retrieve all the data (The history data are exlcuded).

The goal is simple, it's :

  • to reduce the I/O (disk) effort
  • to increase the selectivity

to increase at the end the performance.

The BI Server decide to use one logical table source rather than a union of relevant fragment source when you have include in the filter a predicate on the fragmentation column. The filter is evaluated against the fragmentation content definitions to decide which of the sources will be use for the query.

3 - Prerequisites

3.1 - The same set of columns must be mapped for each fragment

If a logical table is sourced from a set of fragments, it is not required that every individual fragment maps the same set of columns.

However, the server returns different answers depending on how columns are mapped.

  • If all the fragments of a logical table map the same set of columns, than the set of fragmented sources is considered to be the whole universe of logical table sources for the logical table. This means that measure aggregations can be calculated based on the set of fragments.
  • If the set of mapped columns differ across the fragments, than the Oracle BI Server assumes that it does not have the whole universe of fragments, and therefore it would be incorrect to calculate aggregate rollups (since some fragments are missing). In this case, the server returns NULL as measure aggregates.

3.2 - The whole domain must be defined

A domain can have many sources. The sources have to all follow the rule that each level must contain sources that, when combined, comprise the whole domain of values at that level.

Setting up the entire domain for each level helps ensure that queries:

  • asking for Coke, Pepsi, and 7-Up do not leave out 7-Up.
  • requesting information that has been precomputed and stored in aggregate tables can retrieve that information from the aggregate tables, even if the query requests other information that is not stored in the aggregate tables.

3.3 - The fragment content expression must be consistent with the level

The fragment content expression must be consistent with the level

For instance, the below fragment is bad because it has a level on Month but fragment by day. The result is that OBIEE will not find him even if you had only a month attribute because the day attribute is not present.

4 - The predicate

General rules, Overlap is permissible when there are parallel tracks (ie the use of an OR operator). The rule is that at least one of the tracks has to be nonoverlapping. The other tracks can have overlap.

4.1 - Single column

OBI Server expects in the fragmentation content as definition only:

  • values equality
  • or ranges of values

It doesn't expect an <>. The predicate below will not work.

logicalColumn <> value1

4.1.1 - Value-Based

  • IN
logicalColumn IN <valueList1>
  • equality and or
logicalColumn = <value1> OR logicalColumn = <value2>

4.1.2 - Range-Based

For each fragment, the upper value must be expressed as <. You will get an error if you use ⇐. Likewise, you cannot use the BETWEEN predicate to describe fragment range content.

Fragment 1:

logicalColumn >= valueof(START_VALUE) AND logicalColumn < valueof(MID_VALUE1)

Fragment 2:

logicalColumn < valueof(START_VALUE) OR logicalColumn >= valueof(MID_VALUE1)

If you want to use a repository variable to define the predicate of each fragment, you have to set one repository variable by expression.

logicalColumn >= valueof(MyFirstRepositoryVariable)+1 AND logicalColumn < valueof(MySecondRepositoryVariable)

4.2 - Multicolumn

An arbitrary number of predicates on different columns can be included in each content filter. Each column predicate can be value-based or range-based.

<logicalColumn1 predicate> AND <logicalColumn2 predicate > ... AND <logicalColumnM predicate>

4.2.1 - Time-based

The time-based content fragementation is based on the parallel content OR technique. It supports the fact that a constraint on year must be sufficient to select a historical table define with a date snapshot. This example assumes that the snapshot month was April 1, 12:00 a.m. in the year 1999.

Fragment 1 (Historical):

EnterpriseModel.Period."Day" < VALUEOF("Snapshot Date") OR
EnterpriseModel.Period.MonthCode < VALUEOF("Snapshot Year Month") OR
EnterpriseModel.Period."Year" < VALUEOF("Snapshot Year") OR
EnterpriseModel.Period."Year" = VALUEOF("Snapshot Year") AND
  EnterpriseModel.Period."Month in Year" < VALUEOF("Snapshot Month") OR
EnterpriseModel.Period."Year" = VALUEOF("Snapshot Year") AND
  EnterpriseModel.Period."Monthname" IN ('Mar', 'Feb', 'Jan')

Fragment 2 (Current):

EnterpriseModel.Period."Day" >= VALUEOF("Snapshot Date") OR
EnterpriseModel.Period.MonthCode >= VALUEOF("Snapshot Year Month") OR
EnterpriseModel.Period."Year" > VALUEOF("Snapshot Year") OR
EnterpriseModel.Period."Year" = VALUEOF("Snapshot Year") AND
  EnterpriseModel.Period."Month in Year" >= VALUEOF("Snapshot Month") OR
EnterpriseModel.Period."Year" = VALUEOF("Snapshot Year") AND
  EnterpriseModel.Period."Monthname" IN ('Dec', 'Nov', 'Oct', 'Sep', 'Aug', 'Jul',
  'Jun', '', 'Apr')

4.2.2 - Status and time-based

In an order entry application, time-based fragmentation between historical and current fragments is typically insufficient. For example, records might still be volatile, even though they are historical records entered into the database before the snapshot date. Oracle talks of an unbalanced parallel Content.

Assume, in the following example, that open orders can be directly updated by the application until the order is shipped or canceled. After the order has shipped, however, the only change that can be made to the order is to type a separate compensating return order transaction.

Fragment 1 (Historical):

Marketing."Order Status"."Order Status" IN ('Shipped', 'Canceled') AND
  Marketing.Calendar."Calendar Date" <= VALUEOF("Snapshot Date") OR
Marketing.Calendar."Year" <= VALUEOF("Snapshot Year") OR
Marketing.Calendar."Year Month" <= VALUEOF("Snapshot Year Month")

Fragment 2 (Current):

Marketing."Order Status"."Order Status" IN ('Shipped', 'Canceled') AND
  Marketing.Calendar."Calendar Date" > VALUEOF("Snapshot Date") OR
Marketing.Calendar."Year" >= VALUEOF("Snapshot Year") OR
Marketing.Calendar."Year Month" >= VALUEOF("Snapshot Year Month") OR
Marketing."Order Status"."Order Status" = 'Open'

The overlapping Year and Month descriptions in the two fragments do not cause a problem, as overlap is permissible when there are parallel tracks (ie the use of an OR operator). The rule is that at least one of the tracks has to be nonoverlapping. The other tracks can have overlap.

5 - Example

To do: This example doesn't use a time-based content fragmentation

This example is made with the sh sample schema.

We will implement a simple value-based fragmentation between a history and an actual table.

In this how-to, I will use :

  • the table SALES as the current table
  • and the table SALES_HIST as the history table.

And we will create the predicates as :

  • the data must be retrieve form the SALES table if the year is greater than 1999
  • the data must be retrieve form the SALES_HIST table if the year is smaller or equal than 1999

5.1 - Creation of the History table

To do it, we will connect to the SH schema and fired this statement :

C:\Documents AND Settings\Nicolas>sqlplus SH/SH
 
SQL*Plus: Release 10.2.0.4.0 - Production ON Fri Jun 19 07:36:07 2009
 
Copyright (c) 1982, 2007, Oracle.  ALL Rights Reserved.
 
 
Connected TO:
Oracle DATABASE 10g Enterprise Edition Release 10.2.0.4.0 - Production
WITH the Partitioning, OLAP, DATA Mining AND REAL Application Testing options
 
[email protected]>CREATE TABLE sales_hist AS SELECT * FROM sales WHERE to_char(time_id,'YYYY') <= 1999;
 
TABLE created.

5.2 - In the physical layer

Then as the SALES_HIST is a copy of the SALES table and that we want that the two tables have the same behaviour we must create the same model for the table SALES_HIST than for the table SALES.

Then :

  • import the table SALES_HIST (File/Import/From Database)
  • and create all the same joins (with the foreign key icon) that have the table SALES with the same condition as below :
    • for the Channel table : CHANNELS.CHANNEL_ID = SALES_HIST.CHANNEL_ID
    • for the Products table : PRODUCTS.PROD_ID = SALES_HIST.PROD_ID
    • for the Times table : TIMES.TIME_ID = SALES_HIST.TIME_ID
    • for the Promotions table : PROMOTIONS.PROMO_ID = SALES_HIST.PROMO_ID
    • for the Demographic table : DEMOGRAPHICS.CUST_ID = SALES_HIST.CUST_ID
    • for the Customer table : CUSTOMERS.CUST_ID = SALES_HIST.CUST_ID
    • for the Cost table : COSTS.PROD_ID = SALES_HIST.PROD_ID AND COSTS.TIME_ID = SALES_HIST.TIME_ID AND COSTS.PROMO_ID = SALES_HIST.PROMO_ID AND COSTS.CHANNEL_ID = SALES_HIST.CHANNEL_ID

And you will then end up with this model (select the table SALES_HIST and SALES and ask by a right click the physical object and direct join):

5.3 - In the Business Model Layer

The next step is to have for one logical column of the fact table, two physical columns. In this way, when you will ask for instance for the amount sold, it will retrieve the data in the two physical table (defined as logical table source) if the predicate is validated.

To do this, you have multiple way to do it but you have an handy way by dragging and dropping the physical column on the logical column as this picture below show it :

Remark that by doing this, a logical table source SALES_HIST is created below the logical fact.

Then if you open the logical column, you will able to see in the data type tab, two physical maps for one logical column :

Perform this step for all other physical measure columns (if needed).

The next step is to set the fragmentation predicate to give all the information that need OBIEE to choose the relevant table.

Open the logical table source and set the fragment content :

  • in the history table (SALES_HIST):
    SH.Times."Calendar Year" <= 1999
  • in the current table (SALES):
    SH.Times."Calendar Year" > 1999

And don't forget to check the box “This source should be combined with other sources at this level” as this picture below :

6 - Test in answer

6.1 - Without filter

Then if you create an answer and select this two columns :

  • Calendar.“Calendar Year”
  • and “Sales Facts”.“Amount Sold”

without filter.

The OBI Server will fired this query that you can found in the log.

SELECT D3.c2 AS c1,
     SUM(D3.c3) AS c2
FROM 
     ((SELECT T268.CALENDAR_YEAR AS c2,
               T245.AMOUNT_SOLD AS c3
          FROM 
               SH.TIMES T268,
               SH.SALES T245
          WHERE  ( T245.TIME_ID = T268.TIME_ID ) 
          UNION ALL
          SELECT T268.CALENDAR_YEAR AS c2,
               T4937.AMOUNT_SOLD AS c3
          FROM 
               SH.TIMES T268,
               SH.SALES_HIST T4937
          WHERE  ( T268.TIME_ID = T4937.TIME_ID ) )
     ) D3
GROUP BY D3.c2
ORDER BY c1

6.2 - With a filter on the column of the predicate

As :

Calendar."Calendar Year" <= 1999

You can then see in the log that OBIEE perform well the fragmentation and fired a SQL against only the History table.

SELECT T268.CALENDAR_YEAR AS c1,
     SUM(T4937.AMOUNT_SOLD) AS c2
FROM 
     SH.TIMES T268,
     SH.SALES_HIST T4937
WHERE  ( T268.TIME_ID = T4937.TIME_ID AND T268.CALENDAR_YEAR <= 1999 ) 
GROUP BY T268.CALENDAR_YEAR
ORDER BY c1

6.3 - With filter on a column that doesn't belong to the predicate

If we create a filter with the same meaning (the year must be smaller or equal than 1999) but with the date column :

YEAR(Calendar."Time Id") <= 1999

The fragmentation doesn't occur any more. You have to implement a time-based predicate (ie a parallel content predicate with the use of the OR operator).

SELECT D3.c2 AS c1,
     SUM(D3.c3) AS c2
FROM 
     ((SELECT T268.CALENDAR_YEAR AS c2,
               T245.AMOUNT_SOLD AS c3
          FROM 
               SH.TIMES T268,
               SH.SALES T245
          WHERE  ( T245.TIME_ID = T268.TIME_ID AND TO_NUMBER(TO_CHAR(T245.TIME_ID, 'yyyy'), '9999') <= 1999 
AND TO_NUMBER(TO_CHAR(T268.TIME_ID, 'yyyy'), '9999') <= 1999 ) 
          UNION ALL
          SELECT T268.CALENDAR_YEAR AS c2,
               T4937.AMOUNT_SOLD AS c3
          FROM 
               SH.TIMES T268,
               SH.SALES_HIST T4937
          WHERE  ( T268.TIME_ID = T4937.TIME_ID AND TO_NUMBER(TO_CHAR(T268.TIME_ID, 'yyyy'), '9999') <= 1999
AND TO_NUMBER(TO_CHAR(T4937.TIME_ID, 'yyyy'), '9999') <= 1999 ) )
     ) D3
GROUP BY D3.c2
ORDER BY c1

7 - How to set up the predicate to always perform an Union ?

You have to use this trick, otherwise you will receive the error that you have in the support section

The trick is to set a greater and a smaller condition based on the infinite limit negative and positive as for instance on a year predicat :

  • for the first table :
SH.Times."Calendar Year" < 9999
  • for the second table :
SH.Times."Calendar Year" > 0

8 - Support

8.1 - Unknown content predicate overlap is not resolvable

The predicate must used in one table the greater and in the other the smaller condition. If you don't, you will end up :

  • with only one table in the query (the first one in the list)
  • or with this error.
ERRORS: BUSINESS MODEL SH:
[nQSError: 15001] Could not load navigation space for subject area SH.
[nQSError: 15027] Unknown content predicate overlap is not resolvable: position NotEqual(1), predicate NotEqual(0)
ERRORS:BUSINESS MODEL SH:
[nQSError: 15001] Could not load navigation space for subject area SH.
[nQSError: 15022] Cannot extract overlapping and non-overlapping predicate descriptions from the following 
two conditions: (1) GreaterThan(0), and (2) GreaterThan(1).

8.2 - Multiple references are not supported

SH.Times."Calendar Year" <= 1999 OR  EXTRACT( YEAR  FROM SH.Times."Calendar Year Id") <= 1999

After setting up this predicate, a check of the repository will fired this error :

nQSError14009 : Multiple references are not supported with the predicates and OR condition

9 - Documentation / Reference

dat/obiee/obis/fragmentation_content.txt · Last modified: 2017/09/13 20:24 by gerardnico