OBIEE - Fragmentation Content, value-based, horizontal partitioning
Table of Contents
1 - About
This capabilities of OBIEE is also known as :
- Content Fragmentation
- Value-Based fragmentation
- Fact Horizontal Partitioning
The term Fragmentation is more an OBIEE term whereas the term partitioning refer more to the database side.
Depending of a value of column, you can split the query against a table or an other. Very often, the table are fragmented according to the most important dimension in a star schema the time dimension.
For example, you will have two tables :
- a table for the current year
- a table for the history
All request for the current year will be then performed against the current year table and the database will not need to retrieve all the data (The history data are exlcuded).
The goal is simple, it's :
- to reduce the I/O (disk) effort
- to increase the selectivity
to increase at the end the performance.
The BI Server decide to use one logical table source rather than a union of relevant fragment source when you have include in the filter a predicate on the fragmentation column. The filter is evaluated against the fragmentation content definitions to decide which of the sources will be use for the query.
2 - Articles Related
3 - Prerequisites
3.1 - The same set of columns must be mapped for each fragment
However, the server returns different answers depending on how columns are mapped.
- If all the fragments of a logical table map the same set of columns, than the set of fragmented sources is considered to be the whole universe of logical table sources for the logical table. This means that measure aggregations can be calculated based on the set of fragments.
- If the set of mapped columns differ across the fragments, than the Oracle BI Server assumes that it does not have the whole universe of fragments, and therefore it would be incorrect to calculate aggregate rollups (since some fragments are missing). In this case, the server returns NULL as measure aggregates.
3.2 - The whole domain must be defined
A domain can have many sources. The sources have to all follow the rule that each level must contain sources that, when combined, comprise the whole domain of values at that level.
Setting up the entire domain for each level helps ensure that queries:
- asking for Coke, Pepsi, and 7-Up do not leave out 7-Up.
- requesting information that has been precomputed and stored in aggregate tables can retrieve that information from the aggregate tables, even if the query requests other information that is not stored in the aggregate tables.
3.3 - The fragment content expression must be consistent with the level
The fragment content expression must be consistent with the level
For instance, the below fragment is bad because it has a level on Month but fragment by day. The result is that OBIEE will not find him even if you had only a month attribute because the day attribute is not present.
4 - The predicate
General rules, Overlap is permissible when there are parallel tracks (ie the use of an OR operator). The rule is that at least one of the tracks has to be nonoverlapping. The other tracks can have overlap.
4.1 - Single column
OBI Server expects in the fragmentation content as definition only:
- values equality
- or ranges of values
It doesn't expect an <>. The predicate below will not work.
logicalColumn <> value1
4.1.1 - Value-Based
logicalColumn IN <valueList1>
- equality and or
logicalColumn = <value1> OR logicalColumn = <value2>
4.1.2 - Range-Based
For each fragment, the upper value must be expressed as <. You will get an error if you use ⇐. Likewise, you cannot use the BETWEEN predicate to describe fragment range content.
logicalColumn >= valueof(START_VALUE) AND logicalColumn < valueof(MID_VALUE1)
logicalColumn < valueof(START_VALUE) OR logicalColumn >= valueof(MID_VALUE1)
If you want to use a repository variable to define the predicate of each fragment, you have to set one repository variable by expression.
logicalColumn >= valueof(MyFirstRepositoryVariable)+1 AND logicalColumn < valueof(MySecondRepositoryVariable)
4.2 - Multicolumn
An arbitrary number of predicates on different columns can be included in each content filter. Each column predicate can be value-based or range-based.
<logicalColumn1 predicate> AND <logicalColumn2 predicate > ... AND <logicalColumnM predicate>
4.2.1 - Time-based
The time-based content fragementation is based on the parallel content OR technique. It supports the fact that a constraint on year must be sufficient to select a historical table define with a date snapshot. This example assumes that the snapshot month was April 1, 12:00 a.m. in the year 1999.
Fragment 1 (Historical):
EnterpriseModel.Period."Day" < VALUEOF("Snapshot Date") OR EnterpriseModel.Period.MonthCode < VALUEOF("Snapshot Year Month") OR EnterpriseModel.Period."Year" < VALUEOF("Snapshot Year") OR EnterpriseModel.Period."Year" = VALUEOF("Snapshot Year") AND EnterpriseModel.Period."Month in Year" < VALUEOF("Snapshot Month") OR EnterpriseModel.Period."Year" = VALUEOF("Snapshot Year") AND EnterpriseModel.Period."Monthname" IN ('Mar', 'Feb', 'Jan')
Fragment 2 (Current):
EnterpriseModel.Period."Day" >= VALUEOF("Snapshot Date") OR EnterpriseModel.Period.MonthCode >= VALUEOF("Snapshot Year Month") OR EnterpriseModel.Period."Year" > VALUEOF("Snapshot Year") OR EnterpriseModel.Period."Year" = VALUEOF("Snapshot Year") AND EnterpriseModel.Period."Month in Year" >= VALUEOF("Snapshot Month") OR EnterpriseModel.Period."Year" = VALUEOF("Snapshot Year") AND EnterpriseModel.Period."Monthname" IN ('Dec', 'Nov', 'Oct', 'Sep', 'Aug', 'Jul', 'Jun', '', 'Apr')
4.2.2 - Status and time-based
In an order entry application, time-based fragmentation between historical and current fragments is typically insufficient. For example, records might still be volatile, even though they are historical records entered into the database before the snapshot date. Oracle talks of an unbalanced parallel Content.
Assume, in the following example, that open orders can be directly updated by the application until the order is shipped or canceled. After the order has shipped, however, the only change that can be made to the order is to type a separate compensating return order transaction.
Fragment 1 (Historical):
Marketing."Order Status"."Order Status" IN ('Shipped', 'Canceled') AND Marketing.Calendar."Calendar Date" <= VALUEOF("Snapshot Date") OR Marketing.Calendar."Year" <= VALUEOF("Snapshot Year") OR Marketing.Calendar."Year Month" <= VALUEOF("Snapshot Year Month")
Fragment 2 (Current):
Marketing."Order Status"."Order Status" IN ('Shipped', 'Canceled') AND Marketing.Calendar."Calendar Date" > VALUEOF("Snapshot Date") OR Marketing.Calendar."Year" >= VALUEOF("Snapshot Year") OR Marketing.Calendar."Year Month" >= VALUEOF("Snapshot Year Month") OR Marketing."Order Status"."Order Status" = 'Open'
The overlapping Year and Month descriptions in the two fragments do not cause a problem, as overlap is permissible when there are parallel tracks (ie the use of an OR operator). The rule is that at least one of the tracks has to be nonoverlapping. The other tracks can have overlap.
5 - Example
To do: This example doesn't use a time-based content fragmentation
This example is made with the sh sample schema.
We will implement a simple value-based fragmentation between a history and an actual table.
In this how-to, I will use :
- the table SALES as the current table
- and the table SALES_HIST as the history table.
And we will create the predicates as :
- the data must be retrieve form the SALES table if the year is greater than 1999
- the data must be retrieve form the SALES_HIST table if the year is smaller or equal than 1999
5.1 - Creation of the History table
To do it, we will connect to the SH schema and fired this statement :
C:\Documents AND Settings\Nicolas>sqlplus SH/SH SQL*Plus: Release 10.2.0.4.0 - Production ON Fri Jun 19 07:36:07 2009 Copyright (c) 1982, 2007, Oracle. ALL Rights Reserved. Connected TO: Oracle DATABASE 10g Enterprise Edition Release 10.2.0.4.0 - Production WITH the Partitioning, OLAP, DATA Mining AND REAL Application Testing options [email protected]>CREATE TABLE sales_hist AS SELECT * FROM sales WHERE to_char(time_id,'YYYY') <= 1999; TABLE created.
5.2 - In the physical layer
Then as the SALES_HIST is a copy of the SALES table and that we want that the two tables have the same behaviour we must create the same model for the table SALES_HIST than for the table SALES.
- import the table SALES_HIST (File/Import/From Database)
- and create all the same joins (with the foreign key icon) that have the table SALES with the same condition as below :
- for the Channel table : CHANNELS.CHANNEL_ID = SALES_HIST.CHANNEL_ID
- for the Products table : PRODUCTS.PROD_ID = SALES_HIST.PROD_ID
- for the Times table : TIMES.TIME_ID = SALES_HIST.TIME_ID
- for the Promotions table : PROMOTIONS.PROMO_ID = SALES_HIST.PROMO_ID
- for the Demographic table : DEMOGRAPHICS.CUST_ID = SALES_HIST.CUST_ID
- for the Customer table : CUSTOMERS.CUST_ID = SALES_HIST.CUST_ID
- for the Cost table : COSTS.PROD_ID = SALES_HIST.PROD_ID AND COSTS.TIME_ID = SALES_HIST.TIME_ID AND COSTS.PROMO_ID = SALES_HIST.PROMO_ID AND COSTS.CHANNEL_ID = SALES_HIST.CHANNEL_ID
And you will then end up with this model (select the table SALES_HIST and SALES and ask by a right click the physical object and direct join):
5.3 - In the Business Model Layer
The next step is to have for one logical column of the fact table, two physical columns. In this way, when you will ask for instance for the amount sold, it will retrieve the data in the two physical table (defined as logical table source) if the predicate is validated.
To do this, you have multiple way to do it but you have an handy way by dragging and dropping the physical column on the logical column as this picture below show it :
Remark that by doing this, a logical table source SALES_HIST is created below the logical fact.
Then if you open the logical column, you will able to see in the data type tab, two physical maps for one logical column :
Perform this step for all other physical measure columns (if needed).
The next step is to set the fragmentation predicate to give all the information that need OBIEE to choose the relevant table.
Open the logical table source and set the fragment content :
- in the history table (SALES_HIST):
SH.Times."Calendar Year" <= 1999
- in the current table (SALES):
SH.Times."Calendar Year" > 1999
And don't forget to check the box “This source should be combined with other sources at this level” as this picture below :
6 - Test in answer
6.1 - Without filter
Then if you create an answer and select this two columns :
- Calendar.“Calendar Year”
- and “Sales Facts”.“Amount Sold”
The OBI Server will fired this query that you can found in the log.
SELECT D3.c2 AS c1, SUM(D3.c3) AS c2 FROM ((SELECT T268.CALENDAR_YEAR AS c2, T245.AMOUNT_SOLD AS c3 FROM SH.TIMES T268, SH.SALES T245 WHERE ( T245.TIME_ID = T268.TIME_ID ) UNION ALL SELECT T268.CALENDAR_YEAR AS c2, T4937.AMOUNT_SOLD AS c3 FROM SH.TIMES T268, SH.SALES_HIST T4937 WHERE ( T268.TIME_ID = T4937.TIME_ID ) ) ) D3 GROUP BY D3.c2 ORDER BY c1
6.2 - With a filter on the column of the predicate
Calendar."Calendar Year" <= 1999
You can then see in the log that OBIEE perform well the fragmentation and fired a SQL against only the History table.
SELECT T268.CALENDAR_YEAR AS c1, SUM(T4937.AMOUNT_SOLD) AS c2 FROM SH.TIMES T268, SH.SALES_HIST T4937 WHERE ( T268.TIME_ID = T4937.TIME_ID AND T268.CALENDAR_YEAR <= 1999 ) GROUP BY T268.CALENDAR_YEAR ORDER BY c1
6.3 - With filter on a column that doesn't belong to the predicate
If we create a filter with the same meaning (the year must be smaller or equal than 1999) but with the date column :
YEAR(Calendar."Time Id") <= 1999
The fragmentation doesn't occur any more. You have to implement a time-based predicate (ie a parallel content predicate with the use of the OR operator).
SELECT D3.c2 AS c1, SUM(D3.c3) AS c2 FROM ((SELECT T268.CALENDAR_YEAR AS c2, T245.AMOUNT_SOLD AS c3 FROM SH.TIMES T268, SH.SALES T245 WHERE ( T245.TIME_ID = T268.TIME_ID AND TO_NUMBER(TO_CHAR(T245.TIME_ID, 'yyyy'), '9999') <= 1999 AND TO_NUMBER(TO_CHAR(T268.TIME_ID, 'yyyy'), '9999') <= 1999 ) UNION ALL SELECT T268.CALENDAR_YEAR AS c2, T4937.AMOUNT_SOLD AS c3 FROM SH.TIMES T268, SH.SALES_HIST T4937 WHERE ( T268.TIME_ID = T4937.TIME_ID AND TO_NUMBER(TO_CHAR(T268.TIME_ID, 'yyyy'), '9999') <= 1999 AND TO_NUMBER(TO_CHAR(T4937.TIME_ID, 'yyyy'), '9999') <= 1999 ) ) ) D3 GROUP BY D3.c2 ORDER BY c1
7 - How to set up the predicate to always perform an Union ?
You have to use this trick, otherwise you will receive the error that you have in the support section
The trick is to set a greater and a smaller condition based on the infinite limit negative and positive as for instance on a year predicat :
- for the first table :
SH.Times."Calendar Year" < 9999
- for the second table :
SH.Times."Calendar Year" > 0
8 - Support
8.1 - Unknown content predicate overlap is not resolvable
The predicate must used in one table the greater and in the other the smaller condition. If you don't, you will end up :
- with only one table in the query (the first one in the list)
- or with this error.
ERRORS: BUSINESS MODEL SH: [nQSError: 15001] Could not load navigation space for subject area SH. [nQSError: 15027] Unknown content predicate overlap is not resolvable: position NotEqual(1), predicate NotEqual(0)
ERRORS:BUSINESS MODEL SH: [nQSError: 15001] Could not load navigation space for subject area SH. [nQSError: 15022] Cannot extract overlapping and non-overlapping predicate descriptions from the following two conditions: (1) GreaterThan(0), and (2) GreaterThan(1).
8.2 - Multiple references are not supported
SH.Times."Calendar Year" <= 1999 OR EXTRACT( YEAR FROM SH.Times."Calendar Year Id") <= 1999
After setting up this predicate, a check of the repository will fired this error :
nQSError14009 : Multiple references are not supported with the predicates and OR condition