About
You may want to view the data in dense form, with rows for all combination of dimension values displayed even when no fact data exist for them.
The densification is also :
- known as the preservation of the dimensions
- describe as show me the null values
For example, if a product did not sell during a particular time period, you may still want to see the product for that time period with zero sales or null value next to it.
Moreover, time series calculations can be performed most easily when data is dense along the time dimension.
This is because dense data will fill a consistent number of rows for each period, which in turn makes it simple to use the analytic windowing functions with physical offsets.
Data densification is the process of converting sparse data into dense form.
Articles Related
Sparse data to densify
A typical situation in the sh schema with a sparse dimension is shown in the following example, which computes the weekly sales and year-to-date sales for the product Bounce for weeks 20-30 in 2000 and 2001:
SELECT SUBSTR(p.Prod_Name,1,15) Product_Name,
t.Calendar_Year Year,
t.Calendar_Week_Number Week,
SUM(Amount_Sold) Sales
FROM
Sales s,
Times t,
Products p
WHERE
s.Time_id = t.Time_id AND
s.Prod_id = p.Prod_id AND
p.Prod_name IN ('Bounce') AND
t.Calendar_Year IN (2000,2001) AND
t.Calendar_Week_Number BETWEEN 20 AND 30
GROUP BY
p.Prod_Name,
t.Calendar_Year,
t.Calendar_Week_Number
order by
product_name,
year,
week asc
PRODUCT_NAME YEAR WEEK SALES
--------------- ---------- ---------- ----------
Bounce 2000 20 801
Bounce 2000 21 4062.24
Bounce 2000 22 2043.16
Bounce 2000 23 2731.14
Bounce 2000 24 4419.36
---- Miss one row for the week 25
---- Miss one row for the week 26
Bounce 2000 27 2297.29
Bounce 2000 28 1443.13
Bounce 2000 29 1927.38
Bounce 2000 30 1927.38
Bounce 2001 20 1483.3
Bounce 2001 21 4184.49
Bounce 2001 22 2609.19
Bounce 2001 23 1416.95
Bounce 2001 24 3149.62
Bounce 2001 25 2645.98
---- Miss one row for the week 26
Bounce 2001 27 2125.12
---- Miss one row for the week 28
Bounce 2001 29 2467.92
Bounce 2001 30 2620.17
In this example, we would expect 22 rows of data (11 weeks each from 2 years) if the data were dense. However we get only 18 rows because weeks 25 and 26 are missing in 2000, and weeks 26 and 28 in 2001.
Densification : A right outer join : the first step
To overcome the problem of the sparsity, the first idea is to perform an outer join with the time table to fill the gap
SELECT Product_Name, t.Year, t.Week, NVL(Sales,0) dense_sales
FROM
(SELECT SUBSTR(p.Prod_Name,1,15) Product_Name,
t.Calendar_Year Year, t.Calendar_Week_Number Week, SUM(Amount_Sold) Sales
FROM Sales s, Times t, Products p
WHERE s.Time_id = t.Time_id AND s.Prod_id = p.Prod_id AND
p.Prod_name IN ('Bounce') AND t.Calendar_Year IN (2000,2001) AND
t.Calendar_Week_Number BETWEEN 20 AND 30
GROUP BY p.Prod_Name, t.Calendar_Year, t.Calendar_Week_Number) v
RIGHT OUTER JOIN
(SELECT DISTINCT Calendar_Week_Number Week, Calendar_Year Year
FROM Times
WHERE Calendar_Year IN (2000, 2001)
AND Calendar_Week_Number BETWEEN 20 AND 30) t
ON (v.week = t.week AND v.Year = t.Year)
ORDER BY t.year, t.week;
Unfortunately, the product name is missing and we have to go further to find a solution.
Result :
PRODUCT_NAME YEAR WEEK DENSE_SALES
--------------- ---------- ---------- -----------
Bounce 2000 20 1602
Bounce 2000 21 8124.48
Bounce 2000 22 4086.32
Bounce 2000 23 5462.28
Bounce 2000 24 8838.72
2000 25 0
2000 26 0
Bounce 2000 27 4594.58
Bounce 2000 28 2886.26
Bounce 2000 29 3854.76
Bounce 2000 30 3854.76
Bounce 2001 20 2966.6
Bounce 2001 21 8368.98
Bounce 2001 22 5218.38
Bounce 2001 23 2833.9
Bounce 2001 24 6299.24
Bounce 2001 25 5291.96
2001 26 0
Bounce 2001 27 4250.24
2001 28 0
Bounce 2001 29 4935.84
Bounce 2001 30 5240.34
Densification with partitioned outer join
To overcome the problem of sparsity and to fill the product column, Oracle recommend to use a partitioned outer join.
During a partition outer join, the database :
- partitions the rows in your query based on the expression you specify in the PARTITION BY clause (in our example by product name)
- then apply the outer join on each logical partition defined in the query.
- and retrieve the result by performing an UNION of the outer joins of each of the partitions in the logically partitioned table on the other side of the join.
SELECT Product_Name, t.Year, t.Week, NVL(Sales,0) dense_sales
FROM
(SELECT SUBSTR(p.Prod_Name,1,15) Product_Name,
t.Calendar_Year Year, t.Calendar_Week_Number Week, SUM(Amount_Sold) Sales
FROM Sales s, Times t, Products p
WHERE s.Time_id = t.Time_id AND s.Prod_id = p.Prod_id AND
p.Prod_name IN ('Bounce') AND t.Calendar_Year IN (2000,2001) AND
t.Calendar_Week_Number BETWEEN 20 AND 30
GROUP BY p.Prod_Name, t.Calendar_Year, t.Calendar_Week_Number) v
PARTITION BY (v.Product_Name)
RIGHT OUTER JOIN
(SELECT DISTINCT Calendar_Week_Number Week, Calendar_Year Year
FROM Times
WHERE Calendar_Year IN (2000, 2001)
AND Calendar_Week_Number BETWEEN 20 AND 30) t
ON (v.week = t.week AND v.Year = t.Year)
ORDER BY t.year, t.week;
The result :
PRODUCT_NAME YEAR WEEK DENSE_SALES
--------------- ---------- ---------- -----------
Bounce 2000 20 1602
Bounce 2000 21 8124.48
Bounce 2000 22 4086.32
Bounce 2000 23 5462.28
Bounce 2000 24 8838.72
Bounce 2000 25 0
Bounce 2000 26 0
Bounce 2000 27 4594.58
Bounce 2000 28 2886.26
Bounce 2000 29 3854.76
Bounce 2000 30 3854.76
Bounce 2001 20 2966.6
Bounce 2001 21 8368.98
Bounce 2001 22 5218.38
Bounce 2001 23 2833.9
Bounce 2001 24 6299.24
Bounce 2001 25 5291.96
Bounce 2001 26 0
Bounce 2001 27 4250.24
Bounce 2001 28 0
Bounce 2001 29 4935.84
Bounce 2001 30 5240.34
Nevertheless, this clause can not be always available and we can see a partition outer join as a outer join between the sparse data and a cross join data set of the dimension. See the following paragraph for an example.
Densification with cross join and outer join
A cross join between the dimensions is always dense. The idea is therefore to perform it first and then outer join it with the sparse data.
First, the cross join :
SELECT product_name, week, year from
(SELECT DISTINCT Calendar_Week_Number Week, Calendar_Year Year
FROM Times
WHERE Calendar_Year IN (2000, 2001) AND
Calendar_Week_Number BETWEEN 20 AND 30) t,
(SELECT SUBSTR(p.Prod_Name,1,15) Product_Name
FROM Products p
WHERE p.Prod_name IN ('Bounce'))
The result :
PRODUCT_NAME WEEK YEAR
--------------- ---------- ----------
Bounce 29 2000
Bounce 21 2001
Bounce 26 2001
Bounce 29 2001
Bounce 30 2001
Bounce 20 2001
Bounce 24 2001
Bounce 23 2000
Bounce 27 2000
Bounce 30 2000
Bounce 28 2001
Bounce 25 2000
Bounce 26 2000
Bounce 22 2001
Bounce 25 2001
Bounce 27 2001
Bounce 21 2000
Bounce 22 2000
Bounce 23 2001
Bounce 20 2000
Bounce 24 2000
Bounce 28 2000
And then an outer join to retrieve the amount sold.
SELECT cross_join.Product_Name, cross_join.Year, cross_join.Week, NVL(data_to_densify.Sales,0) dense_sales
FROM
(SELECT SUBSTR(p.Prod_Name,1,15) Product_Name,
t.Calendar_Year Year, t.Calendar_Week_Number Week, SUM(Amount_Sold) Sales
FROM Sales s, Times t, Products p
WHERE s.Time_id = t.Time_id AND s.Prod_id = p.Prod_id AND
p.Prod_name IN ('Bounce') AND t.Calendar_Year IN (2000,2001) AND
t.Calendar_Week_Number BETWEEN 20 AND 30
GROUP BY p.Prod_Name, t.Calendar_Year, t.Calendar_Week_Number) data_to_densify
RIGHT OUTER JOIN
(
SELECT week, year, product_name from
(SELECT DISTINCT Calendar_Week_Number Week, Calendar_Year Year
FROM Times
WHERE Calendar_Year IN (2000, 2001) AND
Calendar_Week_Number BETWEEN 20 AND 30) t,
(SELECT SUBSTR(p.Prod_Name,1,15) Product_Name
FROM Products p
WHERE p.Prod_name IN ('Bounce') ) p
) cross_join
ON (
data_to_densify.product_Name = cross_join.Product_Name AND
data_to_densify.week = cross_join.week AND
data_to_densify.Year = cross_join.Year
)
ORDER BY year, week
In our example above, a right outer join is used but of course all outer joins as left and full can be used instead.
The result :
PRODUCT_NAME YEAR WEEK DENSE_SALES
--------------- ---------- ---------- -----------
Bounce 2000 20 1602
Bounce 2000 21 8124.48
Bounce 2000 22 4086.32
Bounce 2000 23 5462.28
Bounce 2000 24 8838.72
Bounce 2000 25 0
Bounce 2000 26 0
Bounce 2000 27 4594.58
Bounce 2000 28 2886.26
Bounce 2000 29 3854.76
Bounce 2000 30 3854.76
Bounce 2001 20 2966.6
Bounce 2001 21 8368.98
Bounce 2001 22 5218.38
Bounce 2001 23 2833.9
Bounce 2001 24 6299.24
Bounce 2001 25 5291.96
Bounce 2001 26 0
Bounce 2001 27 4250.24
Bounce 2001 28 0
Bounce 2001 29 4935.84
Bounce 2001 30 5240.34