Oracle Database - The (index) Clustering Factor Statistics
Table of Contents
1 - About
The (index) Clustering Factor Statistics is a index statistics that indicates the amount of order of the rows in the table based on the values of the index :
- If the value is near the number of blocks, then the table is very well ordered. In this case, the index entries in a single leaf block tend to point to rows in the same data blocks.
- If the value is near the number of rows, then the table is very randomly ordered. In this case, it is unlikely that index entries in the same leaf block point to rows in the same data blocks.
We could also view the clustering factor as :
- a number that represents the number of logical I/Os against the table that would be performed to read the entire table via the index
- an indication of how ordered the table is with respect to the index itself.
2 - Articles Related
3 - How the clustering factor can affect cost.
Assume the following situation:
- There is a table with 9 rows.
- There is a non-unique index on col1 for table.
- The c1 column currently stores the values A, B, and C.
- The table only has three Oracle blocks.
Case 1: The index clustering factor is low for the rows as they are arranged in the following diagram.
Block 1 Block 2 Block 3 ------- ------- -------- A A A B B B C C C
This is because the rows that have the same indexed column values for c1 are located within the same physical blocks in the table. The cost of using a range scan to return all of the rows that have the value A is low, because only one block in the table needs to be read.
Case 2: If the same rows in the table are rearranged so that the index values are scattered across the table blocks (rather than collocated), then the index clustering factor is higher.
Block 1 Block 2 Block 3 ------- ------- -------- A B C A B C A B C
This is because all three blocks in the table must be read in order to retrieve all rows with the value A in col1.
4 - Example
|Index||Num Rows||Clustering Factor|
So the database is saying, :
- “If we were to read every row via the index COLOCATED_PK from start to finish, we would perform 1,190 I/Os.
- “If we were to read every row via the index DISORGANIZED_PK from start to finish, we would perform 99,932 I/Os.
4.1 - Reason
The reason for the large difference is that as Oracle range scans through the index structure, if it discovers the next row in the index is on the same database block as the prior row, it does not perform an other I/O to get the table block from the buffer cache.
However, if the next row is not on the same block, the it will release that block and perform another I/O into the buffer cache.
5 - I want a good clustering factor ?
Many factors influence the use of an index by the optimizer, including physical data layout.
You must keep in mind that :
- It will affect cases where you do index range scans of a large percentage of table.