Oracle Database - The (index) Clustering Factor Statistics

Card Puncher Data Processing

About

The (index) Clustering Factor Statistics is a index statistics that indicates the amount of order of the rows in the table based on the values of the index :

  • If the value is near the number of blocks, then the table is very well ordered. In this case, the index entries in a single leaf block tend to point to rows in the same data blocks.
  • If the value is near the number of rows, then the table is very randomly ordered. In this case, it is unlikely that index entries in the same leaf block point to rows in the same data blocks.

We could also view the clustering factor as :

  • a number that represents the number of logical I/Os against the table that would be performed to read the entire table via the index
  • an indication of how ordered the table is with respect to the index itself.

How the clustering factor can affect cost.

Example Effects of Clustering Factor on cost to retrieve data from a Oracle Database - Index Scans

Assume the following situation:

  • There is a table with 9 rows.
  • There is a non-unique index on col1 for table.
  • The c1 column currently stores the values A, B, and C.
  • The table only has three Oracle blocks.

Case 1: The index clustering factor is low for the rows as they are arranged in the following diagram.

Block 1     Block 2       Block 3 
-------     -------       -------- 
A  A  A     B  B  B       C  C  C 

This is because the rows that have the same indexed column values for c1 are located within the same physical blocks in the table. The cost of using a range scan to return all of the rows that have the value A is low, because only one block in the table needs to be read.

Case 2: If the same rows in the table are rearranged so that the index values are scattered across the table blocks (rather than collocated), then the index clustering factor is higher.

Block 1       Block 2        Block 3 
-------       -------        --------
A  B  C       A  B  C        A  B  C

This is because all three blocks in the table must be read in order to retrieve all rows with the value A in col1.

Example

Index Num Rows Clustering Factor
COLOCATED_PK 100,000 1190
DISORGANIZED_PK 100,000 99932

So the database is saying, :

  • “If we were to read every row via the index COLOCATED_PK from start to finish, we would perform 1,190 I/Os.
  • “If we were to read every row via the index DISORGANIZED_PK from start to finish, we would perform 99,932 I/Os.

Reason

The reason for the large difference is that as Oracle range scans through the index structure, if it discovers the next row in the index is on the same database block as the prior row, it does not perform an other I/O to get the table block from the buffer cache.

However, if the next row is not on the same block, the it will release that block and perform another I/O into the buffer cache.

I want a good clustering factor ?

Many factors influence the use of an index by the optimizer, including physical data layout.

You might be tempted, therefore , to run out and try to rebuild all of your tables now to make all indexes have a good clustering factor but that would be a waste of time.

You must keep in mind that :

  • It will affect cases where you do index range scans of a large percentage of table.
  • Generally, the table will have only one index with a good clustering factor. The rows in a table may be sorted in only one way. If having the data physically clustered is important for you, consider to use of an Iot, a B Tree Cluster Indexes, or a hash cluster over continuous rebuilds.





Discover More
Card Puncher Data Processing
Oracle Database - Index Statistics

Statistics on Index stored in the data dictionary : DBA_INDEXES DBA_IND_STATISTICS Default Index Values When Statistics Are Missing Index Statistic Default Value Used by Optimizer Levels ...
Card Puncher Data Processing
Oracle Database - Optimizer Statistics

Optimizer statistic in Oracle You can collect exact or estimated statistics physical storage characteristics and data distribution in these schema objects by using the DBMS_STATS...



Share this page:
Follow us:
Task Runner