Oracle Database - Statistics - Density (Column Selectivity)

Card Puncher Data Processing

About

Density is a column statistics that store a decimal number between 0 and 1 that measures the selectivity of a column. Values close to 1 indicate that the column is

Density provides selectivity estimates for:

  • equi-join predicates (e.g. A.COL1 = B.COL1)
  • and equality predicates (e.g. COL1 = 5).

The more selective a column, the less rows are likely to be returned by a query referencing this column in its predicate list.

The density is expressed as a decimal number between 0 and 1.

  • Values close to 1 indicate that this column is unselective
  • Values close to 0 indicate that this column is highly selective

Calculation

Without Histogram

Without histogram: <MATH> \text{Density} = \frac{1}{\text{Number of distinct NON null values}} </MATH>

With Histogram

Since Oracle 7.3, histogram are used in the calculation.

The calculation i splits by the type of popularity:

  • A non-popular value is one that does not span multiple bucket end points.
  • A popular value is one that spans multiple end points.

For non-popular values the density is calculated as the number of non-popular values divided by the total number of values. Formula: <MATH> \text{Density} = \frac{\text{Number of non-popular values}}{\text{Total number of values}} </MATH>

Popular values calculate the selectivity of a particular column values by using histograms.

The Selectivity (Density ?) for popular values is calculated as: <MATH> \text{Selectivity} = \frac{\text{Number of end points spanned by this value}}{\text{Total number of end points}} </MATH>

Documentation / Reference





Discover More
Card Puncher Data Processing
Oracle Database - Selectivity

The first measure of the plan_estimator, selectivity, represents a fraction of rows from a row set. The row set can be a base table, a view, or the result of a join or a GROUP BY operator. The selectivity...
Card Puncher Data Processing
Oracle Database - Statistics - Columns

For columns with skewed data, you should collect histograms. Statistics Col Description NUM_DISTINCT Number of distinct values (NDV) LOW_VALUE Low value HIGH_VALUE High value NUM_NULLS Number...



Share this page:
Follow us:
Task Runner