Oracle Database - Histogram Analytic

> Database > Oracle Database

1 - About

How to build an histogram in Oracle (and binning)

3 - Function

3.1 - equiwidth

You create equi-width (all bin have the same distance) histogram with the WIDTH_BUCKET function.

WIDTH_BUCKET(COLUMN, MIN, MAX, bucket NUMBER)

where:

  • column is a date or number column
  • From the min, the max and the bucket number the boundary of the buckets can be calculated.

Date Equi-width histogram Example where:

  • the 'dataset' with clause represents your sample
  • 7 is the number of day for your bucket
  • 20 is the number of bucket

The below query gives you a histogram result that you can plot in Excel for instance.

WITH
  dataset AS
  (
    SELECT
      myColumnName AS Analyzed_Column -- Date or numeric
    FROM
      myTable
    WHERE
      myDateFilter > sysdate - 7*20 -- If needed
  )
  ,
  MINMAX AS
  (
    SELECT
      MIN(Analyzed_Column) AS min_value,
      MAX(Analyzed_Column) AS Max_Value
    FROM
      dataset
  )
  ,
  buckets AS
  (
    SELECT
      Analyzed_Column,
      WIDTH_BUCKET(Analyzed_Column, MINMAX.min_value, MINMAX.max_value, 20) AS
      bucket
    FROM
      dataset,
      MINMAX
  )
  ,
  buckets_full AS
  (
    SELECT
      rownum AS bucket
    FROM
      (
        SELECT
          1 AS bucket_number
        FROM
          dual
          CONNECT BY LEVEL <= (20 + 1)
      )
  )
  ,
  buckets_analysis AS
  (
    SELECT
      bucket,
      MAX(Analyzed_Column) AS max_value,
      MIN(Analyzed_Column) AS min_value,
      COUNT(1)         AS COUNT
    FROM
      buckets
    GROUP BY
      bucket
  )
SELECT
  buckets_full.bucket,
  buckets_analysis.max_value,
  buckets_analysis.min_value,
  buckets_analysis.count
FROM
  buckets_full
LEFT OUTER JOIN buckets_analysis
ON
  buckets_full.bucket = buckets_analysis.bucket
ORDER BY buckets_full.bucket;
Advertising

3.2 - equiheight

NTILE: equiheight histograms. Each bin get the same amount of records. The bin boundary varies.

See Quantile - Percentile

Example of output for a sample of 40 rows

TILE ROW COUNT
1 10
2 10
3 10
4 10
db/oracle/histogram_analysis.txt ยท Last modified: 2017/09/14 13:24 by gerardnico