B*Tree indexes

About

B*Tree indexes :

  • provides fast access, by key, to an individual row or range of rows
  • normally requiring few reads to find the correct row
  • are similar in implementation to a binary search tree
  • there is a one-to-one relationship between an index entry and a row: an entry points to a row. Not the case of Oracle Database - Bitmap Indexes.

  • B in B*tree does not stand for binary but rather for “balanced”.
  • A B*tree is not a binary tree at all

Article

Structure

The lowest level blocks in the tree, called leaf nodes or leaf blocks, contain every indexed key and a rowid that points to the row it is indexing. The interior blocks, above the leaf nodes are known as branch blocks. They are used to navigate through the structure.

That makes satisfying a predicate, such as the followings, pretty simple :

  • where x between 20 and 30
  • where x = 30

All leaf blocks should be at the same level.

Height Balanced Structure

To get to the leaf block to retrieve the first row of a query of the form

 SELECT INDEXED-COL FROM T WHERE INDEX_COL = :X

will take the same number of I/Os regardless of the value :X that is used. In other words, the index is height balanced

Most of B*Tree indexes will have a height of 2 or 3 even for millions of record. This means that it will take in general, two or three I/Os to find your key in the index which is not too bad.

Example

nico@ORCL>select index_name,blevel, num_rows
  2  FROM user_indexes
  3  WHERE table_name = 'BIG_TABLE';
 
INDEX_NAME                         BLEVEL   NUM_ROWS
------------------------------ ---------- ----------
BIG_TABLE_PK                            2    9542757
 
nico@ORCL>set autotrace ON;
nico@ORCL>select id FROM big_table WHERE id = 42;
 
----------------------------------------------------------------------------------
| Id  | Operation         | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |              |     1 |     6 |     2   (0)| 00:00:01 |
|*  1 |  INDEX UNIQUE SCAN| BIG_TABLE_PK |     1 |     6 |     2   (0)| 00:00:01 |
----------------------------------------------------------------------------------
 
Statistics
----------------------------------------------------------
          3  consistent gets
          1  rows processed
 
nico@ORCL>select id FROM big_table WHERE id = 5000000;
 
Execution Plan
----------------------------------------------------------------------------------
| Id  | Operation         | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |              |     1 |     6 |     2   (0)| 00:00:01 |
|*  1 |  INDEX UNIQUE SCAN| BIG_TABLE_PK |     1 |     6 |     2   (0)| 00:00:01 |
----------------------------------------------------------------------------------
 
Statistics
----------------------------------------------------------
          3  consistent gets
          1  rows processed

Whenever the value selected 42 of 5000000, you will have always the same statistics :

  • 3 consistent gets (three I/Os)
  • 1 rows processed

Height and Blevel properties

The level is also known as the Height of the index or the number of blocks required to go from the root block to the leaf block. the height value can be found in the INDEX_STATS view after the index has been analyzed using the ANALYZE INDEX <name> VALIDATE STRUCTURE command.

Blevel si the number of branch levels and differs from HEIGHT by one (It does not count the leaf block).

BLEVEL = HEIGHT - 1

The value of BLEVEL is found in the normal dictionnay tables such as USER_INDEXES after statistics have been gathered.

Non Unique Index

In a nonunique index, Oracle simply stores the rowid by appending it to the key as an extra column with a length byte to make the key unique. For example, an index such as

CREATE INDEX I ON T(X,Y)

is conceptually

CREATE UNIQUE INDEX I ON T(X,Y, ROWID)

In a nonunique index, you will find that the data is sorted first by index key values (in the order of the index key) and then by rowid ascending. In a unique index, the data is sorted only by the index key values.

Index Key Compression

This is not compression in the same manner taht ZIP files are compressed; rather, this is compression taht removes redundancies from concatenated (multicolumn) indexes.

Concept

The basic concept behind a compressed key index is that every entry is broken into two pieces :

  • a prefix component . It's build on the leading columns of the concatenated index and will have many repeating values.
  • and a suffix component. It's built on the trailing columns in the index key and is the unique component of the index entry within the prefix.

Index block with owner colomn repeated

Sys.Package.Dbms_Alert
Sys.Package.Dbms_Application_info
Sys.Package.Dbms_Aq
Sys.Package.Dbms_Aqadm
Sys.Package.Dbms_Aqadm_Sys
Sys.Package.Dbms_Alert_Syscalls
...

become

Index block with owner colomn factored out

**Sys** //<- prefix//
Package.Dbms_Alert //<- suffix ...//
Package.Dbms_Application_info
Package.Dbms_Aq
Package.Dbms_Aqadm
Package.Dbms_Aqadm_Sys
Package.Dbms_Alert_Syscalls
...

Example

nico@ORCL>create TABLE t
  2  AS
  3  SELECT * FROM all_objects;
 
TABLE created.
 
nico@ORCL>create INDEX t_idx ON t(owner, object_type, object_name);
 
INDEX created.
 
nico@ORCL>analyze INDEX t_idx validate structure;
 
INDEX analyzed.
 
nico@ORCL>create TABLE idx_stats
  2  AS
  3  SELECT 'noncompressed' what, a.*
  4  FROM index_stats a;
 
TABLE created.
 
nico@ORCL>define 1=1;
nico@ORCL>drop INDEX t_idx;
 
INDEX dropped.
 
nico@ORCL>create INDEX t_idx ON t(owner,object_type,object_name) compress &1;
old   1: CREATE INDEX t_idx ON t(owner,object_type,object_name) compress &1
new   1: CREATE INDEX t_idx ON t(owner,object_type,object_name) compress 1
 
INDEX created.
 
nico@ORCL>analyze INDEX t_idx validate structure;
 
INDEX analyzed.
 
nico@ORCL>insert INTO idx_stats SELECT 'compress &1', a.* FROM index_stats a;
old   1: INSERT INTO idx_stats SELECT 'compress &1', a.* FROM index_stats a
new   1: INSERT INTO idx_stats SELECT 'compress 1', a.* FROM index_stats a
 
1 row created.
 
........................................
FOR compress 2 AND 3 same procedure ....
........................................
 
nico@ORCL>select what, height, lf_blks, br_blks, btree_space, opt_cmpr_count, opt_cmpr_pctsave
  2  FROM idx_stats;
 
WHAT              HEIGHT    LF_BLKS    BR_BLKS BTREE_SPACE OPT_CMPR_COUNT OPT_CMPR_PCTSAVE
------------- ---------- ---------- ---------- ----------- -------------- ----------------
noncompressed          3        407          3     3280096              2               29
compress 1             3        354          3     2854680              2               19
compress 2             3        287          3     2318948              2                0
compress 3             3        455          3     3662276              2               36
  • LF_BLKS = Leaf Blocks
  • BR_BLKS = Branch Blocks
  • OPT_CMPR_PCTSAV = Optimum compression percent saved (or the expected saving from compression)
  • OPT_CMPR_COUNT = Optimum compression factor

We see for COMPRESS 1 :

  • Comparing BTREE_SPACE, that the index is about 89 percent the size of the noncompressed index.
  • the number of leaf blocks has decreased measurably

Further, we see for COMPRESS 2 :

  • the saving are even more impressive. The resulting index is about 70 percent the size of the original, and so much data is able to be placed on individual blocks

The ANALYZE command against the noncompressed index populated the OPT_CMPR_PCTSAVE/OPT_CMPR_COUNT columns and estimated a 29 percent saving with COMPRESS 2.

nico@ORCL>select  3280096*(1-0.29) FROM dual;
 
3280096*(1-0.29)
----------------
      2328868.16

Not so bad compare to the value of 2318948

What happen for COMPRESS 3 :

  • The resulting index is actually larger: 110 percent the size of the original index.

This is due to the fact that each repeated prefix we remove saves the space of N copies, but add 4 bytes of overhead on the leaf block as part of the compression scheme. By adding the OBJECT_NAME column to the compressed key, we made that key unique, meaning there were no duplicate copies to factor out. Therefore, we ended up adding 4 bytes to every single index key entry and factoring out no repeating data.

The column OPT_CMPR_COUNT is dead accurate at providing the best compression count to be used and OPT_CMPR_PCTSAVE will tell you exactly how much saving to expect.

Compression is not free

The compressed index structure is more complex than it used to be. Oracle will spend more time processing the data in this structure, both :

  • while maintaining the index during modifications
  • when you search the index during the query

What we search is the tradeoff between CPU time and I/O time.

With compression :

  • our block buffer cache will be able to hold more index entries than before,
  • our cache hit-ratio might go up
  • our physical I/Os should go down (half)

BUT

  • it will take a little more CPU

Subtypes

B*Tree index has several subtypes :

  • index_organized_tables Useful for information retrieval, spatial, and olap application
  • descending_indexes Descending indexes allow for data to be sorted from “big to small” (descending) instead of “small to big” (ascending) in the index structure. There will be no extra sort step at the end of the plan.
  • Reverse Key Indexes. B*tree index whereby the bytes in the key are “reversed” to avoid contention on sequential index value.

Reference

  • Bookmark "B*Tree indexes" at del.icio.us
  • Bookmark "B*Tree indexes" at Digg
  • Bookmark "B*Tree indexes" at Ask
  • Bookmark "B*Tree indexes" at Google
  • Bookmark "B*Tree indexes" at StumbleUpon
  • Bookmark "B*Tree indexes" at Technorati
  • Bookmark "B*Tree indexes" at Live Bookmarks
  • Bookmark "B*Tree indexes" at Yahoo! Myweb
  • Bookmark "B*Tree indexes" at Facebook
  • Bookmark "B*Tree indexes" at Yahoo! Bookmarks
  • Bookmark "B*Tree indexes" at Twitter
  • Bookmark "B*Tree indexes" at myAOL
 
database/oracle/indexes/btree.txt · Last modified: 2010/08/26 18:23 by gerardnico