Oracle Database - SQL - Regular expression

1 - About

Regular expression in the Oracle database context

The escape character for the pattern is backslash (\).

3 - Functions

3.1 - REGEXP_REPLACE (Extraction, Replace)

REGEXP_REPLACE

SELECT
        REGEXP_REPLACE( '111.222.3333', '[[:digit:]]{3}\.[[:digit:]]{3}\.([[:digit:]]{4})', '\1' )
    FROM
        dual;
3333

3.2 - REGEXP_INSTR

Search a string for a regular expression pattern and return the position of the substring.

  • Nederland Postal Code : Expression return 1 if good, 0 in other case.
SELECT REGEXP_INSTR(' 2343  LM  ', '^([[:blank:]]*[0-9]{4}[[:blank:]]*[A-Z]{2}[[:blank:]]*)$') FROM dual
  • Date verification
regexp_instr(StringDate,'^[0-9]{4}["01","02","03","04","05","06","07","08","09","10","11","12"]{2}[0-9]{2}$') <> 1

3.3 - REGEXP_SUBSTR

REGEXP_SUBSTR search a string for a regular expression pattern and return the substring that match

  • Return only the first text part of the description
SELECT REGEXP_SUBSTR(description,'^[A-Z]+') FROM mytable
  • Return the last Alphanumeric characters:
SELECT REGEXP_SUBSTR('12345^6a7b8v9','[[:alnum:]]+$') FROM dual

Return:

6a7b8v9

3.4 - REGEXP_LIKE

REGEXP_LIKE searches a character column for a pattern. Use this function in the WHERE clause of a query to return rows matching a regular expression.

SELECT col FROM test WHERE REGEXP_LIKE(col,'r[[=e=]]sum[[=e=]]');
  • Search a string that finish with a space
SELECT col FROM test WHERE REGEXP_LIKE(col,' $');

4 - Character Range '[x-y]' in Regular Expressions and Diacritic

According to the POSIX standard, a range in a regular expression includes all collation elements between the start point and the end point of the range in the linguistic definition of the current locale. Therefore, ranges in regular expressions are meant to be linguistic ranges, not byte value ranges, because byte value ranges depend on the platform, and the end user should not be expected to know the ordering of the byte values of the characters. The semantics of the range expression must be independent of the character set. This implies that a range such as [a-d] includes all the letters between a and d plus all of those letters with diacritics, plus any special case collation element such as ch in Traditional Spanish that is sorted as one character.

Oracle interprets range expressions as specified by the NLS_SORT parameter to determine the collation elements covered by a given range. For example :

 SQL> ALTER SESSION SET NLS_SORT = XWEST_EUROPEAN;
 
SESSION altered.
 
SQL> SELECT regexp_instr( 'Schröder', '^([A-Za-z]*)$') FROM dual;
REGEXP_INSTR('SCHRÖDER','^(A-ZA-Z]*)$')
-------------------------------------------------------------------
                                                                  1
 
SQL> ALTER SESSION SET NLS_SORT = BINARY;
 
SESSION altered.
 
SQL> SELECT regexp_instr( 'Schröder', '^([A-Za-z]*)$') FROM dual;
REGEXP_INSTR('SCHRÖDER','^(A-ZA-Z]*)$')
-------------------------------------------------------------------
                                                                  0
 
SQL> SELECT regexp_instr( 'Schröder', '^([[:upper:]]*[[:lower:]]*)$') FROM dual;
REGEXP_INSTR('SCHRÖDER','^([[:upper:]]*[[:lower:]]*)$')
-------------------------------------------------------------------
                                                                  1

5 - Documentation / Reference

db/oracle/regexp.txt · Last modified: 2017/10/23 16:25 by gerardnico