Php - Regular expressions (Perl-compatible)

1 - About

Regular expression patterns in PHP use Perl-compatible syntax and are implemented with the PCRE library (see Library).

The pattern syntax used by the preg_ functions therefore closely resembles Perl's, but it is not identical. See Perl Differences.

2 - Syntax

The pattern must be enclosed by delimiters.

DelimiterPatternDelimiter[Modifiers]

where:

2.1 - Delimiter

A delimiter:

  • is generally a forward slash (/).
  • can be any character that is not alphanumeric, a backslash (\) or whitespace.
  • cannot be the null byte.

If the delimiter character has to be used inside the expression itself, it must be escaped with a backslash. When it occurs often, choosing another delimiter is usually more readable.
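The two options can be compared in a minimal sketch. The URL and the variable names are hypothetical sample values, not part of the original text.

```php
<?php
// Hypothetical sample input.
$url = 'https://example.com/docs/regex';

// With "/" as the delimiter, every literal slash in the pattern must be escaped.
$withSlash = preg_match('/^https:\/\/([^\/]+)\//', $url, $m1);

// With "#" as the delimiter, the slashes need no escaping.
$withHash = preg_match('#^https://([^/]+)/#', $url, $m2);

echo $m1[1], "\n"; // example.com
echo $m2[1], "\n"; // example.com
```

Both patterns capture the same host name; only the amount of escaping differs.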

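Bracket-style delimiters are also possible: (), {}, [] and <>, where the opening bracket is the starting delimiter and the closing bracket is the ending delimiter. They do not need to be escaped when they are used as metacharacters inside the pattern, but they must be escaped when they are used as literal characters.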
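A short sketch of bracket-style delimiters, using made-up subject strings: the first pattern uses braces both as delimiters and as a quantifier, the second escapes parentheses that are meant literally.

```php
<?php
// "{" and "}" are the delimiters. The inner {4} is a quantifier
// (a metacharacter), so it is not escaped.
$isYear = preg_match('{^\d{4}$}', '2024');

// "(" and ")" are the delimiters. The inner parentheses are literal
// characters, so they are escaped with a backslash.
$hasParens = preg_match('(^\(\d+\)$)', '(42)');

var_dump($isYear);    // int(1)
var_dump($hasParens); // int(1)
```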

The preg_quote function escapes the regular expression special characters in a string, and also the delimiter when it is passed as the second argument:

Example:

$keywords = '$40 for a g3/400';
$keywords = preg_quote($keywords, '/'); // The delimiter is a forward slash
echo $keywords; // prints \$40 for a g3\/400 (the $ and / characters are escaped)
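As a sketch of why the quoting matters, the quoted string can be embedded in a full pattern and matched. The $text value and the variable names are hypothetical.

```php
<?php
$keywords = '$40 for a g3/400';
$text     = 'Price: $40 for a g3/400 unit'; // hypothetical subject string

// Quoted: "$" and "/" are escaped, so the string is matched literally.
$pattern = '/' . preg_quote($keywords, '/') . '/';
$found   = preg_match($pattern, $text);

// Unquoted: the "/" inside the string ends the pattern early and the rest
// is read as modifiers, so compilation fails and preg_match returns false.
// The @ operator only hides the warning for this demonstration.
$unquoted = @preg_match('/' . $keywords . '/', $text);

echo $pattern, "\n";  // /\$40 for a g3\/400/
var_dump($found);     // int(1)
var_dump($unquoted);  // bool(false)
```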

2.2 - Modifiers

The pattern modifiers are the regular expression flags and are located after the ending delimiter.

Example of case-insensitive matching:

#[a-z]#i
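A minimal sketch of the effect of this modifier, with a made-up subject string:

```php
<?php
// Without the modifier, [a-z] only matches lowercase letters.
$without = preg_match('#[a-z]#', 'PHP');

// With the i modifier, the same class also matches uppercase letters.
$with = preg_match('#[a-z]#i', 'PHP');

var_dump($without); // int(0)
var_dump($with);    // int(1)
```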

List:

  • g modifier: global. Not available in PHP. To get all matches instead of only the first one, use the preg_match_all function
  • m modifier: multi-line. Causes ^ and $ to match at the beginning/end of each line (not only at the beginning/end of the string)
  • i modifier: insensitive. Case-insensitive match
  • x modifier: extended. Whitespace in the pattern is ignored, as is the text from an unescaped # to the end of the line (except inside a character class)
  • X modifier: eXtra. A \ followed by a letter with no special meaning causes an error
  • s modifier: single line. Dot also matches newline characters
  • u modifier: unicode. Pattern and subject strings are treated as UTF-8. Also causes escape sequences to match Unicode characters
  • U modifier: Ungreedy. Quantifiers become lazy by default, and a ? following a quantifier makes it greedy
  • A modifier: Anchored. The pattern is constrained to match only at the start of the subject string
  • J modifier: allows duplicate subpattern names
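Several of these modifiers can be illustrated in one sketch. The subject strings and variable names are invented for the illustration; only standard preg_ functions are used.

```php
<?php
$text = "one 1\ntwo 22\nthree 333";

// No g modifier in PHP: preg_match_all collects every match.
preg_match_all('/\d+/', $text, $all);          // $all[0] = ['1', '22', '333']

// m: ^ matches at the start of every line.
preg_match_all('/^\w+/m', $text, $words);      // $words[0] = ['one', 'two', 'three']

// s: the dot also matches newlines.
$dotAll   = preg_match('/one.*three/s', $text); // 1
$noDotAll = preg_match('/one.*three/', $text);  // 0

// x: whitespace and # comments inside the pattern are ignored.
$pattern = '/
    (\d{4})  # year
    -
    (\d{2})  # month
/x';
preg_match($pattern, '2024-05', $date);         // $date = ['2024-05', '2024', '05']

// U: quantifiers are lazy by default.
preg_match('/<.+>/U', '<b>bold</b>', $lazy);    // $lazy[0] = '<b>'
preg_match('/<.+>/', '<b>bold</b>', $greedy);   // $greedy[0] = '<b>bold</b>'

// u: the subject is read as UTF-8, so the dot matches a whole character.
$eAcute   = "\u{00E9}";                         // "é", two bytes in UTF-8
$asUtf8   = preg_match('/^.$/u', $eAcute);      // 1
$asBytes  = preg_match('/^.$/', $eAcute);       // 0

print_r($all[0]);
print_r($words[0]);
```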

3 - Example

3.1 - Valid Pattern

/foo bar/
#^[^0-9]$#
+php+
%[a-zA-Z0-9_-]%
(this [is] a (pattern))
{this [is] a (pattern)}
[this [is] a (pattern)]
<this [is] a (pattern)>
/<\/\w+>/
|(\d{3})-\d+|Sm
/^(?i)php[34]/
{^\s+(\s+)?$}

3.2 - Invalid patterns

/href='(.*)'      // missing ending delimiter
/\w+\s*\w+/J      // unknown modifier 'J' (before PHP 7.2.0; J is accepted since then)
1-\d3-\d3-\d4|  //  missing starting delimiter
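One way to detect such patterns at runtime is to rely on preg_match returning false when the pattern does not compile. The sketch below assumes that approach; isValidPattern is a hypothetical helper, not a PHP built-in.

```php
<?php
// Hypothetical helper: true when the pattern compiles, false otherwise.
// preg_match returns false on failure; @ hides the accompanying warning.
function isValidPattern(string $pattern): bool
{
    return @preg_match($pattern, '') !== false;
}

$candidates = [
    '/foo bar/',       // valid
    "/href='(.*)'",    // missing ending delimiter
    '1-\d3-\d3-\d4|',  // missing starting delimiter
];

foreach ($candidates as $candidate) {
    echo $candidate, ' => ', isValidPattern($candidate) ? 'valid' : 'invalid', "\n";
}
```

Suppressing the warning with @ keeps the sketch short; production code may prefer an error handler that records the message.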

4 - Functions

See the PCRE book of the PHP manual (book.pcre).

5 - Management

5.1 - Configuration

5.1.1 - Library

By default, this extension is compiled against the bundled PCRE library. Alternatively, an external PCRE library can be used by passing the --with-pcre-regex=DIR configuration option, where DIR is the location of PCRE's include and library files.

5.1.2 - Runtime

5.2 - Escape

The escape character is the backslash (\).

Example: split an HTML page on the closing tag of the p element, '</p>':

$localCount = count(preg_split("/<\/p>/",$section['content']));
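A self-contained variant of this split, as a sketch: $html is a hypothetical stand-in for $section['content'], and # is used as the delimiter so that the slash needs no escaping.

```php
<?php
// Hypothetical stand-in for $section['content'].
$html = '<p>First</p><p>Second</p>';

// Same split as above, written with "#" as the delimiter.
$parts = preg_split('#</p>#', $html);
$count = count($parts);

// The text after the last </p> is an empty string and is counted too.
// PREG_SPLIT_NO_EMPTY drops such empty pieces.
$nonEmpty = preg_split('#</p>#', $html, -1, PREG_SPLIT_NO_EMPTY);

var_dump($count);           // int(3)
var_dump(count($nonEmpty)); // int(2)
```

Whether the trailing empty piece should be counted depends on what the count is used for.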

6 - Documentation / Reference