PHP - Regular expressions (Perl-compatible)


About

Regular expressions in PHP are implemented with the PCRE (Perl-Compatible Regular Expressions) library (see Library).

The pattern syntax used by the preg_ functions therefore closely resembles Perl's, but is not identical to it. See Perl Differences.

Syntax

The pattern must be enclosed by delimiters.

DelimiterPatternDelimiter[Modifiers]

where:

Delimiter

A delimiter:

  • is generally a forward slash /.
  • can be any non-alphanumeric, non-backslash, non-whitespace character.
  • cannot be a backslash (\) or the null byte.

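If the delimiter character has to be used in the expression itself, it must be escaped with a backslash.

The bracket pairs (), {}, [] and <> can also serve as delimiters. They need no escaping when used as metacharacters inside the pattern, but must be escaped when they are used as literal characters.

The preg_quote function escapes a string so that it can be used literally in a pattern; its second argument names the delimiter, which is escaped as well: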

Example:

$keywords = '$40 for a g3/400';
$keywords = preg_quote($keywords, '/'); // The delimiter is a forward slash
echo $keywords; // returns \$40 for a g3\/400 // The dollar and / characters were quoted
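
The delimiter rules above can be sketched as follows. The sample URL, the needle and the variable names are illustrative assumptions, not part of the original page; the point is only that choosing a delimiter that does not occur in the pattern avoids escaping, and that preg_quote handles the escaping for arbitrary text.

```php
<?php
// Sample subject; the value is illustrative only.
$url = 'https://example.com/docs/regex';

// With / as the delimiter, every literal slash in the pattern must be escaped.
$withSlash = preg_match('/^https:\/\/[^\/]+\/docs\//', $url);

// With # as the delimiter, the slashes need no escaping.
$withHash = preg_match('#^https://[^/]+/docs/#', $url);

// Arbitrary text is made safe for a pattern with preg_quote;
// the second argument names the delimiter so that it is escaped too.
$needle  = 'a+b/c';
$pattern = '/' . preg_quote($needle, '/') . '/';
$quoted  = preg_match($pattern, 'x = a+b/c;');

var_dump($withSlash, $withHash, $quoted);
```

All three calls should report a match; the two URL patterns are equivalent and differ only in how much escaping they need.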

Modifiers

The pattern modifiers are the regular expression flags and are located after the ending delimiter.

Example of case-insensitive matching:

#[a-z]#i

List:

  • g modifier: global (all matches). Not available in PHP: use preg_match_all to get every match instead
  • m modifier: multi-line. ^ and $ match at the beginning and end of each line (not only of the whole string)
  • i modifier: insensitive. Case-insensitive match
  • x modifier: extended. Whitespace in the pattern is ignored, as is the text from an unescaped # to the end of the line
  • X modifier: eXtra. A \ followed by a letter with no special meaning causes an error
  • s modifier: single line. The dot also matches newline characters
  • u modifier: unicode. Pattern and subject strings are treated as UTF-8. Also causes escape sequences to match Unicode characters
  • U modifier: ungreedy. Quantifiers become lazy by default; a ? following a quantifier makes it greedy
  • A modifier: anchored. The pattern matches only at the start of the subject, as if it began with ^
  • J modifier: allow duplicate subpattern names
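
The following sketch exercises a few of these modifiers. The sample text and variable names are illustrative assumptions; each call is paired with the same pattern without the modifier where that makes the effect visible.

```php
<?php
$text = "First line\nsecond LINE";

// i: case-insensitive, so "line$" matches the trailing "LINE".
$i = preg_match('/line$/i', $text);

// m: $ also matches at the end of each line, so "line" on line 1 matches.
$m = preg_match('/line$/m', $text);

// Without m (and without i), $ only matches at the end of the subject.
$noM = preg_match('/line$/', $text);

// s: the dot also matches the newline between the two lines.
$s   = preg_match('/First.*second/s', $text);
$noS = preg_match('/First.*second/', $text);

// x: whitespace and #-comments inside the pattern are ignored.
$x = preg_match('/ \d{3}  # three digits
                   -      # a dash
                   \d+    # more digits
                 /x', '555-1234');

// u: the subject is read as UTF-8, so the dot matches one whole character.
$eAcute = "\xC3\xA9";               // "é" encoded as two UTF-8 bytes
$u   = preg_match('/^.$/u', $eAcute);
$noU = preg_match('/^.$/', $eAcute);

// preg_match_all counts every match (PHP's counterpart of a global flag).
$count = preg_match_all('/line/i', $text);

var_dump($i, $m, $noM, $s, $noS, $x, $u, $noU, $count);
```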

Example

File Root Detection

The dirname documentation lists the values returned once the file system root is reached:

dirname('.');    // Will return '.'.
dirname('/');    // Will return `\` on Windows and '/' on *nix systems.
dirname('\\');   // Will return `\` on Windows and '.' on *nix systems.
dirname('C:\\'); // Will return 'C:\' on Windows and '.' on *nix systems.

A pattern that recognizes these root values ('.', '/', '\' or a drive letter followed by ':\') is then:

$isRoot = preg_match("/^(\.|\/|\\\\|[a-z]:\\\\)$/i", $path);
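
As a sketch, the check can be wrapped in a helper and tried against the dirname results listed above. The function name isFileSystemRoot is hypothetical, and the pattern used here also accepts the *nix root '/', which is an assumption about the intended behaviour. It uses # as delimiter so that the forward slash needs no escaping.

```php
<?php
// Hypothetical helper: true when $path is one of the values dirname()
// returns once it has reached the top of the tree.
function isFileSystemRoot(string $path): bool
{
    // Alternatives: "." | "/" | "\" | a drive letter followed by ":\"
    // In a single-quoted string, \\\\ yields the regex \\ (one literal backslash).
    return preg_match('#^(\.|/|\\\\|[a-z]:\\\\)$#i', $path) === 1;
}

// Root-like values
var_dump(isFileSystemRoot('.'));
var_dump(isFileSystemRoot('/'));
var_dump(isFileSystemRoot('\\'));
var_dump(isFileSystemRoot('C:\\'));

// Ordinary directories
var_dump(isFileSystemRoot('/var/www'));
var_dump(isFileSystemRoot('C:\\Users'));
```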

Valid patterns

/foo bar/
#^[^0-9]$#
+php+
%[a-zA-Z0-9_-]%
(this [is] a (pattern))
{this [is] a (pattern)}
[this [is] a (pattern)]
<this [is] a (pattern)>
/<\/\w+>/
|(\d{3})-\d+|Sm
/^(?i)php[34]/
{^\s+(\s+)?$}

Invalid patterns

/href='(.*)'      // missing ending delimiter
/\w+\s*\w+/J      // unknown modifier 'J' before PHP 7.2 (J is accepted as of PHP 7.2.0)
1-\d3-\d3-\d4|  //  missing starting delimiter
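
A pattern can be checked before use with a small helper. This is a minimal sketch: the function name isValidPattern is hypothetical, and it relies only on the fact that preg_match returns false (and raises a warning) when the pattern cannot be compiled.

```php
<?php
// Hypothetical helper: a pattern is usable if PCRE can compile it.
// The @ operator silences the warning raised for a malformed pattern.
function isValidPattern(string $pattern): bool
{
    return @preg_match($pattern, '') !== false;
}

$patterns = [
    '/foo bar/',                 // valid
    '(this [is] a (pattern))',   // valid, bracket-style delimiters
    '{^\s+(\s+)?$}',             // valid, bracket-style delimiters
    "/href='(.*)'",              // invalid: missing ending delimiter
    '1-\d3-\d3-\d4|',            // invalid: missing starting delimiter
];

foreach ($patterns as $pattern) {
    echo $pattern, ' => ', isValidPattern($pattern) ? 'valid' : 'invalid', "\n";
}
```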

Functions

See book.pcre

preg_match

preg_match:

  • returns 1 if the pattern matches, 0 if it does not, and false on failure
  • stores the full match and the captured groups in the optional third argument (matches)

Example:

$pattern = "carbon|eva";
if (preg_match("/$pattern/i",$pathString) === 1){
  echo "We have a match";
}
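
The third argument can be sketched as follows. The path value and the group name file are illustrative assumptions; the example shows how numbered and named groups end up in the matches array.

```php
<?php
$pathString = '/data/Carbon/report-2024.csv';   // illustrative value

// Group 1 captures the keyword, the named group "file" captures the file name.
$found = preg_match('/(carbon|eva)\/(?<file>[^\/]+)$/i', $pathString, $matches);

if ($found === 1) {
    echo $matches[0], "\n";       // the full match
    echo $matches[1], "\n";       // first capturing group
    echo $matches['file'], "\n";  // named group (also available as $matches[2])
}

// No match yields 0, not false, hence the strict comparison with 1 above.
$notFound = preg_match('/eva/', 'abc');
```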

Management

Configuration

Library

By default, this extension is compiled using the bundled PCRE library. Alternatively, an external PCRE library can be used by passing in the --with-pcre-regex=DIR configuration option, where DIR is the location of PCRE's include and library files.

Runtime

Runtime Configuration

Escape

The escape character is the backslash (\).

Example: split an HTML page on the closing tag of the p element ('</p>') and count the resulting parts:

$localCount = count(preg_split("/<\/p>/",$section['content']));
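
One detail of this approach is worth a sketch: when the content ends with '</p>', preg_split returns a trailing empty string that is included in the count. The sample content below is an illustrative assumption; the PREG_SPLIT_NO_EMPTY flag drops the empty pieces.

```php
<?php
$content = '<p>one</p><p>two</p>';   // illustrative value

// Escaped delimiter: yields "<p>one", "<p>two" and a trailing empty string.
$parts = preg_split('/<\/p>/', $content);

// Same split with # as delimiter (no escaping needed) and empty pieces dropped.
$nonEmpty = preg_split('#</p>#', $content, -1, PREG_SPLIT_NO_EMPTY);

var_dump(count($parts), count($nonEmpty));
```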
