Regular Expression - Group (Capture|Substitution)

1 - About

A group is a part of a regular expression that normally captures the text it matches during parsing. You can then (extract|reference) the matched content.

Groups are written inside parentheses.

Look-arounds are also groups, but they implement an assertion and do not capture any content.

3 - Syntax

Every group must begin with an opening parenthesis and end with a closing parenthesis.

(myRegexp0 ( myRegexp1) ( myRegexp2) )
Construct              Definition
(?<name>X)             X, as a named-capturing group

Non-Capturing
(?:X)                  X, as a non-capturing group
(?>X)                  X, as an independent, non-capturing group

Assertion (see Regexp - Look-around group (Assertion))
(?=X)                  X, positive lookahead (via zero-width)
(?!X)                  X, negative lookahead (via zero-width)
(?<=X)                 X, positive lookbehind (via zero-width)
(?<!X)                 X, negative lookbehind (via zero-width)

Flag
(?idmsuxU-idmsuxU)     Nothing, but turns match flags i d m s u x U on - off
(?idmsux-idmsux:X)     X, as a non-capturing group with the given flags i d m s u x on - off
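
The named-group and flag constructs can be tried out with a short script. The following is a minimal sketch in Python, chosen only for illustration; the sample strings and the group names year and month are made up. Note that Python's re module spells a named group (?P<name>X), whereas the (?<name>X) form is the spelling used by engines such as Java and .NET. The scoped flag group (?i:X) assumes Python 3.6 or later.

```python
import re

# Named-capturing groups (Python spelling: (?P<name>X)).
m = re.search(r"(?P<year>\d{4})-(?P<month>\d{2})", "released 2019-06")
year = m.group("year")    # content of the group named "year"
month = m.group("month")  # content of the group named "month"
named = m.groupdict()     # all named groups as a dictionary

# Flag group: the i (case-insensitive) flag applies only inside the group.
scoped = re.search(r"(?i:hello) World", "HELLO World")

# Outside the flag group the match stays case-sensitive,
# so a lowercase "world" does not match.
outside = re.search(r"(?i:hello) World", "HELLO world")

print(year, month, named)
print(scoped is not None, outside is None)
```

A named group is still numbered like any other capturing group, so m.group(1) and m.group("year") return the same text here.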

4 - Index

Capturing groups are numbered by counting their opening parentheses from left to right.

In the expression ((A)(B(C))), for example, there are the following groups:

  • 0 - Group zero always stands for the entire expression - ((A)(B(C)))
  • 1 - ((A)(B(C)))
  • 2 - (A)
  • 3 - (B(C))
  • 4 - (C)
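
The numbering can be checked by listing every group of a match. This is a small sketch in Python, assuming the expression is matched against the sample text ABC (the sample text is an assumption, not part of the original example).

```python
import re

# Groups are numbered by the position of their opening parenthesis.
m = re.fullmatch(r"((A)(B(C)))", "ABC")

# m.re.groups is the number of capturing groups in the pattern (group 0 excluded).
group_count = m.re.groups

# Collect group 0 (the whole match) followed by groups 1 to 4.
groups = [m.group(i) for i in range(group_count + 1)]

for index, text in enumerate(groups):
    print(index, text)
```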

5 - Non-Capturing

5.1 - Basic

A non-capturing group is not indexed: it gets no group number and cannot be referenced afterwards.

In the expression (?:A)(B)(C), for example, there are the following groups:

  • 0 - Group zero always stands for the entire expression - (?:A)(B)(C)
  • 1 - (B)
  • 2 - (C)

The group (?:A) was not captured.
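
The same behaviour can be observed in code. The sketch below uses Python and assumes the sample text ABC; it shows that only two groups are numbered and that asking for a third one fails.

```python
import re

m = re.fullmatch(r"(?:A)(B)(C)", "ABC")

whole = m.group(0)        # the entire match, including the text matched by (?:A)
captured = m.groups()     # only the capturing groups, in index order
group_count = m.re.groups # number of capturing groups in the pattern

# Group 3 does not exist because (?:A) was never indexed.
try:
    m.group(3)
    third_exists = True
except IndexError:
    third_exists = False

print(whole, captured, group_count, third_exists)
```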

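5.2 - Look-around

Look-around groups are zero-width assertions: they check what precedes or follows the current position, but they do not capture content and are not indexed. See Regexp - Look-around group (Assertion).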
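
A look-around group constrains where a match may occur without becoming part of the match or of the group index. The following is a minimal Python sketch; the price string and the foo/bar words are made-up sample data.

```python
import re

# (?<=\$) is a positive lookbehind and (?=\.) a positive lookahead.
# Only (\d+) is a capturing group.
m = re.search(r"(?<=\$)(\d+)(?=\.)", "price: $42.50")

whole = m.group(0)     # the "$" and "." are checked but not consumed
captured = m.groups()  # a single capturing group
position = m.span()    # start and end offsets of the match

# Negative lookahead: "foo" only when it is not followed by "bar".
not_followed = re.findall(r"foo(?!bar)", "foobar foobaz")

print(whole, captured, position, not_followed)
```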

6 - Substitution

To use the content of a captured group in a replacement string, you will generally use one of the following substitution constructs:

  • ${n} for the group index
  • ${groupName} for the group name

When using a group index, the braces are required when:

  • the group number is greater than 9
  • a literal digit immediately follows the substitution

Otherwise, the braces are not always mandatory:

  • $n for the group index
  • $groupName for the group name

There is also a shorthand notation for groups up to 9.

Symbol   Definition
\0       backreference to the entire expression
\1       backreference to group 1
\2       backreference to group 2
\n       backreference to group n
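
The exact substitution syntax depends on the engine. As an illustration only, the sketch below uses Python's re module, which does not accept the ${n} form: its equivalents are \g<n> for a group index and \g<name> for a group name, with \n as the shorthand and \g<0> for the entire match. The sample strings and group names are assumptions.

```python
import re

text = "Hello World"

# Shorthand backreferences in the replacement string: swap the two words.
swapped = re.sub(r"(\w+) (\w+)", r"\2 \1", text)

# Explicit form, needed when a literal digit follows the reference:
# \g<1>0 means "group 1, then the character 0" (not group 10).
padded = re.sub(r"(\d)", r"\g<1>0", "5")

# Reference by group name.
by_name = re.sub(r"(?P<first>\w+) (?P<second>\w+)",
                 r"\g<second> \g<first>", text)

# Reference to the entire match.
bracketed = re.sub(r"\w+", r"[\g<0>]", text)

# A backreference can also be used inside the pattern itself,
# here to find a word that is immediately repeated.
repeated = re.search(r"\b(\w+) \1\b", "this is is a test").group(1)

print(swapped, padded, by_name, bracketed, repeated)
```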

7 - Example

The regular expression below has two groups:

([^ ]+) (.*)

where:

  • the first group is [^ ]+ which matches one or more non-space characters.
  • the second group is .* which matches all remaining characters.

If you match it against the following text:

Hello World

You will get:

  • in the first group, \1, the text Hello
  • and in the second group, \2, the text World
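
This result can be reproduced with a few lines of Python. The sketch assumes that the first group is meant to match a whole run of non-space characters, that is [^ ]+ with a quantifier; without the quantifier, the group captures a single character only, as the second search shows.

```python
import re

text = "Hello World"

# First group: one or more non-space characters. Second group: the rest.
m = re.search(r"([^ ]+) (.*)", text)
first = m.group(1)
second = m.group(2)

# Without the quantifier, the first group holds exactly one character:
# the last non-space character before the space.
single = re.search(r"([^ ]) (.*)", text)
single_first = single.group(1)

print(first, second, single_first)
```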

See more examples here: Notepad++ - Replace with Regular Expression

lang/regexp/group.txt · Last modified: 2019/06/13 14:38 by gerardnico