Regexp - Dot (Single Character pattern)


1 - About

The dot . in a regular expression matches any single character in the supported character set. By default, it does not match newlines (see Match Newline (DOTALL) below).

Inside a character class, the dot has no special meaning: it matches a literal dot.
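
The difference between the dot outside and inside a character class can be sketched in Java as follows. The class and method names (DotInCharacterClass, wildcardMatches, literalMatches) are made up for this illustration; only java.util.regex.Pattern comes from the standard library.

```java
import java.util.regex.Pattern;

public class DotInCharacterClass {
    // Outside a character class, the dot is a wildcard for exactly one character.
    static boolean wildcardMatches(String input) {
        return Pattern.matches("a.c", input);
    }

    // Inside a character class, the dot is a literal '.' character.
    static boolean literalMatches(String input) {
        return Pattern.matches("a[.]c", input);
    }

    public static void main(String[] args) {
        System.out.println(wildcardMatches("abc")); // true
        System.out.println(wildcardMatches("a.c")); // true
        System.out.println(literalMatches("abc"));  // false
        System.out.println(literalMatches("a.c"));  // true
    }
}
```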


3 - Configuration

3.1 - Match Newline (DOTALL)

Dot does not match newlines by default. To make it match them, a modifier (often called DOTALL or single-line mode) must be set when compiling the pattern or calling the matching function.

Java example with the DOTALL flag:

  • A pattern that captures the content between two XML tags, even if that content contains a newline.
Pattern pattern = Pattern.compile("<top>(.*?)</top>",Pattern.DOTALL);
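
As a minimal sketch of the effect of the flag, the same pattern can be run with and without Pattern.DOTALL. The class name DotAllDemo, the helper extract, and the sample input are assumptions made for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DotAllDemo {
    // Returns the content captured between <top> and </top>,
    // or null when the pattern does not match.
    static String extract(String input, int flags) {
        Matcher m = Pattern.compile("<top>(.*?)</top>", flags).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String xml = "<top>line one\nline two</top>";

        // Without DOTALL the dot cannot cross the newline, so nothing matches.
        System.out.println(extract(xml, 0));              // null

        // With DOTALL the dot also matches the newline.
        System.out.println(extract(xml, Pattern.DOTALL)); // both lines of content
    }
}
```

In Java, the embedded flag (?s) at the start of the pattern has the same effect as passing Pattern.DOTALL.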

3.2 - Stop at

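3.2.1 - Last Occurrence (Greedy mode - default)

With the default greedy matching mode, a quantified dot such as .* matches as many characters as possible, so the match stops at the last occurrence of whatever follows it in the pattern.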

3.2.2 - First Occurrence (Lazy)

To make the match lazy, so that it stops at the first occurrence, add a ? after the quantifier. See Regular Expression - (Lazy|Reluctant) Quantifier
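
The difference between the greedy and lazy modes can be sketched in Java as follows. The class name GreedyVsLazyDemo, the helper firstMatch, and the sample input are assumptions made for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GreedyVsLazyDemo {
    // Returns the first substring matched by the regex, or null if there is none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String input = "<b>one</b> and <b>two</b>";

        // Greedy: .* runs up to the last </b>.
        System.out.println(firstMatch("<b>.*</b>", input));  // <b>one</b> and <b>two</b>

        // Lazy: .*? stops at the first </b>.
        System.out.println(firstMatch("<b>.*?</b>", input)); // <b>one</b>
    }
}
```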

4 - Example

4.1 - Basic

.at matches any three-character string ending with at, including:

  • hat,
  • cat,
  • and bat.
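
A short Java check of this basic example, assuming a hypothetical class DotAtDemo with a helper matchesDotAt. Pattern.matches requires the whole string to match the pattern.

```java
import java.util.regex.Pattern;

public class DotAtDemo {
    // True only when the whole string is one character followed by "at".
    static boolean matchesDotAt(String s) {
        return Pattern.matches(".at", s);
    }

    public static void main(String[] args) {
        System.out.println(matchesDotAt("hat"));  // true
        System.out.println(matchesDotAt("cat"));  // true
        System.out.println(matchesDotAt("bat"));  // true
        System.out.println(matchesDotAt("at"));   // false: no character before "at"
        System.out.println(matchesDotAt("chat")); // false: four characters
    }
}
```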

4.2 - Exclude newlines from the negation

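Because the dot does not match newlines by default, a common mistake is to assume that a negated character set like [^#] will not match newlines either. It does: a negated set matches every character that is not listed in it, newlines included.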

In order to exclude newlines, they must be added to the set.

Example: any character that is neither # nor the Linux end-of-line character \n is expressed as:

[^#\n]
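
The effect of adding \n to the negated set can be sketched in Java as follows. The class name NegatedSetDemo, the helper firstRun, and the sample text are assumptions made for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NegatedSetDemo {
    // Returns the first substring matched by the regex, or null if there is none.
    static String firstRun(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String text = "key=value\nnext # comment";

        // [^#] also matches the newline, so the run continues onto the second line.
        System.out.println(firstRun("[^#]+", text).contains("\n")); // true

        // [^#\n] excludes the newline, so the run stops at the end of the first line.
        System.out.println(firstRun("[^#\\n]+", text));             // key=value
    }
}
```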

5 - Documentation / Reference