Automata - Deterministic finite-state automata (DFA)

Card Puncher Data Processing

About

A Deterministic finite-state automata (DFA) is a finite automaton that cannot be in more than one state at any one time.

The term deterministic refers to the fact that on each input there is one and only one state to which the automaton can transition from its current state.

The inverse of Deterministic finite-state automata (DFA) is Nondeterministic finite-state automata (NFA) that may be in several states at once.

They are fast, without expensive backtracking.

Parsing Example

A Finite State Machine (FSM) will:

  • start with a default state (such as parsing)
  • then will parse a string character by character:
  • until it find a open token,
  • then, change your state
  • where you track and store all the characters
  • until you reach the closing token,
  • which then switches state back to the default one “parsing”.

A FSM versus a regular expression approach can come handy if you need to deal with matched nested parentheses or with deeper parsing level (different types of parentheses, etc.)

Definition

A DFA is denoted in the Five tuple notation as being a set of the following element.

<MATH> A = (Q, \sigma, \delta, q0, F) </MATH> where:

  • A is the name of the DFA
  • Q is its set of states
  • <math>\sigma</math> is its input symbols
  • <math>\delta</math> is its transition function
  • q its start state
  • F its set of accepting states

Finite

A finite automaton has the following function composition property

For the deterministic finite automata q, if the input string <math>s = y+z</math> , then processing a string s composed of two strings y and z is the same as processing first y then processing z.

<MATH>\delta(q,s) = \delta(\delta(q,y),z)</MATH>

where <math>\delta</math> is the state of an automaton q after processing a string (here s, y or z)

Regular Expression

Most regexp library are based on nondeterministic automata (NFA)

Lexer

A lexer 1) defines all token via several regular expression. To efficiently match the input against the possible tokens, all regexes are compiled together into a Deterministic Finite State Machine. The DFA is then walked character by character over the input string and fires a “match” event whenever one of the expressions is matched.

  • Advantages: it runs fast (each character is compared only once) and does not get any slower if you add more expressions.
  • Disadvantages: it requires a huge data table for the automaton, and there are many types of regular expressions that are not supported (for instance, back-references).

Library

Documentation / Reference

  • Chapter 2.2.1 - page 61 - Book - Hopcroft, Motwani, Ullman, Automata Theory, Languages, and Computation 3rd Edition. pdf





Discover More
Card Puncher Data Processing
Automata - Automaton Library

This page lists state machine libraries. All regular expression library implements an automaton. We list a couple of example. : Finite-state automaton for regular...
Word Recognition Automaton
Automata - Finite Automata

A finite automaton is an automaton that has a set of states and its control moves from state to state in response to external inputs. It has a start and an end state and there are only a finite number...
Card Puncher Data Processing
Automata - Nondeterministic automata (NFA)

Nondeterministic finite automata (NFA) is a finite automata that can be in several states at once (several variable) The inverse of Nondeterministic automata (NFA) is deterministic finite-state automata...
Compiler
Language - Compiler compilers or (lexer|parser) generators

Compiler-compilers splits the work into a lexer and a parser: The Lexer reads text data (file, string,...) and divides it into tokens using lexer rule (patterns). It generates as output a list of tokens...
Regexp
Multilingual Regular Expression Syntax (Pattern)

Regular expression are Expression that defines a pattern in text. This is therefore a language that permits to define structure of a text. They are a mathematically-defined concept, invented by Stephen...



Share this page:
Follow us:
Task Runner