Lexical Analysis - (Token|Lexical unit|Lexeme|Symbol|Word)


1 - About

A token is a symbol of the vocabulary of the language.

Each token is a single atomic unit of the language.

The token syntax is typically a regular language, so a finite state automaton constructed from a regular expression can be used to recognize it.
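The Python sketch below illustrates that idea. It hand-codes a deterministic finite automaton for two token classes, identifiers and integer literals. The regular expressions in the comments and the names `next_state`, `classify` and `ACCEPTING` are illustrative assumptions, not taken from any particular lexer. A lexer generator would normally build such a transition table from the regular expressions automatically.

```python
import string
from typing import Optional

# Character classes used by the (hypothetical) token definitions.
LETTERS = set(string.ascii_letters + "_")
DIGITS = set(string.digits)

# DFA equivalent to the regular expressions
#   identifier: [A-Za-z_][A-Za-z0-9_]*
#   integer:    [0-9]+
# States: "start", "ident", "int". Returning None means "no transition".
def next_state(state: str, ch: str) -> Optional[str]:
    if state == "start":
        if ch in LETTERS:
            return "ident"
        if ch in DIGITS:
            return "int"
    elif state == "ident":
        if ch in LETTERS or ch in DIGITS:
            return "ident"
    elif state == "int":
        if ch in DIGITS:
            return "int"
    return None

# Accepting states, mapped to the token type they recognize.
ACCEPTING = {"ident": "Identifier", "int": "Integer literal"}

def classify(lexeme: str) -> Optional[str]:
    """Run the DFA over the whole lexeme and return its token type, or None."""
    state = "start"
    for ch in lexeme:
        nxt = next_state(state, ch)
        if nxt is None:
            return None  # no transition: not a token of either class
        state = nxt
    return ACCEPTING.get(state)  # None if the DFA stopped in "start"

for word in ["sum", "3", "x1", "1x", ""]:
    print(repr(word), classify(word))
```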

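Depending on the source, a token is also called a lexical unit, a symbol, or a word. Some texts reserve "lexeme" for the matched character sequence and "token" for its categorized form.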

The process of finding and categorizing tokens from an input stream is called “tokenizing” and is performed by a lexer (lexical analyzer).
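Here is a minimal lexer sketch in Python. It assumes a small hypothetical token set (`INTEGER`, `IDENTIFIER`, `OPERATOR`, `END`). It combines one regular expression per token type into a single pattern with named groups, then scans the input left to right with `re.finditer`. Whitespace is matched but discarded, and any other character is reported as an error.

```python
import re
from typing import Iterator, NamedTuple

class Token(NamedTuple):
    type: str    # token category
    lexeme: str  # matched character sequence

# Hypothetical token specification. Python's re alternation picks the
# first alternative that matches at a position, so order matters.
TOKEN_SPEC = [
    ("INTEGER",    r"\d+"),
    ("IDENTIFIER", r"[A-Za-z_]\w*"),
    ("OPERATOR",   r"[=+\-*/]"),
    ("END",        r";"),
    ("SKIP",       r"\s+"),  # whitespace: recognized but not emitted
    ("MISMATCH",   r"."),    # any other character is an error
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

def tokenize(text: str) -> Iterator[Token]:
    """Scan text left to right and yield (type, lexeme) tokens."""
    for match in MASTER.finditer(text):
        kind = match.lastgroup  # name of the group that matched
        lexeme = match.group()
        if kind == "SKIP":
            continue
        if kind == "MISMATCH":
            raise SyntaxError(f"unexpected character {lexeme!r} at {match.start()}")
        yield Token(kind, lexeme)

print(list(tokenize("total = price * 2;")))
```

Because alternation is first-match rather than longest-match, the order of `TOKEN_SPEC` is part of the design. Lexer generators such as lex and flex instead apply the longest-match rule and use rule order only to break ties.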


Tokens are obtained by breaking the input text down into the atomic elements of the language.

See also Natural Language - Token (Word|Term)


3 - Lexeme Type

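A token might be an identifier, a keyword, an operator, a literal, or a punctuation mark such as the end-of-statement semicolon.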


Consider the following programming expression:

sum = 3 + 2;

A lexer splits it into the following tokens:

Lexeme   Lexeme type
------   -------------------
sum      Identifier
=        Assignment operator
3        Integer literal
+        Addition operator
2        Integer literal
;        End of statement
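This expression also happens to be valid Python, so Python's standard `tokenize` module can serve as a cross-check. It uses Python's own tokenizer rather than a general-purpose one, so its categories differ from the table: `=`, `+` and `;` all get the coarse type `OP`, and the `exact_type` attribute refines them into specific operator kinds.

```python
import io
import tokenize

source = "sum = 3 + 2;\n"

rows = []
# generate_tokens() takes a readline callable that returns str lines.
for tok in tokenize.generate_tokens(io.StringIO(source).readline):
    if tok.type in (tokenize.NEWLINE, tokenize.ENDMARKER):
        continue  # layout tokens, not part of the table above
    # tok.type is the coarse category (NAME, OP, NUMBER);
    # tok.exact_type refines OP into EQUAL, PLUS, SEMI, ...
    rows.append((tok.string, tokenize.tok_name[tok.type], tokenize.tok_name[tok.exact_type]))

for lexeme, kind, exact in rows:
    print(f"{lexeme:<4} {kind:<7} {exact}")
```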

4 - Properties

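4.1 - Terminal

In a grammar, tokens are the terminal symbols. They appear in production rules but are never expanded further: the lexer produces them and the parser consumes them.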
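The Python sketch below makes the terminal role concrete for the `sum = 3 + 2;` example. It assumes a hypothetical two-rule grammar, given in the class docstring. Each nonterminal becomes a method, while `expect` matches a single terminal by its token type and consumes it without expanding it. The token list is copied by hand from the example table; the class and method names are illustrative.

```python
from typing import List, Tuple

Token = Tuple[str, str]  # (token type, lexeme)

# Tokens of `sum = 3 + 2;`, taken from the example table.
TOKENS: List[Token] = [
    ("Identifier", "sum"),
    ("Assignment operator", "="),
    ("Integer literal", "3"),
    ("Addition operator", "+"),
    ("Integer literal", "2"),
    ("End of statement", ";"),
]

class Parser:
    """Recursive-descent parser for the hypothetical grammar
         statement  -> Identifier '=' expression ';'
         expression -> Integer ('+' Integer)*
    Token types are the terminals: they are matched, never expanded."""

    def __init__(self, tokens: List[Token]) -> None:
        self.tokens = tokens
        self.pos = 0

    def peek(self) -> str:
        """Type of the next token, or "EOF" when the input is exhausted."""
        return self.tokens[self.pos][0] if self.pos < len(self.tokens) else "EOF"

    def expect(self, token_type: str) -> str:
        """Consume one terminal of the given type and return its lexeme."""
        if self.peek() != token_type:
            raise SyntaxError(f"expected {token_type}, got {self.peek()}")
        lexeme = self.tokens[self.pos][1]
        self.pos += 1
        return lexeme

    def statement(self) -> Tuple[str, int]:
        name = self.expect("Identifier")
        self.expect("Assignment operator")
        value = self.expression()  # nonterminal: expanded by a method call
        self.expect("End of statement")
        return name, value

    def expression(self) -> int:
        total = int(self.expect("Integer literal"))
        while self.peek() == "Addition operator":
            self.expect("Addition operator")
            total += int(self.expect("Integer literal"))
        return total

print(Parser(TOKENS).statement())  # ('sum', 5)
```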
