Computer Language - (Compiler|Interpreter) - Language translator

About

Computer languages are written in plain text. However, a computer can only execute particular sequences of instructions.

The transformation from a plain-text language to these instructions is called compilation and is done by a compiler. Once a program's source code has been compiled, it has been turned into machine language.

The first compiler was written by Grace Hopper in 1952 for the A-0 System language. Hopper also coined the term compiler.

The translation program is called a compiler, where:

  • the text to be translated is called the source code,
  • the output file is called a binary.
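As a rough illustration of the step from source text to instructions, the sketch below uses Python's built-in compile function and the dis module. It is only an analogy: CPython produces bytecode for its own virtual machine, not machine language, and the variable names and the "<example>" label are made up for the example.

```python
import dis

# Source code: plain text, as a person would write it.
source = "price * quantity + shipping"

# Compilation: the text is translated into a code object (bytecode).
# "<example>" is only a label that would appear in error messages.
code = compile(source, "<example>", "eval")

# The compiled form is a sequence of instructions, not text.
# The exact instruction names depend on the Python version.
instruction_names = [ins.opname for ins in dis.get_instructions(code)]
print(instruction_names)

# The compiled object can be run repeatedly without translating again.
result = eval(code, {"price": 4, "quantity": 3, "shipping": 2})
print(result)  # 14
```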

One of the first compilers for a high-level language was developed for Fortran (formula translator) around 1956. The intricacy and complexity of the translation process could be reduced only by choosing a clearly defined, well-structured source language. This occurred for the first time in 1960 with the advent of the language Algol 60, which established the technical foundations of compiler design that are still valid today. For the first time, a formal notation was also used for the definition of the language's structure (Naur, 1960).

The translation process is now guided by the structure of the analysed text. The text is:

  • decomposed,
  • parsed into its components according to the given syntax.

The semantics of the most elementary components is recognized, and the meaning of composite parts follows from the meaning of their components. The meaning of the source text must be preserved by the translation.

Code that is not compiled ahead of time is run by an interpreter and is then said to be interpreted. The translation is typically done from top to bottom, line by line, every time the program is run.

JIT (Just in Time Compilation)

see JIT Compiler

Translation (steps|pass)

The translation process essentially consists of the following parts:

  1. Lexical analysis (Lexer): The sequence of characters of the source text is translated into a sequence of tokens (symbols of the vocabulary of the language).
  2. Syntax analysis (Parser): The sequence of tokens is transformed into a representation that directly mirrors the syntactic structure of the source. This phase first builds a concrete syntax tree (CST, parse tree) and then transforms it into an abstract syntax tree (AST, syntax tree).
  3. Semantic analysis: The compiler adds semantic information to the syntax tree. In addition to the syntactic rules, the compatibility rules that define the language (types of operators and operands) are verified.
  4. Code generation: A sequence of instructions for the target computer is generated. This is generally the most involved part and is often broken into several parts or passes.

The lexer identifies the tokens that adhere to the lexical grammar, and the parser makes sense of these tokens.
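The steps above can be sketched for a tiny language of integer sums and products. This is a minimal illustration, not how any particular compiler works: the token names, the tuple-based syntax tree and the PUSH/ADD/MUL stack instructions are all assumptions made for the example. Semantic analysis is left out because every value is an integer.

```python
import re

def lex(text):
    """Lexical analysis: characters -> list of (kind, value) tokens."""
    tokens = []
    for tok in re.findall(r"[0-9]+|\S", text):
        if tok[0] in "0123456789":
            tokens.append(("NUM", int(tok)))
        elif tok in ("+", "*", "(", ")"):
            tokens.append(("OP", tok))
        else:
            raise SyntaxError(f"unexpected character {tok!r}")
    return tokens

def parse(tokens):
    """Syntax analysis: tokens -> syntax tree (nested tuples)."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else ("EOF", None)

    def advance():
        nonlocal pos
        tok = peek()
        pos += 1
        return tok

    def expr():                      # expr := term ('+' term)*
        node = term()
        while peek() == ("OP", "+"):
            advance()
            node = ("+", node, term())
        return node

    def term():                      # term := factor ('*' factor)*
        node = factor()
        while peek() == ("OP", "*"):
            advance()
            node = ("*", node, factor())
        return node

    def factor():                    # factor := NUM | '(' expr ')'
        kind, value = advance()
        if kind == "NUM":
            return ("num", value)
        if (kind, value) == ("OP", "("):
            node = expr()
            if advance() != ("OP", ")"):
                raise SyntaxError("expected ')'")
            return node
        raise SyntaxError(f"unexpected token {value!r}")

    tree = expr()
    if peek()[0] != "EOF":
        raise SyntaxError("unexpected trailing input")
    return tree

def generate(node):
    """Code generation: syntax tree -> instructions for a toy stack machine."""
    if node[0] == "num":
        return [("PUSH", node[1])]
    op, left, right = node
    return generate(left) + generate(right) + [("ADD",) if op == "+" else ("MUL",)]

def run(program):
    """Execute the generated instructions to check that the meaning is preserved."""
    stack = []
    for instr in program:
        if instr[0] == "PUSH":
            stack.append(instr[1])
        else:
            b, a = stack.pop(), stack.pop()
            stack.append(a + b if instr[0] == "ADD" else a * b)
    return stack.pop()

program = generate(parse(lex("2 + 3 * (4 + 1)")))
print(program)
print(run(program))  # 17
```

The parser never looks at characters and the lexer knows nothing about operator precedence, which is the division of labour described above.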

Compiler

Process          | Input element                  | Algorithm | Syntax       | Syntactic analysis
Lexical analysis | Character                      | Scanner   | Regular      | Word syntax
Syntax analysis  | Symbol (usually called tokens) | Parser    | Context free | Phrase syntax

Implementation

Lexer and parser generation

In simple cases, the lexer and the parser are generated automatically from the grammar file of the language by a compiler-compiler. In more complex cases, the generated code must be modified manually, or the lexer and the parser are written by hand.

The lexical grammar and phrase grammar are usually context-free grammars, which simplifies analysis significantly, with context-sensitivity handled at the semantic analysis phase. The semantic analysis phase is generally more complex and written by hand, but can be partially or fully automated using attribute grammars.
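A hand-written semantic check can be sketched as a function that computes a type for every node of a syntax tree from the types of its children, in the spirit of a synthesized attribute. The node shapes, the type names and the symbols table are hypothetical and chosen only for illustration. Note that "a variable must be declared before use" is a context-sensitive rule, so it is checked here and not in the grammar.

```python
def type_of(node, symbols):
    """Return the type of `node`, computed bottom-up from its children.

    `symbols` maps declared variable names to their types.
    """
    kind = node[0]
    if kind == "int":
        return "int"
    if kind == "str":
        return "str"
    if kind == "var":
        name = node[1]
        if name not in symbols:
            # Context-sensitive rule: use requires an earlier declaration.
            raise TypeError(f"undeclared variable {name!r}")
        return symbols[name]
    if kind == "+":
        left = type_of(node[1], symbols)
        right = type_of(node[2], symbols)
        if left != right:
            # Compatibility rule between operator and operands.
            raise TypeError(f"cannot add {left} and {right}")
        return left
    raise ValueError(f"unknown node kind {kind!r}")

# x + 1, with x declared as an int, is well typed.
print(type_of(("+", ("var", "x"), ("int", 1)), {"x": "int"}))  # int
```

Both failing cases, an undeclared variable and mismatched operand types, are syntactically valid input; only this phase rejects them.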

Lexical Analysis and Parsing in one step

Serially

Lexical Analysis can be combined with the parsing step in scannerless parsing. Parsing is done at the character level, not the token level.
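A minimal sketch of the scannerless idea, assuming a toy language of non-negative integers joined by "+": there is no separate token list, and the parser recognises digits, spaces and operators directly on the characters. The function name and tree shape are illustrative only.

```python
def parse_sum(text):
    """Parse e.g. '12 + 7+3' directly from characters into a nested tuple tree."""
    pos = 0

    def skip_spaces():
        nonlocal pos
        while pos < len(text) and text[pos] == " ":
            pos += 1

    def number():
        # What a lexer would normally do happens inside the parser rule.
        nonlocal pos
        skip_spaces()
        start = pos
        while pos < len(text) and text[pos] in "0123456789":
            pos += 1
        if start == pos:
            raise SyntaxError(f"digit expected at position {pos}")
        return ("num", int(text[start:pos]))

    node = number()
    skip_spaces()
    while pos < len(text) and text[pos] == "+":
        pos += 1
        node = ("+", node, number())
        skip_spaces()
    if pos != len(text):
        raise SyntaxError(f"unexpected character at position {pos}")
    return node

print(parse_sum("12 + 7+3"))
# ('+', ('+', ('num', 12), ('num', 7)), ('num', 3))
```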

Concurrently

In processing computer languages, semantic processing generally comes after syntactic processing (parser), but in some cases semantic processing is necessary for complete syntactic analysis, and these are done together or concurrently.
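A well-known case is C, where a statement such as "x * y;" is a pointer declaration if x names a type and a multiplication otherwise. The sketch below imitates that decision for this one statement shape; the function name, the result tuples and the set of type names are assumptions for illustration, not a real C parser.

```python
def classify_statement(tokens, type_names):
    """Decide how to parse a statement of the form `A * B ;`.

    `type_names` is semantic information (names declared as types so far)
    that the parser must consult to choose the syntactic structure.
    """
    if len(tokens) == 4 and tokens[1] == "*" and tokens[3] == ";":
        first, _, second, _ = tokens
        if first in type_names:
            return ("declare_pointer", first, second)
        return ("multiply", first, second)
    raise SyntaxError("only statements of the form 'A * B ;' are handled")

type_names = {"int"}
before = classify_statement(["x", "*", "y", ";"], type_names)
print(before)  # ('multiply', 'x', 'y')

# As if a declaration like `typedef int x;` had been processed earlier.
type_names.add("x")
after = classify_statement(["x", "*", "y", ";"], type_names)
print(after)   # ('declare_pointer', 'x', 'y')
```

The token sequence is identical in both calls; only the symbol information changes the result, which is why the two kinds of processing cannot be fully separated here.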

One pass

See wiki/One-pass_compiler

Type

Cross

A compiler which generates code for a computer different from the one executing the compiler is called a cross compiler. The generated code is then transferred to the device.

Courses

https://lagunita.stanford.edu/courses/Engineering/Compilers/Fall2014/about

Benchmark: Manual Assembly vs Compiler

https://stackoverflow.com/questions/40354978/why-is-this-c-code-faster-than-my-hand-written-assembly-for-testing-the-collat

Documentation / Reference
