Programming Language Theory · Part 1 · Project: While Compiler

Introduction to Compilers

An overview of the compiler pipeline


What is a compiler

A compiler is a program that reads a program written in one language, the source language, and translates it into an equivalent program in another language, the target language. The key property of a compiler is that it preserves the meaning of the input program. In general a compiler does not produce machine code directly; instead it emits assembly code and relies on an assembler (and linker) to produce the final executable.

Compiler pipeline

A compiler usually works in several sequential stages. Each stage transforms the program representation and prepares it for the next one. Compilers are typically split into two phases: the frontend and the backend.

[Diagram: the compiler pipeline, with the stages grouped into a frontend and a backend]

Lexer

The lexer, or scanner, converts raw source code into a sequence of tokens.

  • Tokens are atomic elements such as keywords, identifiers, operators, and literals.
  • Whitespace and comments are discarded at this stage.

For example:

int x = 42;

would be converted into the tokens:

[int] [identifier:x] [=] [number:42] [;]
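
A minimal sketch of such a lexer, written here in Python with token names invented for this example, could try a list of regular expressions in order:

import re

# Hypothetical token specification: more specific patterns come first.
TOKEN_SPEC = [
    ("KEYWORD",    r"\bint\b"),
    ("NUMBER",     r"\d+"),
    ("IDENTIFIER", r"[A-Za-z_]\w*"),
    ("ASSIGN",     r"="),
    ("SEMICOLON",  r";"),
    ("SKIP",       r"\s+"),          # whitespace is discarded
]

def lex(source):
    tokens = []
    pos = 0
    while pos < len(source):
        for name, pattern in TOKEN_SPEC:
            match = re.match(pattern, source[pos:])
            if match:
                if name != "SKIP":
                    tokens.append((name, match.group()))
                pos += match.end()
                break
        else:
            raise SyntaxError(f"unexpected character {source[pos]!r}")
    return tokens

print(lex("int x = 42;"))
# [('KEYWORD', 'int'), ('IDENTIFIER', 'x'), ('ASSIGN', '='), ('NUMBER', '42'), ('SEMICOLON', ';')]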

Parser

The parser takes tokens from the lexer and builds a parse tree or abstract syntax tree (AST).

  • Ensures the code follows the language grammar.
  • Detects syntax errors such as missing semicolons or unmatched brackets.

Again using the example of

int x = 42;

the AST produced might be something like:

Assignment
 ├─ Variable: x
 └─ Value: 42
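
A sketch of this step in Python (the Assignment node class and the token layout are assumptions carried over from the lexer sketch above) might look like:

from dataclasses import dataclass

# Hypothetical AST node: a variable bound to a constant value.
@dataclass
class Assignment:
    name: str
    value: int

def parse_declaration(tokens):
    # Expects the token pattern: KEYWORD IDENTIFIER ASSIGN NUMBER SEMICOLON
    kinds = [kind for kind, _ in tokens]
    if kinds != ["KEYWORD", "IDENTIFIER", "ASSIGN", "NUMBER", "SEMICOLON"]:
        raise SyntaxError("expected a declaration like 'int x = 42;'")
    _, name = tokens[1]
    _, number = tokens[3]
    return Assignment(name=name, value=int(number))

tokens = [("KEYWORD", "int"), ("IDENTIFIER", "x"), ("ASSIGN", "="),
          ("NUMBER", "42"), ("SEMICOLON", ";")]
print(parse_declaration(tokens))
# Assignment(name='x', value=42)

A real parser is recursive and driven by the grammar rather than a fixed pattern, but the idea is the same: consume tokens, report syntax errors, and build tree nodes.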

Semantic Analysis

Semantic analysis checks the meaning of the program.

  • Type checking (e.g., cannot assign a string to an integer).
  • Scope resolution (variables must be declared before use).
  • Ensures the program is semantically consistent, even when it is syntactically valid.

Errors caught at this stage:

  • Type mismatch
  • Undeclared variables
  • Invalid function calls
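
As a rough illustration (the symbol-table layout and error messages here are invented), a checker can walk the AST while maintaining a table of declared names and their types:

# Minimal symbol table: variable name -> declared type.
symbols = {}

def check_declaration(name, declared_type, value):
    if name in symbols:
        raise TypeError(f"variable '{name}' is already declared")
    if declared_type == "int" and not isinstance(value, int):
        raise TypeError(f"cannot assign a {type(value).__name__} to an int")
    symbols[name] = declared_type

def check_use(name):
    if name not in symbols:
        raise NameError(f"variable '{name}' used before declaration")

check_declaration("x", "int", 42)           # ok
check_use("x")                              # ok
# check_declaration("msg", "int", "hi")     # TypeError: cannot assign a str to an int
# check_use("y")                            # NameError: variable 'y' used before declaration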

Intermediate Representation (optional)

Some compilers then generate an intermediate representation (IR), a lower-level, platform-independent version of the program. The IR makes it easier to apply optimizations and, eventually, to generate target-specific assembly code.
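
For example, a declaration such as int x = 2 + 3; might be lowered into a simple three-address-style IR along these lines (the instruction names are purely illustrative):

t0 = 2
t1 = 3
t2 = t0 + t1
x  = t2

Each instruction does one small thing, which is exactly what makes passes like the constant folding described below easy to apply.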

LLVM

The LLVM framework, for example, uses an IR that allows multiple programming languages to be compiled efficiently for different architectures, while also enabling advanced optimizations.
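
As an illustration, unoptimized LLVM IR for int x = 42; looks roughly like this (the exact pointer syntax depends on the LLVM version; older releases write i32* where newer ones write ptr):

define i32 @main() {
entry:
  %x = alloca i32        ; reserve stack space for x
  store i32 42, ptr %x   ; x = 42
  ret i32 0
}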

Optimization

Optimizations improve program efficiency without changing its behavior. They can make code execute faster, use less memory, or reduce other resources. Even basic optimizations can have a significant impact.

Constant Folding

Constant folding evaluates expressions at compile time, so an expression such as

int x = 2+3;

would be directly replaced at compile time with the expression

int x = 5;
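
A sketch of how such a pass might work on a tiny expression AST (the node classes are invented for this example, in Python):

from dataclasses import dataclass

@dataclass
class Num:
    value: int

@dataclass
class BinOp:
    op: str
    left: object
    right: object

def fold(node):
    # Fold the children first, then collapse constant subtrees.
    if isinstance(node, BinOp):
        left, right = fold(node.left), fold(node.right)
        if node.op == "+" and isinstance(left, Num) and isinstance(right, Num):
            return Num(left.value + right.value)
        return BinOp(node.op, left, right)
    return node

print(fold(BinOp("+", Num(2), Num(3))))
# Num(value=5)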

Dead Code Elimination

Dead code elimination removes instructions that will never be executed, such as code after a return statement.
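
For example, in a made-up snippet like the one below, the declaration after the return can never execute, so the pass simply drops it:

int f() {
    return 1;
    int y = 2;   // unreachable: removed by dead code elimination
}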

Assembly Code Generation

The final stage translates the intermediate representation (or AST) into assembly code specific to the target CPU architecture.

  • Assigns registers and memory locations.
  • Converts operations to CPU instructions.
  • Prepares the program for linking into an executable.
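
For the running example int x = 42;, an x86-64 compiler without optimization typically emits something roughly like this (Intel syntax, with x given a slot on the stack):

main:
    push    rbp                       ; set up the stack frame
    mov     rbp, rsp
    mov     DWORD PTR [rbp-4], 42     ; store 42 into x's stack slot
    mov     eax, 0                    ; return value 0
    pop     rbp
    ret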