Intro

Compilers #

Course structure #

  • Course has theoretical and practical aspects
    • Need both in programming languages!
  • Written assignments and exams cover theory
  • Programming assignments cover the practical/systems part

Course goal #

  • Open lid of compilers and see inside
    • Understand what they do, how they work, how to build them
  • Correctness over performance
    • Will not cover lots of optimizations - will need CS243 for it

Course project #

  • Write own compiler in four parts:
    • PA1: lexer
    • PA2: parser
    • PA3: type checker
    • PA4: code generation

How are languages implemented? #

  • Two major strategies
    • Interpreters run the program
      • Program -> Interpreter -> Machine
      • Dominate high-level languages: Python, Ruby, Lisp
    • Compilers translate the program
      • Program -> Compiler -> Binary Code -> Machine
      • Dominate low-level languages: C, C++, Go, Rust
  • Some language implementations provide both
    • Java, Javascript, WebAssembly
    • Interpreter + Just-In-Time (JIT) compiler

Compiler structure #

Lexical Analysis (lexer) #

  • Divides program text into “words” or “tokens”

e.g. if x == y then z = 1; else z = 2 translates into

{
    "if": keyword,
    " ": whitespace,
    "x": variable,
    "==": operator,
    "y": variable,
    "then": keyword,
    "z": variable,
    "=": operator,
    "1": constant,
    ";": operator,
    "else": keyword,
    "2": constant
}

Parsing #

  • Maps out relationship between tokens
  • Stores this relationship as a tree data structure in memory

Semantic Analysis #

  • Compilers perform limited semantic analysis to catch inconsistencies

Optimization #

  • Automatically modify programs so that they
    • Run faster
    • Use less memory
  • Note: course project has no optimization component
    • CS 243: Program Analysis and Optimization

Code Generation #

  • Produces assembly code

Intermediate representations #

  • Many compilers perform translations between sucessive intermediate languages
    • All but first and last are intermediate representations native to compiler
  • Generally ordered in descending level of abstraction
    • Highest is source
    • Lowest is assembly
  • Useful because lower levels expose features hidden by higher levels
    • Registers
    • Memory layout
    • Raw pointers
    • etc.
  • But lower levels obscure higher-level meaning
    • Classes
    • Higher-order functions
    • Loops

Issues #

  • How to handle erroneous programs?
  • Language design has big impact on compiler
    • Determines what is easy and hard to compile
    • Course theme: many tradeoffs in language design