Compilers #
Course structure #
- Course has theoretical and practical aspects
- Need both in programming languages!
- Written assignments and exams cover theory
- Programming assignments cover the practical/systems part
Course goal #
- Open lid of compilers and see inside
- Understand what they do, how they work, how to build them
- Correctness over performance
- Will not cover lots of optimizations - will need CS243 for it
Course project #
- Write own compiler in four parts:
- PA1: lexer
- PA2: parser
- PA3: type checker
- PA4: code generation
How are languages implemented? #
- Two major strategies
- Interpreters run the program
- Program -> Interpreter -> Machine
- Dominate high-level languages: Python, Ruby, Lisp
- Compilers translate the program
- Program -> Compiler -> Binary Code -> Machine
- Dominate low-level languages: C, C++, Go, Rust
- Interpreters run the program
- Some language implementations provide both
- Java, Javascript, WebAssembly
- Interpreter + Just-In-Time (JIT) compiler
Compiler structure #
Lexical Analysis (lexer) #
- Divides program text into “words” or “tokens”
e.g. if x == y then z = 1; else z = 2
translates into
{
"if": keyword,
" ": whitespace,
"x": variable,
"==": operator,
"y": variable,
"then": keyword,
"z": variable,
"=": operator,
"1": constant,
";": operator,
"else": keyword,
"2": constant
}
Parsing #
- Maps out relationship between tokens
- Stores this relationship as a tree data structure in memory
Semantic Analysis #
- Compilers perform limited semantic analysis to catch inconsistencies
Optimization #
- Automatically modify programs so that they
- Run faster
- Use less memory
- Note: course project has no optimization component
- CS 243: Program Analysis and Optimization
Code Generation #
- Produces assembly code
Intermediate representations #
- Many compilers perform translations between sucessive intermediate languages
- All but first and last are intermediate representations native to compiler
- Generally ordered in descending level of abstraction
- Highest is source
- Lowest is assembly
- Useful because lower levels expose features hidden by higher levels
- Registers
- Memory layout
- Raw pointers
- etc.
- But lower levels obscure higher-level meaning
- Classes
- Higher-order functions
- Loops
Issues #
- How to handle erroneous programs?
- Language design has big impact on compiler
- Determines what is easy and hard to compile
- Course theme: many tradeoffs in language design