CSCI 435

Compiler Construction

Coordinator: Gary Zoppetti

Credits: 4.0

Description

Students implement a compiler for a simplified modern programming language. Theory of compiler construction including finite state automata, LL(l) grammars and top-down parsing. Project includes lexical and syntax analysis, name storage, scope and type analysis, error recovery and code generation. Advanced topics covered as time permits including LR(k) grammars, bottom-up parsing, compiler generators (e.g., LEX and YACC) and code optimization. Offered infrequently.

Prerequisites

C- or higher in CSCI 330, CSCI 340, and CSCI 362

Sample Textbooks

Course Outcomes

  1. Recite the main phases of compilation; trace a compilation showing the transformation of a source program into different representations as it passes through these phases
  2. Design a finite state automaton (FSA) that recognizes the vocabulary of a given programming language; implement the FSA to yield a lexical analyzer for the programming language
  3. Implement a parser using bison
  4. Enhance the parser to give adequate error messages and to adequately recover from syntax errors in a source program
  5. Enhance the parser to provide scope and type analysis and to give appropriate error messages for an invalid source program
  6. Translate fundamental language constructs (e.g., declarations, expressions, assignments, I/O statements, boolean expressions, control statements, subprogram call/return, parameter transmission) into machine code

Major Topics Covered

A. Introduction

  1. Phases of compilation
  2. Passes of compilation
  3. Single-pass compilation
  4. Multi-pass compilation

B. Compiler Administration

  1. Source file handling
  2. Listing file handling
  3. Compiler directives
  4. Errors
  5. Top-level compiler structure

C. Lexical Analysis

  1. Theory: FSA and Regular Expressions
  2. Intermediate representation = tokens
  3. Scanning
  4. Symbol management and name storage
  5. Handling white space and comments
  6. Implementation

D. Syntax Analysis

  1. Theory: LL(1) grammars, top-down parsing, first and follow sets, grammar restrictions on LL(1) parsing
  2. Intermediate representation = parse tree
  3. Parser construction rules

E. Scope Analysis

  1. Blocks and scope rules
  2. Data structures needed
  3. Algorithms

F. Type Analysis

  1. Standard types
  2. Constants
  3. Variables
  4. Arrays
  5. Records
  6. Expressions
  7. Statements
  8. Procedures
  9. Parameters

G. Error Recovery

  1. Goals of error recovery
  2. The fundamental problem of error recovery
  3. Error recovery strategy

H. A Computer and its Machine Code

  1. Machine code format
  2. The stack
  3. Address computation
  4. Expression evaluation
  5. Statement execution
  6. Procedure activation

I. Code Generation of Source Code into Machine Code

  1. Generation of machine code instructions
  2. Variable Addressing Code
  3. Expression Code
  4. Statement Code
  5. Procedure Code

J. Additional Topics

  1. Bootstrapping
  2. Regular Expressions and FSA
  3. LEX
  4. YACC
  5. LR(k) Grammars and bottom-up parsing

Sample Laboratory Projects

Open laboratories (2 weeks each)

  1. Commence implementation of a compiler, starting with its shell and adding administrative functions, including recognizing and obeying compiler directives, accessing source files, generating listing and object files.
  2. Implement lexical analysis for a given programming language and add it to the compiler.
  3. Implement syntax analysis for a given programming language and add it to the compiler in the form of a top-down parser.
  4. Enhance the parser so that it provides adequate error messages adequate error messages and so that it adequately recovers from syntax errors in a source program.
  5. Enhance the parser so that it provides adequate error messages for violations of scope rules in a source program.
  6. Enhance the parser so that it provides adequate error messages for violations of type rules in a source program.
  7. Enhance the parser so that it translates the source program to a given machine code.

Closed laboratories (completed in a two-hour supervised laboratory period)

  1. Design regular expressions and a finite state automaton (FSA) to recognize the vocabulary of a given programming language.
  2. Use a lexical analyzer generator (e.g. LEX) to construct a lexical analyzer from a regular expression.
  3. Design a bottom-up grammar for a given programming language.
  4. Use a parser generator (e.g. YACC) to construct a parser from a grammar.
  5. Alter the search strategy used in the symbol table management routines to gain an understanding as to how the table organization and search strategy can affect the efficiency of the compiler.
  6. Design a top-down grammar for a given programming language.
  7. Implement a program that computes the first and follow sets for a given grammar and determines whether or not the grammar is LL(l).