It took me more than a week, but I finally finished the relevant part of Chapter 2 in Aho et al.βs Purple Dragon book.
Next steps: all of Thain Chapter 4; then back to Aho for Chapter 5 (formal treatmeent of parsing and syntax-directed translation).
I. Aho Sections 2.2 - 2.4 (13 - 24 March 2026):
- 2.2 Syntax Definition
- 2.2.1 Definition of Grammars / CFGs.
- Example of
9-5+2used throughout p. 43-49. - Derivations, Parse Trees, Ambiguity, Associativity of Operators, Precedence of Operators.
- 2.3 Syntax-Directed Translation
- 2.3.1 Postfix Notation.
- 2.3.2 - 2.3.5: Synthesized Attributes, Simple SD Definitions, Tree Traversals, Translation Schemes.
- 2.4 Parsing
- 2.4.1 Top-Down Parsing.
- 2.4.2 Predictive Parsing.
- 2.4.3 When to use π-productions.
- 2.4.5 Left recursion.
- See JH notebook #2 for more.
II. Thain Chapter 4: Parsing Basic Definitions
- LL(1) grammars are CFGs that can be evaluated using only the current rule and the next token (lookahead symbol) in the input stream.
- LR(1) are more general and more powerful but require more complex parsing algorithms.
- Typically, we use a parser generator that accepts our desired LR(1) grammar; it will automatically generate th e parsing code.
- Almost all useful programming languages are in LR(1) form.
A. Context-Free Grammar is a 4-tuple
- A CFG (context free grammar) = (T,V,P,S) where P is a list of productions or rules that formally describe the allowable sentences in a language. For T,V,S:
- Terminals β T
- Non-terminals β V
- A special non-terminal called a start symbol β S.
- A sentence is a valid sequence of terminals and non-terminals in a language.
- The sentential form is a valid sequence of terminals and non-terminals.
- We use Greek symbols like alpha, beta, and gamma to represent mixed sequence of terminals and non-terminals.
- We use a sequence like Y1Y2Y3β¦Yn to indicate individual symbols in a sentential form. Yi may be a terminal or non-terminal.
B. More on non-terminals β V
- Thain uses these abbreviations but I prefer my own for clarity:
- P = program (JH abb = Prg)
- S = statement (JH abb = Stt)
- E = expression (JH abb = Exp)
- Prg, Stt, and Exp are all non-terminals β purple V.
III. Examples using grammar G2
- See JH handwritten notes.
- Common notation to denote the finite (or infinite) language associated with a grammar G is
L(G). - Specific rules as illustrated by the example of grammar G2 in JH section 10.1.
- JH 10.2 formalizes and clarifies some things about CFG = (T,V,P,S).
- JH 10.4 illustrates via a table why Prg β int + ident per the productions from 10.3.1 for grammar G2.
- int + ident is a sentence generated by the grammar because it consists only of terminals.
- Prg and Exp are sentential forms, but they are not sentences and therefore are not elements of
L(G2). - More generally,
L(G2)contains only sentences generated by grammar G2; not arbitrary sentential forms. - Along th way, Prg, Exp, and Exp + Exp, are sentential forms in a derivation.
- The final string int + ident is a sentence and therefore is an element of
L(G2).